Economic Performance of Modularized Hot-Aisle Contained Data Center PODs Utilizing Horizontal Airflow Cooling

by Albert O. Rabassa III

B.S. Computer Science, Monmouth University, 1978
M.S. Information Systems and Technology, Johns Hopkins University, 2008

Submitted to the MIT Sloan School of Management in Partial Fulfillment of the Requirements for the Degree of Master of Science in Management of Technology at the Massachusetts Institute of Technology

June 2014

© 2014 Albert O. Rabassa III. All Rights Reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of Author: [Signature redacted] MIT Sloan School of Management, May 1, 2014

Certified by: [Signature redacted] John E. Van Maanen, PhD, Thesis Advisor, Professor of Organization Studies, MIT Sloan School of Management

Reviewed by: [Signature redacted] Christopher N. Hill, PhD, Technical Review, Principal Engineer, Department of Earth, Atmospheric and Planetary Sciences

Accepted by: [Signature redacted] Stephen J. Sacca, Director, MIT Sloan Fellows Program in Innovation and Global Leadership

Economic Performance of Modularized Hot-Aisle Contained Data Center PODs Utilizing Horizontal Airflow Cooling

By Albert O. Rabassa III

Submitted to the MIT Sloan School of Management on May 1, 2014 in partial fulfillment of the requirements for the degree of Master of Science in Management of Technology

Abstract

Evolutionary and revolutionary advances in computational and storage systems have driven electronic circuit densities to unprecedented levels. These high-density systems must be adequately cooled for proper operation and long life expectancy. Cooling solutions must be designed and operated to minimize energy and environmental impacts. Executive decisions are deeply rooted in the technical aspects of the systems and solutions sought. These interdependent solutions seek to maximize system performance while minimizing capital and operating expenditures over the economic life of the data center.

Traditional data centers employ a raised-floor plenum structure to deliver cooling via perforated floor tiles as the primary delivery system for component cooling. Heated exhaust air exits the equipment and travels upward to warm return plenum structures for subsequent capture and re-cooling. This floor-to-ceiling airflow behavior represents a vertical airflow-cooling paradigm. The resulting airflow may travel 150 feet or more per cooling cycle.

A new class of data center cooling utilizes a technique called "in-row" cooling. This new technique requires neither a raised-floor plenum, perforated tiles, nor return plenum structures. The airflow travels horizontally from rack to rack with respect to cold-air delivery and warm-air return. Airflow travel is consequently reduced to only 10 feet per cooling cycle. This thesis will explore the economic benefits of this new airflow paradigm against traditional data centers through the use of measurement and Computational Fluid Dynamic (CFD) modeling software.

Thesis Advisor: John E. Van Maanen, PhD.
Title: Professor of Organization Studies, MIT Sloan School of Management

Thesis Advisor: Christopher N. Hill, PhD.
Title: Principal Engineer, Department of Earth, Atmospheric and Planetary Sciences

Acknowledgements

John Van Maanen, PhD, Sloan School of Management
I wish to thank Dr. Van Maanen for his yearlong executive leadership and training. I hope to carry the lessons from self-reflection and leading organizations back to my parent company. It has been a fantastic year. Onward!

Christopher N. Hill, Department of Earth, Atmospheric, and Planetary Sciences
I wish to thank Dr. Hill for kick-starting my thesis project and keeping me technically engaged at the MGHPCC. Without his help, access to the data center would have been extremely difficult and time consuming. Thank you.

MGHPCC - Massachusetts Green High Performance Computing Center
I wish to thank the Massachusetts Green High Performance Computing Center located in Holyoke, MA and its Executive Director, Mr. John Goodhue. The MGHPCC provided a state-of-the-art data center facility allowing me to perform my academic research. Kevin Helm, Director of Facilities Management; Mark Izzo, Chief Engineer; and Scott Alix, Facilities Engineer, provided complete access and assisted me with physical, mechanical, and electrical measurements that were critical to my research.

Innovative Research Inc.
I wish to thank Dr. Kailash C. Karki and the executive staff of Innovative Research[1] in Plymouth, MN, who provided an academic license for the TileFlow CFD modeling software. CFD airflow modeling was a cornerstone of my research thesis. Without this generous corporate offer, this thesis would not have been possible.

My Sponsor
I wish to thank Director David Petraeus for establishing the Executive Scholarship Program which brought me to the MIT Sloan School of Management, and the members of the Center for Leadership Division for their support in the year-long execution of this program. Special thanks to Richard C. and Elaine S., who wrote compelling letters of recommendation to the scholarship selection committee, knowing that their letters could ultimately remove me from their workforce for over a year. The 8-member vote was unanimous.

My Wife and Family
I wish to thank my loving wife of 33 years, Cecilia, for taking the helm while I was so far away. It is the most unselfish and loving act that I could ever imagine. I could never have done this without you. We have beaten the odds. And to my two sons, AJ and Matthew: thank you for stepping in when I was away. You are never too old to continue your education or to try something new. Good talk.

[1] Innovative Research Inc., www.inres.com

Disclaimer

The opinions and views expressed throughout this thesis are the opinions of the author and may not reflect the opinions or views of my sponsor, the Massachusetts Institute of Technology (MIT), the Massachusetts Green High Performance Computing Center (MGHPCC), or their respective faculty, staff, advisors, contractors, or other parties. Any and all products, graphics, photos, and formulae mentioned in this thesis are for reference and clarity purposes only. Their mention does not constitute an endorsement, use, preference, advertisement, or suitability to task. This thesis does not constitute direction or consulting advice. All products, product names, logos, and trademarks mentioned in this thesis remain the sole property of their respective owners. All sources are cited. Symbols will denote such ownership (®, ©, ™).
Table of Contents

Abstract
Disclaimer
Table of Contents
Table of Figures
Table of Tables
Purpose
Chapter 1 - Introduction
Scope
Hypothesis
Assumptions
Methodology
Research Facility
Test Environment
Chapter 2 - Tools
Description and Use
TileFlow
Load Banks
Infrared Thermal Imaging Camera
Computing System
Computational Fluid Dynamic (CFD) Modeling
Limits of Modeling
Test Scenarios
Test Setup
CFD Modeling Input
CFD Modeling Output
Test Results
Confidence of CFD Modeling
Chapter 3 - An Overview of Traditional Data Centers
Construction
PUE (Power Usage Effectiveness)
Competing Cooling Solutions
Downflow CRAH Unit
In-Row Cooling Unit
CFD Modeling Traditional Data Centers
Sub-Models vs. Full-Models
Sub-Model - Traditional Data Center
Full-Model - Traditional Data Center
Sub-Model with Containment PODs
Full-Model with Containment PODs
Conclusion - Full-Model
Raised Floor Cost Considerations
Raised Floor Costs
Raised Floor O&M Costs
Drop Ceiling Plenum Structures
Drop Ceiling O&M Costs
Chapter 4 - CFD Modeling In-Row Cooling with Containment PODs
CFD Modeling - In-Row Cooling with PODs
CFD Modeling - Full-Model POD
Modeling Results - Full-Model, In-Row Cooling with PODs
Staggered In-Row Cooling - What If
Results - Staggered In-Row Cooling
Chapter 5 - Observations - CFD Modeling
Chapter 6 - Economic Summary
Iterative Economic Design
Rack Power Density
Number of IT Racks
Iterative Savings
PUE Hacking
Chapter 7 - Technical Summary
Observations
Chapter 8 - Business Case Summary
Chapter 9 - Conclusions
Glossary

Table of Figures

Figure 1: Moore's Law Performance Graph 1971-Present
Figure 2: Effects of Moore's Law - Short Term (5 years)
Figure 3: PUE Example 1.8 vs. 1.2
Figure 4: Example of False Maximum
Figure 5: The Massachusetts High Performance Computing Center
Figure 6: Photo (left) and CFD Model (right) of an 8-Rack POD (Sandbox Area)
Figure 7: Physical (left) and CFD Model (right) of a 24-Rack POD (main data center)
Figure 8: Simplex MicroStar® 17.5kW Load Bank, 19" Rack-Mounted
Figure 9: Sample IR Thermal Image of Computer Racks/Rows
Figure 10: Sample TileFlow® CFD Image File (JPG)
Figure 11: CFD Modeling Output
Figure 12: Actual to Predicted Thermal Results (A, B, C)
Figure 13: Various CFD Views and Billboards from the Sandbox Modeling
Figure 14: Example of a Data Center Raised Floor
Figure 15: Down-Flow CRAH Unit with Fan Module Extended
Figure 16: CFD Model of Downflow CRAHs with Extended Fan Units
Figure 17: Downflow CRAH Schematic of Major Components
Figure 18: In-Row Cooling Unit
Figure 19: In-Row Power (Watts) and Airflow (CFM) Values of Capacity
Figure 20: Traditional Legacy Data Center, 20-Year-Old Approach
Figure 21: Traditional Legacy Data Center Sub-Model Showing Overheated Racks/Equipment
Figure 22: A 30,100 sq ft Traditional Data Center w/Raised Floor
Figure 23: Traditional Data Center with Containment POD
Figure 24: A 30,100 sq ft Traditional Data Center with Hot-Aisle Containment
Figure 25: Close-up View of Multiple Containment PODs
Figure 26: Full Object Detail Showing the Ceiling and Ceiling Vent Above the Hot Aisles
Figure 27: Horizontal Billboard of a Full-Model of the Data Center (12kW Load)
Figure 28: Vertical Billboard of the Data Center (12kW Load)
Figure 29: Vertical Billboard with Equipment Racks Set to "Invisible"
Figure 30: Various Vertical Billboard Sweeps Across the Data Center (w/Consistent Results)
Figure 31: Subfloor Plenum Pressure and Directivity
Figure 32: Raised Floor Loadings
Figure 33: MGHPCC On-Slab Installation of PODs
Figure 34: Raised Floor Cost Estimate
Figure 35: Basic POD Design Utilizing In-Row Cooling
Figure 36: PODs with In-Row Cooling, Horizontal Billboard
Figure 37: POD with In-Row Cooling, Horizontal Billboard
Figure 38: Vertical Billboard, In-Row Cooling System Highlight
Figure 39: In-Row Cooling with Vertical Billboards
Figure 40: Full Data Center Model, POD Design with In-Row Cooling
Figure 41: CFD Data Center Model, Horizontal Billboard
Figure 42: Full-Model, Horizontal Billboard
Figure 43: Thermal Performance of Staggered In-Row Cooling
Figure 44: Stagger Design, Horizontal Billboard
Figure 45: Floor Tile Airflow vs. Distance to CRAH Unit
Figure 46: Server Fan Power vs. Inlet Temperature (circa 2004)
Figure 47: Server Electrical Power vs. Inlet Temperature (2014)
Figure 48: In-Row Cooling Capacity and Costs (Unconstrained > 10MW)
Figure 49: In-Row Cooling Capacity and Costs (Constrained to 10MW)
Figure 50: Data Center Container Exhibiting 1,000 Watts/sq ft Power Densities
Figure 51: Active Floor Tiles to Regulate Airflow

Table of Tables

Table 1: Fan Affinity Laws
Table 2: PUE Influenced by IT Fan Power
Table 3: Capacity Matrix
Purpose

Prior to my yearlong academic assignment to the MIT Sloan School of Management, I served as a Data Center Engineer for the US Federal Government, where I am responsible for the designs and technologies affecting our data centers. Collectively, this represents over 30MW of electrical power with capital expenditures exceeding $500M. With shrinking fiscal budgets, Executive Orders, and serving as a steward of taxpayer money, I continually search for cost efficiencies within my area of responsibility.

I have operated on the cutting edge of data center technology for the past eight years. My experience spans the following types of data centers:

* Purposefully built data centers - independent power and cooling systems.
* Data centers within office spaces - shared power and cooling systems.
* Data center containers attached to buildings - shared power and cooling systems.
* Stand-alone data center containers - independent power and cooling systems.

While observing the computational infrastructure at MIT, I noted a newly constructed data center 90 miles west of the MIT campus in Holyoke, MA. It was advertised to be "green" and highly energy efficient. I was granted a data center tour and found this new data center to contain the latest in energy-efficient technologies and methodologies. This new cooling technology piqued my interest, as this data center uses an "in-row" cooling system rather than a traditional raised-floor cooling system. This appeared to be an excellent area in which to perform research relating to its viability and applicability to my data center environments (both present and future). I was subsequently granted access to the data center to perform engineering research and to derive the underlying business cases supporting such energy-efficient designs.

The purpose of this thesis is to compare and contrast the economic and cost benefits of this new type of in-row cooling technology. Computational Fluid Dynamic (CFD) airflow models will be used to compare it against an equivalent offering using the complementary airflow dynamics found in many traditional data centers. This thesis is not an exercise in thermodynamics. It does, however, represent the highly technical aspects of data center planning, the executive decision process, and the decisions affecting data center up-front capital expenses (CAPEX), operational expenditures (OPEX), and design considerations.

Chapter 1 - Introduction

Over the past 15 years, data center managers have struggled with the rate of change of the computational equipment and infrastructure housed within the data center's area of responsibility. The rate of change has been incremental on a yearly basis, but exhibits a compounding effect on the data center itself. Resources and excess capacities that may have existed are consumed at a rate faster than ever imagined. With data center upgrades and new construction requiring one year of planning and 24-36 months of execution, data center managers become completely overwhelmed and are unable to expand or adapt to new and improved technologies that might actually have helped their situation.

These capacity problems and rates of growth are not slowing. In fact, they are accelerating. Moore's Law states that circuit densities will increase by a factor of two (2x) every 18 months. This held true for many years, but has flattened to a rate of 2x every 24 months. The net effect is that circuit densities, and the heat generated by these circuits, will continue to rise. Data centers must be ready to incorporate and adapt to the latest equipment offerings. While it is impossible to "future proof" the data center, reasonable steps must be taken to plan and to project capacity, consumption, and the organizational responses to those demands.

Figure 1: Moore's Law Performance Graph, 1971-Present.[3]

[3] http://www.asml.com/imglib/sustainability/2012/report-visuals/ASML-sustI2_wide-mooreslaw.png

The original Moore's Law, as stated by Gordon Moore in 1965, indicated a circuit-density doubling effect every 18 months. This was appropriate for Integrated Circuits (ICs), but has slowed recently due to production cycles, primarily those of Intel Corporation (NASDAQ: INTC). Since 2007, Intel Corporation has used a two-year production cycle called 'tick' and 'tock'. A 'tick' cycle shrinks the die, say from a 45nm to a 32nm transistor junction. A 'tock' cycle updates the microarchitecture. Production complexities warrant this behavior, thus slowing the original description of Moore's Law as a matter of practicality and profit.

It should be noted that transistor junction sizes have shrunk from 65nm, to 45nm, to 32nm, to 22nm, with current production operating at 14nm junction sizes. This progression cannot continue without limit. Projected junction sizes are 10nm, 7nm, and finally 5nm by the year 2022. At that point, the junction sizes approach single molecules, reaching the molecular limit of miniaturization. After nearly 60 years, Moore's Law may have reached its limit.

Figure-1 depicts Moore's Law over a 40+ year period. While it is readily accepted as the de facto industry trend of the computer chip manufacturing market, it bears little resemblance or meaning to data center managers with short business horizons. It is not until managers examine a five-year graph of Moore's Law that the impacts of exponential growth become evident in the near term. Figure-2 depicts four growth models: linear growth, doubling every 18 months, doubling every 24 months, and doubling every 36 months. The intent is to demonstrate the time horizon of a fixed resource (the data center's power and cooling infrastructure) against three non-linear growth models for the computational resources deployed within the data center.

Figure 2: Effects of Moore's Law - Short Term (5 years).[4]

[4] http://sfecdn.s3.amazonaws.com/newsimages/ThreePredictions/MooresLaw2.jpg
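To make the near-term effect concrete, the short sketch below (an editorial illustration, not part of the thesis measurements) plays out the four growth models of Figure-2 against a fixed facility limit. The 10 MW limit matches the thesis assumption made later in this chapter; the 2 MW starting IT load and the 10%-per-year linear rate are assumed values chosen only for illustration.

```python
# Illustrative sketch of the growth models in Figure-2 against a fixed capacity.
# The 10 MW limit is the thesis assumption; the 2 MW starting load and the
# 10%/year linear rate are assumed here purely for illustration.

FACILITY_LIMIT_MW = 10.0
START_LOAD_MW = 2.0

def it_load_mw(months, doubling_period_months=None):
    """Projected IT load; doubling growth, or linear growth when no period is given."""
    if doubling_period_months is None:
        return START_LOAD_MW * (1 + 0.10 * months / 12.0)
    return START_LOAD_MW * 2 ** (months / doubling_period_months)

scenarios = [("linear, 10%/yr", None), ("2x every 18 mo", 18),
             ("2x every 24 mo", 24), ("2x every 36 mo", 36)]

for label, period in scenarios:
    # first month (within a 5-year horizon) at which the facility limit is exceeded
    month = next((m for m in range(61) if it_load_mw(m, period) > FACILITY_LIMIT_MW), None)
    status = f"month {month}" if month is not None else "not within 5 years"
    print(f"{label:16s} -> capacity exhausted: {status}")
```

Under these assumed starting conditions, even the slower 24-month doubling rate consumes a facility sized at five times the initial load well inside the five-year window, which is the point Figure-2 makes visually.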
Data center managers are now faced with multi-dimensional problems: increasing circuit densities, variable equipment cooling, users doing more and more with the technological improvements in data processing hardware and software, 24x7 access, Big Data, shrinking budgets, differing ages of the computing equipment, and competition from outside resources to move into the "cloud". The set of possible solutions available represents huge swings in corporate thinking, funding, and planning horizons. The solutions may include: retrofitting the existing data center spaces to accommodate the new computing equipment, building a new data center facility specially constructed for this purpose, or out-sourcing computing needs to a third party as part of a cloud computing initiative. Each solution has different levels and types of risk, funding profiles, and time horizons.

For the purposes of this thesis, I will target single-site data centers with a total site capacity of approximately 10 megawatts (MW) of electrical power. This thesis will also assume a state-of-the-art, energy-efficient design exhibiting a Power Usage Effectiveness (PUE)[5] of 1.5:1 or less. This equates to 66% of the total site power being available for the computational needs of the data center. The remaining 34% of the electrical power is available for cooling, lighting, pumps, HVAC, hot water, and distributed electrical and thermal losses. See Figure-3.

Figure 3: PUE Example, 1.8 vs. 1.2.[6] (For a 100kW IT load, a PUE of 1.8 implies an 80kW infrastructure load and a 180kW total load, with 56% of site power reaching the IT equipment; a PUE of 1.2 implies a 20kW infrastructure load and a 120kW total load, with 83% reaching the IT equipment.)

[5] Power Usage Effectiveness (PUE) is a metric created by The Green Grid. It is an arithmetic ratio expressing the total power consumed by the entire site divided by the power consumed by the computing equipment; its reciprocal expresses the fraction of site power that reaches the IT equipment. As an example: a data center having a PUE of 1.8:1 will have 56% of the site's power reaching the computing equipment. A data center having a PUE of 1.2:1 will have 83% of the total power reaching the computing equipment. Lowering the PUE value is a means of improving the site's ability to serve the computational needs of the data center - its primary purpose. Lowering PUE values may require large up-front CAPEX expenses and is therefore one of the first technical and cost decisions made.

[6] www.futre-tech.co.uk/wp-content/uploads/2013/11/pue-1-7-and-1-8.png
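The arithmetic behind Figure-3 and the footnote above can be captured in a few lines. The sketch below simply recomputes the figure's numbers and, using the thesis-wide assumptions of a 10 MW site and $0.10 per kWh, the annual utility cost revisited in Chapter 3; it adds no data beyond those stated values.

```python
# Recomputing the PUE arithmetic of Figure-3, plus the site-level energy cost
# implied by the thesis assumptions (10 MW total facility power, $0.10/kWh).

def pue(total_facility_kw, it_load_kw):
    """PUE = total facility power / power delivered to the IT equipment."""
    return total_facility_kw / it_load_kw

for it_kw, infra_kw in [(100, 80), (100, 20)]:      # the two Figure-3 scenarios
    total_kw = it_kw + infra_kw
    print(f"IT {it_kw} kW + infrastructure {infra_kw} kW -> "
          f"PUE {pue(total_kw, it_kw):.1f}:1, "
          f"{it_kw / total_kw:.0%} of site power reaches the IT equipment")

site_mw, dollars_per_kwh, hours_per_year = 10.0, 0.10, 8760
annual_cost = site_mw * 1000 * hours_per_year * dollars_per_kwh
print(f"Fully loaded {site_mw:.0f} MW site: ${annual_cost:,.0f} per year "
      f"(${annual_cost / 12:,.0f} per month)")
```

The output reproduces the 56% and 83% figures of Figure-3 and the $8.76M-per-year, $730,000-per-month utility spend discussed in Chapter 3.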
For the purposes of this thesis, I will target single-site data centers with a total site capacity of approximately 10 megawatts (MW) of electrical power. This thesis will also assume a state of 4 http://sfecdn.s3.amazonaws.com/newsimages/ThreePredictions/MooresLaw2.jpg Page 17 the art energy efficient design exhibiting a Power Usage Effectiveness 5 (PUE) of 1.5:1 or less. This equates to 66% of the total site power available for the computation needs of the data center. The remaining 34% of the electrical power is available for cooling, lighting, pumps, HVAC, hot water, and distributed electrical and thermal losses. See Figure-3. IT Load = 100kW IT Load = 100kW Infrastructure Load = 80kW infrastructure Load = 20kW Total Load PUEU= 100 18 180kW QE=1O 10 . PUE N ITLoad Coolng 4% 39% . Total Load = 120kW PUE = 1.2 1.8 I U IT Load UPS losses Lighting & Ancidi.ries * cooling 1% 1% M 100 120 6 56% UPS losses Lghtig &Ancillares 2% 83% Figure 3: PUE Example 1.8 vs. 1.26. Scope The scope of this thesis will be limited to the MGHPCC facility and the CFD computer models derived from raw measurement and comparative analysis. The focus will remain on the economic performance factors of a new type of cooling topology used in high performance data centers. Economic performance factors ride atop several complex and Power Usage Effectiveness (PUE) is a metric created by The Green Grid. It is an arithmetic fraction expressing the total power consumed by the computing equipment divided by the total power consumed by the entire site. It is a measure of efficiency. PUE may be expressed as a fraction or as a ratio. They are simply arithmetic reciprocals of the same expression. As an example: a data center having a PUE of 1.8:1 will have 56% of the site's power reaching the computing equipment. A data center having a PUE of 1.2:1 will have 83% of the total power reaching the computing equipment. Lowering the PUE value is a means of improving the site's ability to serve the computational needs of the data center- its primary purpose. Lowering PUE values may require large up-front CAPEX expenses and is therefore one of the first technical and cost decisions made. 5 6 www.futre-tech.co.uk/wp-content/uploads/2013/1 1/pue-1-7-and-1-8.png Page 18 interconnected systems. These systems must be fully understood and measured to expose their maximum economic potential. This will require an array of specialized tools, measurement techniques, analysis, and computer modeling. Hypothesis "In-row cooling utilizing hot-aisle containment systems exhibit a lower initial CAPEX investment and lower recurring OPEX expenses over the economic life of a data center (approximately 20 years)." Assumptions * The data center is an enterprise-class data center exhibiting many of the Tier-III Concurrently Maintainabledesign features as specified by The Uptime Institute. 0 The data center design supports a PUE of 1.5:1 or lower. * 10 MW total facility power, utility provided. 0 $0.10 per kwh of cost for electricity as a national average. * The economic life of the data center is 20 years before recapitalization. * The Power and Cooling infrastructures are dedicated to the data center. Methodology My approach combines both Qualitative and Quantitative measures. measures come from actual measurements and computer modeling. The quantitative Qualitative measures come from the examination of the data and making changes to the model reflecting my personal experiences spanning 8+ years of airflow modeling. 
The analysis and qualitative Page 19 changes to the input of airflow models will result in quantitative changes in the output. This will be an iterative process. As with any iterative maximization, there exists a potential for a false maximum. See Figure-4. If a system maximum is found during any iterative process (Point-A), there is the potential that a higher/better maximum iteration point may exist (PointB). A Figure 4: Example of False Maximum Point-A maxima is unaware of the Point-B maxima due the lack of exhaustive testing of every possible combination. Such is the case in this thesis. When examining the thermal performance of single PODs, regardless of cooling and airflow systems, there are differences between the in-row cooling and traditional data center using a raised floor. The traditional 7 data center with a raised floor and perimeter-located cooling units (CRAH ) are significantly more complex to model due to the number of interactions between the system. Therefore, modeling a single POD unit must be re-validated against a data center room full of PODs to ensure that a false maximum has not been reached. 7 CRAH vs. CRAC - A Computer Room Air Handler (CRAH) is different from a Computer Room Air Conditioner (CRAC), although their cooling functions are the same. A CRAC (air conditioner) contains a direct expanding mechanical refrigeration (gas-liquid phase change) unit to provide cooling. A CRAH (air handler) is a cooling unit that works with a chilled water (CW) loop system. Page 20 Research Facility The research for this thesis was performed at the Massachusetts Green High Performance Computing Center (MGHPCC) located in Holyoke MA. See Figure-5. This newly constructed facility is a multi-university computing center dedicated to high performance computing (HPC) with the "greenest" footprint possible. In its present configuration, it is capable of supporting a 10 Megawatt total site load. The Massachusetts Green High Performance Computing Center (MGHPCC) is a data center dedicated to research computing. It is operated by five of the most research-intensive universities in Massachusetts: Boston University, Harvard University, MIT, Northeastern University, and the University of Massachusetts. It serves the growing research computing needs of the five founding universities as well as other research institutions. Figure 5: The Massachusetts High Performance Computing Center9 . 8 Massachusetts Green High Performance Computing Center (MGHPCC) www.mghpcc.org 9 Ibid., Photograph Page 21 Test Environment There are different areas within the Massachusetts Green High Performance Computing Center (MGHPCC) in which to perform testing. The smallest and most flexible area is called the "Sandbox". See Figure-6. This is a 640 sq. ft. area data center is designed specifically for testing without affecting live data center operations. The power and cooling system are the identical, but greatly reduced in size. The Sandbox has two In-Row Coolers (IRCs) and eight (8) equipment racks. The Sandbox has the same enclosure hardware which is used to create a hot-aisle containment system used in the actual data center. Figure-7. Figure 6: Photo(l) and CFD Model(r) of an 8-Rack POD (Sandbox Area). A Single MGHPCC 24-Rack POD. Thermal CFD Model Equivalent. Figure 7: Physical(1) and CFD Model(r) of a 24-Rack POD (main data center). 
Chapter 2 - Tools

Description and Use

The following data center tools will be used to purposefully perturb a system in balance in order to measure the response and recovery to these perturbations. To the extent possible, interpolations and extrapolations will be made in an attempt to predict system behavior without the need (or ability) to stress the system to the point of failure or alarm.

TileFlow

The software product TileFlow® is an application written by Innovative Research Inc. This PC/Windows-based application utilizes computational fluid dynamics algorithms to graphically depict the highly interactive airflow dynamics of a complex data center. Individual PODs and the entire data center will be modeled using this CFD product.

"TileFlow® is a powerful three-dimensional software tool for simulating cooling performance of data centers. It uses the state-of-the-art computational fluid dynamics (CFD) techniques and is applicable to both raised-floor and non-raised-floor data centers."[10]

[10] Innovative Research Inc., www.inres.com

Load Banks

A load bank is a high-power electro-mechanical device that converts electricity directly to heat via multiple resistive heating elements and a fan. Its purpose is to simulate an energy demand in order to: 1) stress an electrical system through circuit loading, or 2) stress a cooling system by injecting heat to be thermally removed. The power/heat output may be adjusted in 1,250-watt increments for fine-grained experiments. A single load bank (Figure-8) will be used to simulate approximately 17,500 watts of heat load, the equivalent of roughly 60,000 BTU/hr of heat (~5 tons of cooling to reject the heat). The hot-air exhaust of the load bank may be adjusted to further test the airflow dynamics of a hot-aisle containment system and the associated cooling systems.

Figure 8: Simplex MicroStar® 17.5kW Load Bank, 19" Rack-Mounted.

Infrared Thermal Imaging Camera

A state-of-the-art infrared thermal imaging camera will be used to visually measure the effects of heating and cooling. A FLIR thermal imaging camera will serve as this measurement tool. Since air cannot be directly photographed, the secondary effects of heated air on surrounding surfaces can be easily photographed within the thermal domain. The resolution of the IR camera is only 240x240 pixels (Figure-9). Each thermal image requires time to review for content, along with an accompanying description to assist the viewer. Each thermal experiment will be run for one hour to allow temperatures to stabilize, thus removing the effects of thermal mass. Measurements and images are taken and recorded at the end of the 1-hour experiment.

Figure 9: Sample IR Thermal Image of Computer Racks/Rows.[11]

[11] http://infraredimagingservices.com, image.

Computing System

All of the CFD modeling using the TileFlow product in this research thesis has been performed on an HP Pavilion dv6 laptop with an Intel Core i7-3610QM quad-core processor running at 2.3GHz and 8GB of installed memory, running the Windows 7 operating system with all patches and updates as of 3/1/2014. NVIDIA chipsets are used for graphics acceleration and display.

Approximate run-times for the various CFD models:
* MGHPCC Sandbox models: 2 minutes
* 20-rack single PODs: 5 minutes
* 24-rack single PODs: 7 minutes
* MGHPCC as-built (20 PODs): 2 hours
* MGHPCC maximum fit-out (33 PODs): 3 hours
* Traditional data center, raised floor, maximum capacity: 4 hours
Outputs and file sizes:
* TileFlow models and runtime data are approximately 500kB per model (.TLF).
* CFD image files are approximately 300kB per image (.JPG). See Figure-10.
* 30-second video clips are approximately 200kB per clip (.AVI).

Figure 10: Sample TileFlow® CFD Image File (JPG).

Computational Fluid Dynamic (CFD) Modeling

Computational Fluid Dynamics is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows. Computers are used to perform the calculations required to simulate the interaction of liquids and gases with surfaces defined by boundary conditions.[12] In this thesis, the fluid is air. A color palette will be used to represent the computed air temperatures (°F), and arrows will represent the direction and magnitude (vectors) of the air movement. Together, this multi-dimensional data will be presented in a two-dimensional plane called a billboard. A billboard is an analysis object available in the TileFlow software product. A billboard can exist in any axis (X, Y, or Z). Billboards will be used extensively to depict the airflow and temperature performance of a system or configuration under test.

[12] http://infraredimagingservices.com, image.

Limits of Modeling

Modeling is a tool. It is not a solution. It is a tool used by data center engineers to evaluate, predict, or validate a specific configuration, layout, or computing space. As such, there is an element of CFD modeling that is also "art", and subject to interpretation. That is to say, the results of the models themselves are not solutions, but rather representations. Model representations, as outputs of the CFD process, must be analyzed and interpreted to expose the maximum amount of information. Quantitative information is presented for analysis and qualitative interpretation. Quantitative data may indicate temperatures, static pressures, and vectors (direction) of airflow. Qualitative data may show that racks close to cooling and fan units receive less airflow due to reduced pressures as a direct result of increased airflow speed (Bernoulli Principle). Subsequent placement of high-density and low-density computing equipment may be a result of viewing and acting upon the qualitative data.

Test Scenarios

Before running large-scale models across the main data center floor area (30,100 square feet), a series of smaller tests should be performed to ensure that modeling can faithfully capture and replicate the thermal characteristics of the larger data center space. For these smaller modeling tests, the Sandbox area of the MGHPCC will be used.

The Sandbox (see Figure-6) is a 640 square foot test environment specifically built for testing purposes. It utilizes the same equipment and exhibits the same cooling dynamics as the main data center area. It connects to the same Chilled Water (CW) cooling loop as the main data center. It also connects to the same electrical system. Therefore, all of the programming and infrastructure set-points are the same. The MGHPCC uses the in-row cooling paradigm that is the focus of this thesis. The in-row cooling will be compared to traditional data centers using a raised floor.

Test Setup

The Sandbox area was configured using the same techniques employed in the main data center. The Sandbox has the same racks and POD enclosure system, but a smaller footprint. All unused rack units were covered with blanking panels to prevent air leakage across the POD enclosure system.
The airflow dynamic to be modeled is referred to as "front-to-back". That is to say, airflow enters the racks and computing equipment at the front of each rack, is heated by the computing equipment (or load bank), and exits the equipment and rack through the rear of the rack. When configured in this manner, aisles are created which carry either cold air or hot air. Such a configuration is referred to as hot-aisle or cold-aisle. Under no circumstances is the exhaust air of a hot aisle allowed to flow directly into the cold-air inlets of an adjacent row. Such a misconfiguration would allow heating and reheating of air as it moves from aisle to aisle. Computing equipment is always installed in opposing rows (pairs) to create and maintain the cold-aisle / hot-aisle cooling separation. Therefore, the test setup will maintain an enclosure surrounding a hot aisle. The industry term for this is "Hot Aisle Containment". A complementary configuration exists: "Cold Aisle Containment".

A load bank will be used to simulate the heat load that would be generated by several high-powered servers. The MicroStar load bank will be configured to generate 17,500 watts of heat. The use of a load bank allows fine-tuning of the heat load through a resistor ballast. In addition to changing the heat load, the fan speed may also be adjusted, which affects the Delta-T (ΔT = temperature change from inlet to outlet).

CFD Modeling Input

TileFlow® is an object-oriented CFD software program that permits drag-and-drop of highly customizable objects placed within a 3D modeling space. Each object can have a default or customized behavior. Since CFD modeling is an iterative process, there will always be modifications to the default settings if accuracy and fidelity are to be achieved.

Complexity builds upon the most basic concepts, e.g., room dimensions. The physical boundaries not only include the X-Y dimensions of the room, but also the Z-dimension that includes the slab-to-slab distances. The Z-dimension must account for: a) the physical slab upon which everything rests, b) the raised-floor plenum height, c) the working space where the IT equipment will reside, d) the drop ceiling used as a return plenum, and e) the physical roof which defines the top boundary of the above-ceiling plenum. Each dimension will have an impact on airflow delivery and return. These values define the characteristics for new construction, or must fit within an existing building's structure being fit out as a data center.

Objects are placed into the model representing heat generation (IT), heat rejection (cooling), boundaries to airflow, or obstructions to airflow. Objects are placed with a specific orientation, size, and value. Object interactions are subsequently modeled by TileFlow®.
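As an illustration of the kind of information these objects carry, the sketch below encodes a room and a few objects in plain Python. This is not TileFlow's input format or API; the structure, the field names, and every Sandbox dimension other than its 640 sq. ft. area are assumptions made only to make the preceding description concrete.

```python
# A simplified, hypothetical sketch of the model inputs described above.
# This is NOT TileFlow's file format or API; it only illustrates the kind of
# geometry and object data a data center CFD model has to capture.
from dataclasses import dataclass

@dataclass
class RoomGeometry:
    length_ft: float          # X dimension
    width_ft: float           # Y dimension
    raised_floor_in: float    # supply plenum height (0 for on-slab installations)
    working_space_ft: float   # slab/tile to drop ceiling
    ceiling_plenum_ft: float  # drop ceiling to roof (return plenum)

@dataclass
class ModelObject:
    kind: str                 # "heat_source", "cooling_unit", "boundary", "obstruction"
    x_ft: float
    y_ft: float
    orientation_deg: float
    value: float              # e.g., watts of heat, CFM of airflow, 0 for passive objects

# Example: the 640 sq. ft. Sandbox area with one load bank and two in-row
# coolers; the 32 x 20 ft split and the heights are assumed, not measured.
sandbox = RoomGeometry(length_ft=32, width_ft=20, raised_floor_in=0,
                       working_space_ft=12, ceiling_plenum_ft=4)
objects = [
    ModelObject("heat_source", 10, 8, 0, 17_500),    # load bank, watts
    ModelObject("cooling_unit", 8, 8, 180, 6_900),   # in-row cooler, CFM
    ModelObject("cooling_unit", 14, 8, 180, 6_900),
]
```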
CFD Modeling Output

The review and understanding of the outputs generated by CFD modeling can be a daunting task for novice and first-time users. There are a variety of outputs that are rich in information and meaning. It takes time to understand the meaning of this data, how it relates to other elements in the 3D space, how to make changes to achieve a desired goal, and how to maximize actions against a set of goals or objectives. The CFD engine is the science. The manipulation of the data is the art.

Figure-11 represents a sample output of the CFD modeling process. This is a CFD model representation of the Sandbox area at the MGHPCC. The model depicts an 8-rack computing environment with two in-row cooling units forming a hot aisle, a POD structure encapsulating this hot aisle to form a hot-aisle containment system, a heat source (load bank) to simulate high-powered computing equipment, and the cooling equipment used to remove and reject the heat. Note: one of the in-row coolers has been turned off and is represented by the red "x" symbol on top of the rack.

Figure 11: CFD Modeling Output.

The image shown above is a modeling "cut" through a thermal plane set at 36" from the floor. This object is called a Horizontal Billboard. This billboard shows various elements of interest. Within the hot-aisle area, it is evident that elevated temperatures are present at the exit of the load bank. The colors represent a temperature scale, with red indicating 110°F in the hottest region. Arrows indicate airflow movement as vectors; each airflow vector indicates direction (azimuth) and velocity (length). The general room temperature is indicated to be 80°F, the hot-aisle region is 90°F-110°F with turbulent air mixing, and high-velocity air movement from the single running in-row cooler exhausts into the room, striking the boundary walls.

A Horizontal Billboard (Z-plane) is just one example of a CFD modeling object. This thesis will also use Vertical Billboards in the X-plane and Y-plane. Those examples will be shown later as more complex modeling is performed.

Test Results

The output results from the TileFlow® software package exhibited a high degree of correlation between the predicted CFD outcome and the actual room measurements taken during the experiment. While the CFD modeling outputs are immediate, each test configuration was allowed to run for one hour to allow the room and the physical infrastructure (equipment racks, etc.) to become temperature stable. Temperature stability is necessary so that all infrared thermal measurements taken would represent a steady-state thermal condition of the experiment. The one-hour test window was deemed sufficient (empirically) to eliminate unwanted thermal transients.

Figure 12: Actual to Predicted Thermal Results (Panels A, B, C).

Sandbox Results: The results were excellent (see Figure-12, Panels A thru C). Panel-A shows the predicted thermal performance with a horizontal billboard at the 36" level. Panel-B shows the same horizontal billboard with the physical racks and cooling equipment set to "invisible" to expose details that are obstructed from a physical view. Panel-C is an actual thermal image using an infrared camera showing the heating effects of the racks across the hot aisle. The temperatures recorded by the thermal camera span 98°F to 113°F. This temperature range is consistent with the CFD modeling temperature scale on the horizontal billboard at the 36" level. Additional thermal measurements (actual) were taken throughout the room. The ambient room temperature was 80°F (aqua color), which is also the temperature predicted by the CFD model. Correlations were excellent. Shown below are various views (Figure-13, A thru G) of the Sandbox CFD modeling output.

Figure 13: Various CFD Views and Billboards from the Sandbox Modeling.

Confidence of CFD Modeling

The CFD modeling of the Sandbox at the MGHPCC provided valuable insight into the use of hot-aisle containment PODs utilizing in-row cooling. Detailed inspection of the models found that the hot-aisle containment region had a negative static pressure with respect to the surrounding room. This was counter-intuitive to earlier beliefs and to statements made by the vendor. Airflow vectors pointed to the seams of the hot-aisle structure, indicating air leakage into the structure. A simple test was performed: the end-of-row access doors were observed being pulled closed by the negative pressure within a POD, which indeed confirmed the negative-pressure condition. Although this airflow anomaly was negligible in this test, it did demonstrate that airflow details can be predicted via CFD modeling in advance. The effort and painstaking attention to detail paid off.[13]

[13] A negative pressure condition was observed in the hot-aisle containment area in the Sandbox test environment. This problem will be greatly magnified with the larger 22-rack in-row cooled PODs on the main data center floor. This condition exposes a much greater problem rooted in the differences between the ΔTs of the computing equipment vs. the cooling equipment. This could result in a large pressure difference across the in-row cooling airflow delivery throughout all PODs. This problem needs to be further explored and a solution found. The worst case exists where highly efficient computers within a single POD provide 26,400 CFM of thru-chassis airflow, against an in-row cooling system requiring 52,800 CFM for the same heat load. This is a serious problem requiring further analysis and a solution, as heterogeneous-ΔT equipment share a common plenum structure (the POD).
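The airflow mismatch described in the note above follows directly from the standard sensible-heat approximation for air at sea level, Q (BTU/hr) ≈ 1.08 × CFM × ΔT (°F). The sketch below works the numbers for the 17.5 kW Sandbox load bank and then shows how halving the ΔT doubles the airflow a cooling system must move; the 300 kW POD load and the two ΔT values are illustrative assumptions, not measurements from the MGHPCC.

```python
# Sketch of the sensible-heat relationship underlying the airflow mismatch in
# the note above: Q[BTU/hr] ~= 1.08 * CFM * dT[F] (standard sea-level air).
# The POD load and delta-T values below are illustrative assumptions only.

BTU_PER_WATT_HR = 3.412

def required_cfm(heat_watts, delta_t_f):
    """Airflow needed to carry `heat_watts` of sensible heat at a given delta-T."""
    return heat_watts * BTU_PER_WATT_HR / (1.08 * delta_t_f)

# The 17.5 kW load bank used in the Sandbox tests (~60,000 BTU/hr, ~5 tons):
load_bank_btu_hr = 17_500 * BTU_PER_WATT_HR
print(f"Load bank: {load_bank_btu_hr:,.0f} BTU/hr "
      f"(~{load_bank_btu_hr / 12_000:.1f} tons of cooling)")

# The same POD heat load moved at two different delta-Ts: halving delta-T
# doubles the airflow the cooling units must supply.
pod_watts = 300_000          # assumed POD load, for illustration
for dt in (40, 20):          # assumed IT vs. cooling-unit delta-T (degrees F)
    print(f"delta-T {dt} F -> {required_cfm(pod_watts, dt):,.0f} CFM")
```

The factor-of-two jump in required CFM when the ΔT is halved is the same mechanism that produces the 26,400 vs. 52,800 CFM imbalance cited in the note.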
This was counter-intuitive to earlier beliefs and by statements made by the Page 33 vendor. Airflow vectors pointed to the seams of the hot aisle structure- indicating air leakage into the structure. A simple test was performed: allowing end-of-row access doors to be pulled closed by negative pressure within a POD, indeed confirmed the negative pressure condition. Although this airflow anomaly was negligible in this test, it did expose the fact that airflow details could be predicted via CFD modeling in advance. The effort and painstaking attention to detail paid off. Chapter 3 - An Overview of Traditional Data centers Data centers have been in existence for 50 years. From the earliest days of mainframe computers through today's state of the art supercomputers, the data center has served the primary function of providing an operating environment for the computing equipment contained within its spaces. These environments have changed over the decades as new types of high performance architectures have emerged. Not all computer architectures are the same, nor are their operating needs. Therefore, the data center had to adapt and grow to meet the needs of the changing enterprise. A negative pressure condition was observed in the hot aisle containment area in the Sandbox test environment. This problem will be greatly magnified with the larger 22rack in-row cooled PODs on the main data center floor. This condition exposes a much greater problem rooted in the differences between the A-Ts of the computing equipment vs. the cooling equipment. This could result in a large pressure difference across inrow cooling airflow delivery throughout all PODs. This problem needs to be further explored and a solution found. The worst-case condition exists where highly efficient computers within a single POD are providing 26,400 CFM of thru-chassis airflow, against an in-row cooling system requiring 52,800 CFM for the same heat load. This is a huge problem requiring further analysis and a solution as heterogeneous A-T equipment share a common plenum structure (POD). 13 Page 34 Technology rates of change (percentage) were fairly small 20 years ago, and energy efficiencies were not an issue during these prosperous times. By today's standards, a 20-year old data center is a "legacy" data center. It was designed with the best business practices of 1994, addressing the computational needs of the mid-90s. Servers sat on shelves with only 46 servers per rack. That was about all of the power available from a 115 Volt - 20 Amp electrical circuit. Racks had a solid front door, a rear louvered door, and a top mounted fan to exhaust the hot air that rose inside of the rack. Racks were located, positioned, and oriented without regard to airflow or energy efficiency. In 10 years, that would all change. Today, computational and storage systems are so dense that 3-Phase 230 Volt circuits are delivered to each rack. This equates to 18,000 watts of power to each rack vs. 2,000 watts over a span of 20 years. Cooling also follows the same trend, as power and cooling go hand-in-hand. This begs the question: Is it possible to "future proof' a data center? Or, must we accept this as an inevitable consequence of the changing pace of IT systems and technology? Construction A legacy data center typically constructed with a raised floor consisting of 24" square floor tiles (2' x 2') which can be removed for access (Figure-14). The raised floors were typically 12"-24" high off of the building's structural concrete slab. 
Construction

A legacy data center was typically constructed with a raised floor consisting of 24-inch square floor tiles (2' x 2') which can be removed for access (Figure-14). The raised floors were typically 12"-24" above the building's structural concrete slab. This elevation was large enough to provide access to all electrical circuits and to permit the airflow required to deliver cold air to the computing equipment.

Figure 14: Example of a Data Center Raised Floor.[14]

[14] http://www.edcmag.com/ext/resources/AugSep_2011/computer-room-final-slide.jpg

Cooling systems and electrical distribution systems were mounted along the perimeter of the data center room. Floor loading (static weight loading) was not an issue, as the steel rack typically weighed more than the equipment it housed. Operationally, the data centers were run cold, very cold. Most had thermostats set to 68°F with cold-air delivery at 54°F. Little was done to save energy. It was just the price of doing business, and energy costs were simply rolled up into the overhead accounts of the business.

PUE (Power Usage Effectiveness)

Since its first publication in 2007[15] by The Green Grid, PUE has been a valuable tool and technique for focusing managers and engineers on the perils of runaway energy costs within large data centers. A fully loaded 10MW data center (87,600 MWh per year) will have yearly energy costs exceeding $8.76M. Under the CFO's financial microscope are the recurring monthly charges of $730,000 to the electric utility. These recurring monthly costs are very real, and the executives of the company are asking, "How do we do more with less energy?" The O&M tail is staggering.

[15] https://www.thegreengrid.org/~/media/WhiteThesiss/WP49PUE%20A%2OComprehensive%2OExamination%200f%2Othe%2OMetric_v6.pdf

As previously stated, Power Usage Effectiveness (PUE) is an arithmetic ratio. In the numerator is the total electrical power consumed by the site. In the denominator is the amount of electrical power actually reaching the IT equipment. Stated another way, PUE is a measure of how much of the monthly electrical bill and energy cost goes toward doing actual IT work.

Competing Cooling Solutions

It is time to introduce the two (2) competing cooling strategies discussed in this thesis. Each offering represents the latest state of the art in cooling solutions, regardless of type.

Downflow CRAH Unit

A state-of-the-art downflow Computer Room Air Handler (CRAH) is shown in Figure-15. This unit has been configured to operate in a raised-floor data center environment. The unit exhibits variable-speed fan modules and variable-flow-rate Chilled Water (CW) metering valves to extract only the amount of energy required to maintain its set-point parametric programming (temperature, pressure, etc.). The fan impellers used are the latest curved centrifugal fan designs, resulting in highly efficient air movement per unit of input power. These units may be configured and programmed independently or in clusters which share the cooling load. Airflow is downward, taking inlet (hot) air from the top of the unit, passing it through the cooling coils, and exiting the bottom (cold) for subsequent delivery and cooling of equipment.
The technical performance' 6 of this downflow CRAH unit is as follows: Total Cooling Capacity: Fan Type: Total CFM: Programming: 16 181,000 Watts Sensible Cooling EC with Curved Centrifugal Fan 24,000 CFM/Minute Single Unit or Group/Cluster. Emerson Network Power Inc., SL-18056 Page 38 Return Air Fi er Front , Raised FFloor Blower lce Stand y Supply Air irSUpl Under-~lor Supy. PC Fans Figure 17: Downflow CRAH schematic of major components. Figure- 17 shows a side view of the downflow CRAH unit with the extended fan modules and overall airflow dynamic. The return air is connected to the drop ceiling via top-mounted hood. The supply air is delivered to the below-floor space created by the raised floor system. Cold air travels to the vented floor tiles as designed by the CFD model. In-Row Cooling Unit In-Row offers a different cooling paradigm then the downflow CRAH unit presented above. In-row cooling units are deployed in smaller units of total cooling with a horizontal airflow cooling design (Figure 18). Their airflow is 1800 opposite to the IT equipment that it intends to cool. Hot air appears at the rear of every IT rack. This is the in-row cooling's inlet is also located at the back of the rack. Stated another way- IT racks use a front-to-back airflow dynamic, while in-row cooling use a back-to-front airflow dynamic. This creates a circular airflow pattern from cold-to-hot and hot-to-cold. These circular airflow loops are only 10' Page 39 overall. The heat extraction and subsequent cooling appear very close to, and in many cases, touch the adjoining rack of IT equipment. Figure 18: In-Row cooling unit. The technical performance' 7 of this downflow CRAH unit is as follows: Total Cooling Capacity: 55,000 Watts Sensible Cooling Fan Type: EC with Curved Centrifugal Fan Total CFM: 6,900 CFM/Minute Programming: Single Unit or Group/Cluster. Because of the physical presence of the In-Row cooling units within the MGHPCC, capacity measurements were taken across its operating range. The plotted results appear to closely follow the Fan Affinity Laws18 for fan performance (Figure-19 and Table-1). 1 http://www.apcmedia.com/salestools/KKRZ-8MZQ4L/KKRZ-8MZQ4L_RI_EN.pdf IShttp://pontyak.com/fans/fanaffinitylaws.html Page 40 35M 800 300) 7000 6000 250D Power (Watts) 500D _ 2000 400D 150D 3000D 1000 200D -________ 0 0 20 40 60 80 100 0 20 40 60 80 10 0 Figure 19: In-Row Power (Watts) and Airflow (CFM) values of capacity. Symbols D = Q= Fan size, N = Rotational speed, P = Pressure Volume flow rate, W = Power, p = Gas density Fan Law 1 The first fan law relates the airflow rate to the fan rotational speed: Volume flow rate, Q, is directly proportional to the fan rotational speed, N. (QI/Q2)= (DI/D 2)3 (NI/N 2) Fan Law 2 The second fan law relates the fan total pressure or fan static pressure to the fan rotational speed: Total or static pressure, P, is proportional to the square of the fan rotational speed, N. (PI/P 2) = (DI/D 2)2 (NI/N 2)2 (PI/P2) Fan Law 3 The third fan law relates the total or static air power (and the impeller power), W, to the fan rotational speed: Power, W, is proportional to the cube of the fan rotational speed, N. (WI/W 2 )= (DI/D 2) (NI/N 2)' (P/P2) Table 1: Fan Affinity Laws The electrical power consumed by the in-row cooler appears to follow a cube root function (Fan Law 3). The CFM airflow is nearly linear (Fan Law 1: directly proportional) with a slight curvature attributed to airflow restrictions of the rack cabinet doors. 
Note: there is a minimum airflow setting pre-programmed at the factory. This value has been set to 40% and is not field adjustable. The resulting airflow is 2,960 CFM, and it takes 500 Watts of electrical power to maintain this minimum setting.

CFD Modeling - Traditional Data Centers

In this thesis, several CFD models were created to compare and contrast legacy/traditional data center cooling performance against the POD configuration with in-row cooling. The goal was to maximize cooling and airflow performance in each case and compare the results. This broad modeling effort was to ensure that a false maximum (Figure-4) was not the basis of any conclusion. A broad range of CFD models was run while examining the quantitative and qualitative results of each model. This would allow the technology "clock" to be reset to 2014 and the merits of each configuration to be maximized and subsequently compared. The intent is to evaluate the performance of the configurations, not the age of the configurations.

Sub-Models vs. Full-Models

Every large data center needs to be decomposed into its basic parts for evaluation and subsequent CFD modeling. A traditional data center is no exception. Domain decomposition of the problem set into smaller sub-models is critical for success. A "row at a time" needs to be fully understood before a "room at a time" can be modeled. This choice becomes obvious when comparing iterative models which run in 7 minutes vs. 4 hours. It is very easy to tweak single row/aisle configurations and explore their nuances before moving those behaviors into a whole data center model with n! (n-factorial) interactions.

A sub-model is the smallest working atomic unit of a configuration that is meaningful to a data center. It has all of the properties of a full data center without the added complexities due to multiplicity. A full-model has many instances of the sub-models repeated over and over until the room is fully populated up to the specified design. A full-model brings all of the complexities of airflow and cooling together. This is where the final CFD modeling takes place. It represents the most compute-intensive element of CFD modeling. Compute times grow by a factor of 50x or more.

Sub-Model - Traditional Data Center

Figure-20 shows a sub-model of a traditional data center aisle with a raised floor and two rows of IT equipment racks configured in a hot/cold aisle configuration. There is no containment system to maintain hot air / cold air separation, nor is there a ceiling structure to help with rising heated air. This configuration was typical 20 years ago. The cooling is provided by downflow CRAH air handling units with sufficient CFM airflow to meet the demands of the IT equipment mounted in the racks.

Figure 20: Traditional legacy data center, 20-year-old approach. Results: Marginal/Failure at 12kw per rack.

The model shown in Figure-20 is a basic building block of a much larger traditional data center model. Two (2) dedicated CRAH units are allocated in this configuration to provide cooling. While the results in this sub-model are marginal, the probability of success in a large-scale environment is low/nil. The leftmost racks are not receiving proper cooling to maintain their manufacturer's warranty. Thermal details indicate that the air inlet temperatures are 90°F or higher. The hot air is completely unmanaged and is mixing freely with cold air. Figure-21 shows additional problem details.
Within the model, the racks have been placed in an "invisible" display mode to remove visual obstructions from the physical view. The cold air delivery, which passes thru the perforated floor tiles (blue), only reaches the bottom one-third (1/3) of the IT equipment located in the rack. The remaining two-thirds of the airflow originates from swirling and heated air from the hot side of the equipment (green-yellow). This is a death spiral of thermal cooling, as the top-of-rack IT equipment ingests its own heated air across the top of the rack. Every server in every rack should have its demanded airflow (inlet) needs delivered by the cooling system. Here the airflow vectors do not originate from the floor, but rather from swirling and mixing within the volume of the data center's space. This model fails to provide this basic airflow dynamic for proper cooling. Equipment is at significant risk of overheating and a greatly shortened lifespan.

Figure 21: Traditional legacy data center sub-model showing overheated racks/equipment.

Full-Model - Traditional Data Center

Figure-22 is a whole data center model (full-model) built from the traditional sub-model shown above. The full-model consists of: 1) a 36" raised floor, 2) solid and perforated floor tiles, 3) perimeter-located downflow CRAH units capable of 181kw of cooling, 4) racks arranged into hot/cold aisles, 5) no containment system, and 6) no ceiling plenum return system for hot air. Shown in the figure are the object counts for this traditional data center CFD model. Note that the room size is 30,100 square feet and the server rack count is 620. The sizes and counts are comparable to the MGHPCC data center.

Figure 22: A 30,100 sq ft Traditional Data center w/Raised Floor.

If this were a real data center, the number of 2x2 floor tiles would exceed 7,500. There are also 800 perforated tiles to allow cold air from the 36" raised floor to reach the front of each rack housing IT equipment for cooling. The total airflow demanded for 12kw of IT electrical load per rack is 799,900 CFM. The air delivery system can only provide 672,000 CFM. This is in spite of the fact that the CRAH units are located along the perimeter of the data center, and their physical installation is almost end-to-end, with only a small gap between them for service and maintenance. This represents a huge shortage in airflow and subsequent cooling. This airflow shortage occurs at 12kw per rack, vs. 18kw per rack for the MGHPCC's in-row cooling approach. Therefore, a traditional data center of this size and configuration is not feasible without a fundamental revamp of the way cooling is provided. Additional CFD modeling is required before any conclusions can be made.

Sub-Model with Containment PODs

Sub-models are created to see if a basic atomic unit can be created and subsequently replicated across the data center floor. Figure-23 represents a sub-model of a POD structure within a traditional data center. This allows atomic units to be analyzed and modeled before full-models are run. If a sub-model is not successful, the probability of full-scale success is nil. Sub-model PODs will represent the basic building blocks of the full-scale models. Small changes can be modeled in minutes vs. hours. It is imperative that sub-models yield successful outcomes before proceeding; complexities build over time.

Figure 23: Traditional Data center with Containment POD.
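Before moving to the contained full-model, the airflow shortfall identified above is worth quantifying. The sketch below is illustrative only: the demanded airflow is the figure reported by the CFD model, while the 28-unit CRAH count is an assumption borrowed from the equipment quantities tabulated later in this thesis (28 units at 24,000 CFM each happens to reproduce the 672,000 CFM supply figure).

```python
# Quick feasibility check for the uncontained traditional full-model: airflow
# supplied by perimeter CRAH units vs. airflow demanded by the racks at 12 kW
# each. Demanded CFM comes from the CFD model; the CRAH count is assumed.

RACKS = 620
DEMANDED_CFM = 799_900                    # model-reported demand at 12 kW/rack
CRAH_UNITS, CFM_PER_CRAH = 28, 24_000     # assumed count x nameplate airflow

supplied_cfm = CRAH_UNITS * CFM_PER_CRAH  # 672,000 CFM
shortfall = DEMANDED_CFM - supplied_cfm   # ~127,900 CFM short

print(f"Demand per rack: {DEMANDED_CFM / RACKS:,.0f} CFM")
print(f"Supplied: {supplied_cfm:,} CFM")
print(f"Shortfall: {shortfall:,} CFM ({shortfall / DEMANDED_CFM:.0%} of demand)")
```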
Full-Model with Containment PODs

Figures 24 and 25 take a different look at the traditional data center cooling dynamics by incorporating a hot-aisle containment system. This is commonly called a POD design. By constraining all of the hot air and venting it up into a ceiling plenum space, the hot air exhaust does not have a chance to mix with the cold air delivery. A physical boundary now exists between the systems providing cool air delivery and hot air return. The raised floor structure remains the sole path for cool air delivery, and a newly created ceiling plenum carries the hot air return. The POD structure creates the boundary between these two airflow systems.

Figure 24: A 30,100 sq ft Traditional Data center with Hot Aisle Containment.

Close inspection (Figure-25) reveals that the hot aisle is completely encapsulated by structures that keep the hot air contained, allowing it to be vented into the ceiling space as part of the cooling cycle. Note that the tops of the CRAH units now have "hoods" which extend into the ceiling space to collect the hot air for cooling and subsequent sub-floor delivery. This overall cooling loop can exceed 150' in total length.

Figure 25: Close-up view of multiple containment PODs.

Figure-26 shows all of the objects in this full-model set to "visible". The shading effect (color washout) noticed in the figure is the result of overlaying the drop ceiling structure; it clouds the image. Each ceiling vent is approximately 2' wide by 26' long. All of the hot air exhausted from the IT equipment is contained within the hot-aisle POD structure, prohibiting it from mixing with the cold air, which would reduce the overall effectiveness of the cooling system. The only path of hot air escape is vertically thru the ceiling vent and into the ceiling plenum for subsequent cooling by the CRAH units. For clarity purposes, the ceiling and the ceiling vents are later set to "invisible" to remove the shading and color washout effects. The ceiling and ceiling vents remain active in the model, but are not shown, for visualization purposes.

Figure 26: Full object detail showing the ceiling and ceiling vents above the hot-aisles.

Figure-27 depicts a horizontal billboard set approximately 48" off the raised floor structure. The ceiling and ceiling vents have been set to "invisible". This eliminates the masking of the color palette, showing the temperature billboard in detail.

Figure 27: Horizontal Billboard of a full-model of the data center (12kw load).

For the first time we see the thermal performance of the data center. All of the work so far has been to get to this point. The horizontal billboard in Figure-27 is rich in information on thermal performance. It is evident that the POD structures are indeed maintaining hot and cold separation and that the servers are receiving their demanded cooling only from the vented floor tiles. Cool air (blue) is present at the front of each and every rack. Hot air (red) is captured within the POD enclosure surrounding the hot-aisle.

Figure 28: Vertical Billboard of the data center (12kw load).

Figure-29 shows a vertical billboard of the model. Evident are the cooling effects, with each and every rack of IT receiving cool air (blue). The cool air completely fills the cold-aisle, thus ensuring that the IT equipment is being cooled. Also evident are the effects of the ceiling vents above the PODs. The airflow indicated by the model shows 90°F air being drawn into the ceiling plenum with great velocity.
The hot air is captured within the ceiling plenum, allowing the downflow CRAH units to take this hot air, cool it, and deliver cold air into the raised floor structure. Figures 29 and 30 indicate that no matter where the vertical billboards are placed, the thermal performance is maintained. Figure-31 depicts the subfloor pressures and directivities.

Conclusion - Full-Model Reduced Output (12kw vs. 18kw)

After many iterations, modeling was unable to produce a successful CFD airflow result exceeding 12kw per rack on a traditional data center raised floor structure. Reaching this 12kw power level required rack configurations to be oriented into rows of hot-aisles/cold-aisles. Spanning the hot-aisles were containment structures to manage the airflow separation, preventing hot/cold airflow mixing. Also critical for success was the use of a drop ceiling serving as a common hot return air plenum. The limit of this model appears to be the number of downflow CRAH units which can be installed across opposing walls of the 30,100 square foot data center. There appears to be insufficient airflow (CFM) to support electrical loads in excess of 12kw per rack.

The purpose of this traditional data center full-model was to create a CFD model across the 30,100 sq ft data center floor able to support rack electrical densities of 18kw per rack. Due to the limits of airflow delivery stated above, modeling was unable to achieve that goal. After numerous iterations of sub-models and full-models, an 18kw solution was not found. The highest electrical load per rack was 12kw, a 33% decrease from the intended goal of 18kw per rack.

Raised Floor Cost Considerations

A raised floor requires a large up-front CAPEX expenditure. As the economic performance measures continue in this thesis, the reader is again reminded that a raised floor, drop ceiling, and perimeter-located downflow CRAH units are part of a traditional data center model. These components are unnecessary with in-row cooling, where the units are mounted directly on the concrete slab. This is a significant cost factor when compared to installing a 36" raised floor and drop ceiling covering 30,100 square feet. A raised floor must also be properly spec'd to support the weight of the IT equipment and supporting racks (static load) and the rolling load as equipment is moved across the data center floor. Figure-32 shows these values.19

Figure 32: Raised floor loadings (manufacturer's system performance criteria for static, rolling, and impact loads across the ConCore 1000-3000 panel range).

A 2x2 concrete-core floor tile rated at 1,250 lbs is a typical product used in data centers where raised floors are used.
A 2x2 floor tile is therefore capable of supporting a point load up to the rating of the floor tile. Typical floor ratings include 1000, 1250, 1500, 2000, 2500, and 3000 lbs of point load strength. These tiles are typically de-rated by 20% to 50% if positioned in high-traffic areas subject to rolling loads where equipment is installed and removed. With the economic life of a raised floor spanning 20 years, the number of rolling "passes" must be taken into account. Raised floors must also take into account any seismic activity with special sub-floor bracing. The taller the raised floor, the greater the required protection.

19 http://tateinc.com/products/concoreperformance-chart.aspx

Figure 33: MGHPCC on-slab installation of PODs.

Figure-33 shows an on-slab installation of PODs vs. a raised floor. The concrete is listed at 30,000 psi tensile strength. Static and rolling load considerations are minimized.

Raised Floor Costs

While costs for any construction project can vary greatly, the following information is provided as an "awareness of cost", not an "estimate of cost". Times of economic boom and bust make it impossible to accurately predict costs at a single point in time without an estimate costing thousands of dollars. On-line Internet cost estimators are used in this thesis as a ROM (rough order of magnitude) for these costs. With delivery and other miscellaneous costs, the total CAPEX cost to install a raised floor is approximately $1,000,000. See Figure-34.

Part Number   Item                                                   Unit             Quantity   Price Each   Total
CK24          ConCore® Kit - 24 inches high                          per square foot  30,100     $17.95       $540,295.00
IGPW          Infinity Air Grate 55% Perforated Panel for Woodcore   each             620        $198.95      $123,349.00
K1010U        KoldLok® Integral 1010                                 each             620        $79.95       $49,569.00
              Levelers                                                                            Free
              Materials                                                                                        $713,213.00
              Installation                                                                                     $225,750.00
              Total                                                                                            $938,963.00

Figure 34: Raised floor cost estimate.

Raised Floor O&M Costs

A raised floor structure requires periodic cleaning. The cleaning process is twofold. First, the topside of the floor tiles must be cleaned using non-dust-generating methods. Second, the floor tiles must be removed and the sub-floor slab area vacuumed. These costs are required because the raised floor is part of the airflow delivery and must remain particulate free. An enterprise-class data center will perform these tasks yearly. A service rate of $0.30 per square foot20 yields $9,030 per year ($750/month) for raised floor O&M.

Drop Ceiling Plenum Structures

Drop ceiling cost estimates21 require 7,525 dust-free tiles ($1.50 per square foot), frames to hold the tiles (approx. $2.50 per square foot), 1,700 vented ceiling tiles22 (approx. $9 per square foot), and the labor to install the drop ceiling (approx. $6 per square foot for licensed/bonded installers). This yields a finished CAPEX cost of approximately $362,000.

Drop Ceiling O&M Costs

Similar to a raised floor, drop ceiling structures also require yearly cleaning to maintain a dust-free plenum. There may be a premium for this service, as ladders are required to access the above-ceiling space. Drop ceiling cleaning costs $0.65 per square foot23, or approximately $19,565 per year ($1,630/month).

20 FloorCare Specialists, Inc., email dated 4/28/2014
21 http://www.newceilingtiles.com
22 http://www.alibaba.com/product-detail/Guang-zhou-kaysdy-series-perforated-metal_1522304355.html
23 FloorCare Specialists, Inc., email dated 4/28/2014
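Taken together, the raised-floor and drop-ceiling figures above can be reproduced with a few lines of arithmetic. The sketch below uses only the unit prices cited in the text; the assumption that each 2x2 vented ceiling tile covers 4 square feet at the quoted $9 per square foot is mine, chosen because it reproduces the approximately $362,000 total.

```python
# Rough-order-of-magnitude (ROM) sketch reproducing the raised-floor and
# drop-ceiling CAPEX/O&M figures quoted above. Unit prices are the ones cited
# in the text; this is an "awareness of cost", not an estimate of cost.

AREA_SQFT = 30_100
VENTED_TILES, SQFT_PER_TILE = 1_700, 4      # 2 ft x 2 ft tiles (assumption)

# Raised floor CAPEX (Figure 34)
materials = AREA_SQFT * 17.95 + 620 * 198.95 + 620 * 79.95   # ~$713,213
raised_floor_capex = materials + 225_750                      # ~$938,963

# Drop ceiling CAPEX: tiles + frames + labor per sq ft, plus vented tiles
drop_ceiling_capex = (AREA_SQFT * (1.50 + 2.50 + 6.00)
                      + VENTED_TILES * 9.00 * SQFT_PER_TILE)  # ~$362,200

# Yearly plenum cleaning (O&M)
raised_floor_om = AREA_SQFT * 0.30   # ~$9,030 per year
drop_ceiling_om = AREA_SQFT * 0.65   # ~$19,565 per year

print(f"Raised floor CAPEX: ${raised_floor_capex:,.0f}")
print(f"Drop ceiling CAPEX: ${drop_ceiling_capex:,.0f}")
print(f"Combined yearly plenum O&M: ${raised_floor_om + drop_ceiling_om:,.0f}")
```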
Chapter 4 - CFD Modeling In-Row Cooling with Containment PODs

In-row cooling combined with containment structures called "PODs" is a fairly new data center design, especially when the two are used together. Incremental transitions from legacy data centers first used containment systems to correct hot and cold airflows while retaining a raised floor and perimeter-located CRAH units. The addition of in-row cooling presented a new approach to an old design. The goal was to provide cooling as close to the heat source as possible. Instead of a cooling loop traveling 150 feet or more, the in-row approach reduces this cooling loop to only 10 feet. The total distance where airflow mixing can occur is minimized, and it is further reduced with the application of a containment system creating independent PODs.

The resulting POD approach allows the flexibility to configure PODs to run at different electrical and heat loads. IT equipment that shares a common Delta-T (ΔT) can be separated out and clustered together. On paper this looks like a win-win situation for energy-efficient data centers. Because an 18kw-per-rack data center was not available at the time of this research, CFD modeling is relied upon to simulate these data centers. The success of the Sandbox CFD modeling was therefore paramount as a driving factor and basis for decisions about future work. The cost of IT to fill one rack to 18kw of electrical load would vary between $126,000 and $600,000 per rack.

CFD Modeling - In-Row Cooling with PODs

The same modeling methodology is used with in-row cooling with PODs. The most basic configuration that is meaningful to the data center is modeled before the entire data center is modeled. The most basic unit of configuration is a single POD using in-row cooling. Figure-35 shows this design. The POD configuration used at the MGHPCC consists of 20 and 24 racks with in-row coolers placed between racks of IT. This pattern is repeated throughout the MGHPCC data center. This configuration uses hot-aisle containment, with a POD structure containing the hot air. There are no exhaust vents in the top of this configuration, as there are no ceiling vents and no ceiling structure. All equipment is mounted on the concrete slab, without the need of a raised floor to support the cooling loop.

IT equipment racks typically have 42U "units" of vertical space available to mount IT equipment. A commodity-priced 1U server with an associated cost of $3,000 equates to $126,000 per rack when 42 servers are installed. If three highly specialized, compute-intensive 10U blade servers are installed ($200,000 each), the resulting costs can exceed $600,000 per rack.

Figure 35: Basic POD design utilizing in-row cooling.

The in-row coolers take hot air from the hot-aisle, cool it, and pass the cool air into the open spaces surrounding the POD. The racks of IT equipment simply take this cool air and use it to cool the internal equipment at a rate of 18kw per rack. The IT equipment maintains a front-to-back airflow dynamic, while the in-row coolers use the opposite, back-to-front. This is very different from the traditional data center design.

CFD Modeling - POD Sub-Model

Figure-36 shows the CFD modeling result of a single POD's thermal performance operating with an 18kw load per rack. The in-row coolers have cooled the surrounding air to 80°F, which is the programmed set point of the in-row cooling unit.
Every IT rack in this CFD model received an 80°F inlet air temperature regardless of its position or location across the POD.

Figure 36: POD with in-row cooling, horizontal billboard.

In this sub-model the hot air is completely contained within the hot-aisle POD containment structure. This is a significant improvement over the traditional data center using perimeter-located CRAH units, a raised floor, and ceiling plenum structures. Figure-37 shows the same thermal image with the racks set to invisible to eliminate the visual obstruction of the horizontal billboard. Hot air is completely contained within the POD structure. This is a very successful CFD model at the 18kw power level.

Figure 37: POD with in-row cooling, horizontal billboard.

Figure 38: Vertical billboard, in-row cooling system highlighted.

Figures 38 and 39 show vertical billboard sweeps across the POD. The outermost two billboards are located at IT racks. The hot-aisle shows heat at the rear doors of the racks and airflow vectors pointing into the front of each rack. The middle billboard is located at an in-row cooler. Hot air collects at these inlet points to be cooled, with the cool air exhausted out into the open spaces at great velocity (long airflow vectors). The open air remains at 80°F, which is the programmed cooling set point for the in-row coolers. This is a highly successful model running at 18kw per rack of IT equipment.

Figure 39: In-row cooling with vertical billboards.

CFD Modeling - Full-Model POD

Following the same methodology from sub-model to full-scale modeling, the following CFD model represents the as-built data center at the MGHPCC using in-row cooling with hot-aisle containment PODs. The MGHPCC data center model shown is the as-built configuration. Two-thirds of the room has been built, with the remaining space available for future build-out.

Figure-40 shows the main data center floor of the MGHPCC. It contains many instances of the sub-model replicated across the data center floor. The limits of thermal performance of single PODs were modeled previously in the sub-model section above. This full-model CFD will determine if there are any thermal or performance sensitivities between rows of PODs.

Figure 40: Full data center model, POD design with in-row cooling.

Figure-41 shows the results of the full-model CFD modeling run. The heat generated by IT within the racks is properly constrained within the POD hot-aisle area and properly cooled. The air discharged from the in-row coolers maintains an ambient temperature of 80°F. All racks have air inlet temperatures that maintain equipment warranties and are consistent with the in-row cooling set points.

Figure 41: CFD data center model, horizontal billboard.

Figure-42 shows the POD and rack structures set to invisible to remove physical view obstructions. At 18kw per rack, the hot-aisle temperatures are successfully contained within the POD structures.

Figure 42: Full-model, horizontal billboard.

Modeling Results - Full-Model, In-Row Cooling with PODs

The thermal performance shown in Figure-41 (full-model using in-row cooling and hot-aisle containment PODs) indicates a very successful CFD modeling result. There do not appear to be any thermal sensitivities or interactions between adjacent POD structures. Each POD operates independently and autonomously from neighboring PODs. The performance is outstanding at 18kw per rack.
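As a rough sanity check on why 18kw per rack resolves so cleanly, the sketch below compares a fully loaded 24-rack POD against the in-row coolers' nameplate figures, using the standard sensible-heat relation CFM ≈ BTU/hr ÷ (1.08 × ΔT°F). The ten-cooler complement per POD and the 20-40°F ΔT range are assumptions chosen for illustration; Observation #6 later derives a comparable airflow range.

```python
# Illustrative capacity check for a fully loaded 24-rack POD at 18 kW per
# rack, using the standard sensible-heat relation CFM = BTU/hr / (1.08 * dT).
# The ten-cooler complement and the delta-T values are assumptions.

POD_RACKS, KW_PER_RACK = 24, 18
COOLERS_PER_POD = 10
COOLER_KW, COOLER_CFM = 55, 6_900          # in-row cooler nameplate figures

pod_kw = POD_RACKS * KW_PER_RACK            # 432 kW of heat
pod_tons = pod_kw / 3.517                   # ~123 tons of cooling
pod_btu_hr = pod_kw * 1_000 * 3.412

print(f"POD heat load: {pod_kw} kW (~{pod_tons:.0f} tons)")
print(f"Sensible capacity available: {COOLERS_PER_POD * COOLER_KW} kW")

for delta_t_f in (20, 30, 40):              # assumed rack delta-T range
    cfm = pod_btu_hr / (1.08 * delta_t_f)
    print(f"  dT {delta_t_f:>2} F: {cfm:>7,.0f} CFM "
          f"(~{cfm / COOLER_CFM:.1f} coolers at {COOLER_CFM:,} CFM each)")
```

At a 20°F ΔT the POD needs essentially all ten coolers running at full airflow; at a 40°F ΔT roughly half that airflow suffices, which is why equipment designed for a higher ΔT relaxes the airflow demand.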
Staggered In-Row Cooling - What If

One of the questions that frequently arises is one of symmetry and balance within each POD. That is to say, racks and in-row coolers are placed directly opposite each other within the hot-aisle containment structure of a POD. This means that the heated exhaust air from each rack blows across the aisle directly into the opposing rack: hot exhaust blows onto hot exhaust. What if the racks and in-row coolers were "staggered" so that exhaust air would travel into the air inlets of the in-row coolers? It would be very hard to do this as an experiment with actual equipment, especially at these heat loads (18kw). Therefore, the confidence gained with CFD modeling should be able to accurately predict the performance of this configuration.

Figure-43 shows a CFD model where the racks and in-row coolers have been staggered by one position. The same 18kw heat load was applied to the model, along with the same hot-aisle configuration and POD containment system. Figure-44 indicates a successful airflow and cooling solution.

Figure 43: Thermal performance of staggered in-row cooling.

Figure 44: Staggered design, horizontal billboard.

Results - Staggered In-Row Cooling

There do not appear to be any performance benefits from staggering in-row cooler units with respect to their IT heat source. In fact, the physical assembly of the containment system is no longer a perfect rectangle. The upper right-hand corner (Figure-44) is 12" shorter in length than the opposite side. This results in a slanted entry/exit door structure. With no obvious improvement in thermal performance over a standard symmetrical POD, the staggered configuration does not offer any observable benefit. In fact, the assembly problems and potential air leakage may sway the decision-making process away from this approach on mechanical assembly grounds alone. Therefore, I do not suggest a staggered in-row cooling POD in this configuration. There is no economic benefit to a staggered in-row cooling design.

Chapter 5 - Observations - CFD Modeling

The use of CFD computer modeling has proven invaluable to the technical analysis as the process now begins to evaluate the economic benefit of in-row cooling vs. traditional down-flow data centers with raised floors and ceiling plenum structures. Evident in the CFD modeling was a 12kw limit for traditional data centers with peripherally located down-flow CRAH units. An improved solution was found by using in-row cooling, resulting in a power increase to 18kw per rack. This represents a 50% increase in power over the 12kw value. This would allow data center managers the opportunity to add additional servers or allow higher-powered equipment within the same rack footprint. Unused electrical power could be reserved for future upgrades as part of the next round of re-capitalization, at a rate of once every 4-5 years (technology/budget/requirement driven).

Economic Impact

Neither cooling topology is "free" with respect to its competing technology. There are always trade-offs. In-row cooling actually takes up valuable POD space on the data center floor. This is where billing and revenues are sourced. The goal is to maximize billable footprint as a profit center, or to minimize costs to the business if this is a cost center. A 24-rack POD with ten (10) in-row coolers occupies approximately four (4) racks' worth of "billable" IT floor space.
Depending on your cost model, these impacts must be taken into account as the Total Cost of Ownership (TCO), total cost of operation (OPEX), and initial Capital Expenses (CAPEX) are evaluated. This can be significant. Just as in-row coolers occupy valuable data center floor space, down-flow cooling also has its own issues that can impact billable revenues. Both CFD modeling and actual observations confirm that the airflow from downflow CRAH units moves so fast that the Bernoulli Principle25 of speed/pressure has a negative impact in the near-regions of the CRAH units. The faster the airflow, the lower its pressure. This means that the vented floor tiles closest to the CRAH units have: 1) reduced airflow, 2) no airflow, or 3) negative pressure (air actually goes down into the vented floor tiles). Figure-45 shows this effect.

25 In fluid dynamics, Bernoulli's principle states that for an inviscid flow an increase in the speed of the fluid occurs simultaneously with a decrease in pressure or a decrease in the fluid's potential energy. Daniel Bernoulli, Hydrodynamica, 1738.

Figure 45: Floor tile airflow vs. distance to CRAH unit.

The first vented floor tile is 16' from the closest CRAH unit, with its airflow almost one-half that of the other vented floor tiles. The data center's boundary wall is 22' from this floor tile. This is a significant amount of data center floor which is not supporting a billable IT revenue cost model. These effects are mirrored on opposite sides of the data center, as shown in the model (Figure-44). Qualitative analysis (the art of CFD modeling) would suggest or offer the following recommendations: the center aisle of the data center floor is where the highest-powered IT equipment should be located, as this is the region of highest airflow. The two side aisles can hold a mix of lower-powered IT equipment. The tiles with the lowest indicated airflow (red/orange/yellow) can be reserved for network patch panels and other very low-powered equipment. This allows the layout of the entire data center to work with, and make the most of, the idiosyncrasies of the airflow. This is a prime example where CFD modeling is a tool and not a solution. It also exemplifies the differences between quantitative and qualitative review (science vs. art).

Chapter 6 - Economic Summary

With the extensive CFD modeling effort and analysis complete, it is time to focus on the costs and cost differences between the two cases presented. The original hypothesis was that in-row cooling would offer a lower up-front cost model (CAPEX) and a lower recurring operating expense (OPEX) over the economic life of a data center, assumed to be 20 years.

Iterative Economic Design

Traditional management methodologies for design would favor either a Top-Down or Bottom-Up design. Neither of these methodologies should be used exclusively for data center design. A spiral approach using an Iterative Design is the only way to maximize the cost-performance product of a data center. A data center rides atop the laws of physics, electrical codes, fire codes, and local/municipal codes, etc., which can work against your design. Otherwise, as the data center matures and reaches 80% of its stated capacity, capacity planning will discover any such mismatch. An example of this is rooted in the National Electrical Code (NEC).
The electrical power delivered to computing equipment is listed as a "continuous load" per the NEC.26 A traditional electrical circuit in a commercial electrical system (enterprise-class data center) is a 208 Volt circuit with 30 Amps of current. The NEC requires this circuit to be de-rated to 80%, or 24 Amps of continuous-duty use. Therefore, the available power from this single circuit is approximately 5,000 Watts (Power = Voltage x Current). This demands that the power and cooling requirements be adjusted to multiples of 5,000 electrical watts and 17,050 BTUs of cooling. Any other design configuration will always create stranded capacity. Ideally, the data center design should consume space, power, and cooling at the same rate, leaving no stranded capacity. It is up to the data center floor managers and data center engineers to ensure that resources are consumed at the appropriate rates so that capacity management remains meaningful.

26 For safety reasons, the National Electrical Code (NEC) does not allow electrical circuits to be operated at 100% maximum current capacity for systems running "continuous loads" (>3 hours) without de-rating the circuit to 80% of maximum.

Rack Power Density

The question "how much can we put into each rack" comes up frequently. This question is best answered by the CIO and CTO teams who are managing the IT baseline and IT strategic futures. The total site power divided by the PUE determines the amount of electrical power available for the IT equipment. In this thesis, a total site power of 10MW is assumed, with a corresponding PUE of 1.5 or better. This yields a power usage efficiency of 66%. Therefore, 6.67MW of the 10MW electrical power is available for IT, with the remaining 3.33MW of power for cooling, pumps, HVAC, lighting, etc.

In this example, if a data center had an initial design requirement for 750 racks of IT equipment, the average allocation of electrical power is 8,900 watts, or 8.9kw per rack (6.67MW ÷ 750 racks). The electrical value of 8.9kw is not a multiple of the 5kw electrical power example given above. This is a mismatch of available IT power to rack power density. The next greatest multiple is 10kw or 15kw. The number of racks needs to be reduced to raise the amount of average electrical power available to each rack. The resulting design would suggest that the rack count be reduced to 667 racks for a 10kw-per-rack electrical density; 15kw per rack would require only 444 racks. The secondary effect of this type of planning/design is a reduction of the floor space required. A data center with 750 racks would require 30,000 sq ft, while a data center with 444 racks requires only 17,760 sq ft.

Number of IT Racks

Based on years of empirical data, modeling, and data center design, the theoretical maximum number of IT racks which can be placed on a 30,100 sq ft data center floor is approximately 750. This is based upon numerous data centers that exhibit a rack density of 40 sq ft per rack. While a rack physically occupies 7-9 sq ft in actual footprint, the "white spaces" across the data center, such as hot-aisles, cold-aisles, cooling equipment, electrical equipment, ramps, fire-code egress pathways, maintenance areas, etc., need to be averaged across the data center. In this thesis, 40 sq ft per IT rack will be used for enterprise-class data centers.
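The circuit and rack-density arithmetic above is easy to reproduce. The sketch below is illustrative; the 3.412 BTU/hr-per-watt conversion is standard, and the 40 sq ft per rack figure is the planning value adopted in this thesis.

```python
# Sketch of the NEC continuous-load derating and the rack power density
# arithmetic described above.

VOLTS, AMPS, NEC_DERATE = 208, 30, 0.80
circuit_watts = VOLTS * AMPS * NEC_DERATE       # ~4,992 W, rounded to 5 kW
circuit_btu_hr = circuit_watts * 3.412          # ~17,000 BTU/hr of cooling
print(f"Per-circuit: {circuit_watts:,.0f} W (~{circuit_btu_hr:,.0f} BTU/hr)")

SITE_MW, PUE, SQFT_PER_RACK = 10.0, 1.5, 40
it_kw = SITE_MW * 1_000 / PUE                   # ~6,667 kW available for IT

for racks in (750, 667, 444):
    print(f"{racks} racks -> {it_kw / racks:5.1f} kW per rack, "
          f"{racks * SQFT_PER_RACK:,} sq ft of floor space")
```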
Iterative Savings

The above example would result in 306 fewer racks purchased, 12,240 fewer sq ft of data center floor space, fewer PODs purchased, less overhead lighting, fewer sprinkler systems, fewer smoke detectors, fewer security alarms/cameras, reduced installation labor, reduced commissioning, etc. If the 306 racks were configured in a POD configuration with networking and storage fabric, the resulting initial CAPEX savings can easily exceed $12M. It is paramount that data center experts work collectively and cooperatively with CIOs and CTOs to drive the design aspects of any data center project. This highly specialized activity is completely foreign to designers of commercial or industrial facilities. Five years into data center operation should never find anyone uttering the phrase "...if we had only known..."

PUE Hacking

Power Usage Effectiveness has grown in popularity and has become a generally accepted metric for data center power efficiency. There are those who would accept at face value that a PUE of 1.4 is better than 1.5. I would argue that PUE is simply a measure, not a metric. It is a fraction of power ratios, not a measure of the efficiency of those ratios. I offer the following experiment to clarify this position.

PUE is a fraction. As a fraction, it is possible to artificially influence the numerator or denominator of this fraction. This artificial influence seems to have a positive outcome in that it always depicts a better PUE value. I would argue that PUE can be hacked, and therefore warrants further investigation to quantify the impact.

Energy-efficient servers have variable speed fans to provide cooling over a broad range of operating parameters. One of the parameters is inlet air temperature. This is the inlet temperature of the cold-aisle within the data center. The servers will increase or decrease their internal fan speeds based upon many sensors, ensuring that components do not overheat. The warmer the inlet air temperature, the faster the internal cooling fan speeds become to maintain safe cooling. Fan Affinity Law #3 states that fan power follows a cube function: if you double the fan speed (2X), you need eight times (8X) the electrical power. With servers having 4-8 fans, this can be substantial, especially across the entire data center.

This experiment was performed because of a power consumption graph27 (Figure-46) that has circulated the Internet for many years. I wanted to check the validity of this 10+ year old graph and compare it to my results using 2014 technology. Evident in Figure-46 is a sharp increase in server fan power at 25°C (77°F).

Figure 46: Server Fan Power vs. Inlet Temperature (°C), circa 2004.28

An experiment was set up to purposefully increase the air inlet temperature of a high-performance server and observe the resulting power increase over the temperature range. The intent was to mimic the test environment used to produce the graph shown in Figure-46. If the increase in server fan speed occurs at such a low temperature (25°C / 77°F), the impact across a data center could be substantial. It also means that vendors can artificially influence PUE as part of a proposal or during acceptance testing. This sensitivity required further testing. For the purpose of this test, I am assuming a 10MW data center running at 80% of its stated maximum capacity (8MW) with a PUE of 1.5 as a test case scenario.
This equates to an 8MW total site power with 5.33MW available for IT and 2.67MW available for cooling and electrical infrastructure.

27 http://cdn.ttgtmedia.com/ITKE/uploads/blogs.dir/72/files/2009/01/temp-fan-speed.gif
28 http://tc99.ashraetcs.org/documents/ASHRAEExtendedEnvironmentalEnvelopeFinalAug_1_2008.pdf

Inlet Temp (°F):  81    85    86    88    90    91    93    95    97    99    101   104
Power (Watts):    155   168   169   172   170   170   171   172   234   235   235   235

Figure 47: Server Electrical Power vs. Inlet Temperature (2014).

Figure-47 shows the results of a temperature sweep of a 2U server from 81°F thru 104°F. Between 95°F and 97°F there is a pronounced increase in server power attributed to cooling fan power. This is approximately 65 watts of increased power. A data center with 356 racks (80% of a 444-rack maximum) can hold 7,120 of the 2U-height servers. The resulting power increase is 463kw. The 463kw as a fraction of the total 5.33MW IT power budget is therefore 8.7% of the power available. Since this 463kw is part of the IT technical load, it also requires cooling to reject the heat the fans themselves generate. A heat source of 463kw requires 132 tons of cooling.

If a fully loaded 10MW data center exhibits a measured PUE efficiency of 66% (1.5), and the numerator and denominator were artificially inflated by this added fan power, the resulting PUE would indicate an apparent improvement. Managers are perplexed that the PUE has improved yet the electric utility bill is higher. Additionally, the data center's IT electrical technical load is the most expensive electrical pathway. It represents fully filtered, UPS-sourced, and generator-backed-up power. Care and scrutiny are required for anything connected to this pathway.

Table-2 shows a data center running at an 8MW technical load (8,000kw of a 10,000kw maximum) with an assumed design PUE of 1.5 (5,333kw of IT power). If 463kw of fan power were added due to an elevated cold-aisle temperature set point, the resulting figures would indicate an improved PUE, yet the total site load has gained 463kw. If the electrical power savings from raising the temperature set point do not exceed the added cost of the increased fan power, there are no actual savings afforded, yet the PUE may indicate otherwise (causation vs. correlation). The recurring monthly electric charges for the increased fan power alone are $33,300. These costs are higher when cooling costs are added. Changing data center temperature set points is a decision rooted in several interrelated systems. Extreme care is required to actually lower OPEX.

PUE Hacking
Assume 80% of a 10MW data center design:
  Total Site Power:   8,000 kw
  IT Power:           5,333 kw
  Beginning PUE:      1.5 (66%)
  Added Fan Power:    463 kw

Formula: PUE = IT Power / Total Site Power
                          Before     After
  Numerator (IT)          5,333      5,796
  Denominator (Site)      8,000      8,463
  PUE as a Percentage     0.6667     0.6849
  PUE as a Number         1.50       1.46

Table 2: PUE Influenced by IT Fan Power

PUE Hacking Conclusion - A Cautious Yellow. The two figures above are separated by 10+ years of time and technology advancement. The results of this single-server test suggest that elevated data center temperatures near 95°F are not practical; however, not all IT equipment in the data center will follow this temperature/power curve. Each vendor will have a unique temperature-to-fan-speed algorithm based upon equipment thermal design and Delta-T (ΔT).
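For completeness, the arithmetic behind Table-2 can be reproduced directly. The short sketch below uses only the figures from the table and shows both forms of the ratio used in this thesis.

```python
# Reproduces the "PUE hacking" arithmetic from Table 2: adding the same fan
# power to both the IT load (numerator) and the total site load (denominator)
# makes the PUE number look better even though the utility bill goes up.

IT_KW, SITE_KW, ADDED_FAN_KW = 5_333, 8_000, 463

def pue_number(site_kw: float, it_kw: float) -> float:
    """Conventional PUE: total site power over IT power (>= 1.0)."""
    return site_kw / it_kw

def pue_percentage(site_kw: float, it_kw: float) -> float:
    """Efficiency form used in this thesis: IT power over total site power."""
    return it_kw / site_kw

for label, it, site in (("Before", IT_KW, SITE_KW),
                        ("After ", IT_KW + ADDED_FAN_KW, SITE_KW + ADDED_FAN_KW)):
    print(f"{label}: PUE {pue_number(site, it):.2f} "
          f"({pue_percentage(site, it):.2%} efficiency)")
# Before: PUE 1.50 (66.67% efficiency)
# After : PUE 1.46 (68.49% efficiency)
```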
Both PUE and absolute electrical power readings are required to determine the proper operating set points resulting in the lowest possible OPEX (Operational Expenses) with respect to electrical power and efficiency. The cooling system will also have a defined range of efficient cooling set points that must be followed for low-cost operation. A data center is a system of systems. Deviations from the final commissioning agent (CxA) report must be performed with caution while focusing on total energy costs and unintended consequences to the surrounding systems.

Chapter 7 - Technical Summary

It is time to summarize the technical aspects of the CFD modeling that was used as the basis of comparison of two different cooling methodologies: in-row cooling vs. raised floor/CRAH cooling used in traditional data centers. Thirty days of on-site research at the MGHPCC facility and approximately 100 CFD models using TileFlow® were performed as the technical foundation for this thesis. Recall the modeling efforts so far:

* Actual measurements and CFD modeling of the MGHPCC Sandbox area to validate the use of CFD modeling on known configurations and heat loads. The CFD modeling was highly successful in predicting the thermal performance of hot-aisle containment PODs.

* CFD modeling was performed in two stages. First, sub-models were created to understand the atomic behavior of single POD configurations or single rows before whole-room CFD models were run. Sensitivities to heat loads were explored in sub-models before moving to the larger full-models. Second, full-models were run to observe the interactions between PODs at a macro level. Success at the sub-model level is not a guarantee of success at full scale. Sensitivities at the macro level would warrant further understanding and problem decomposition to expose these hidden behaviors. None were observed.

* Various CFD models of in-row cooling using the MGHPCC-defined hot-aisle POD configuration and temperature set points. Each model increased the power density and the subsequent heat load until the model failed to resolve a successful cooling solution using in-row cooling and POD containment methodologies. One additional "what if" model was run using a staggered in-row cooling placement to determine if an improved cooling solution could be found. Staggered in-row cooling offered no cooling improvement over the symmetrical in-row cooling placement. No further study was warranted.

* Various CFD models of raised floor cooling with perimeter-located downflow CRAH units. Several traditional data center models were run to extract the maximum cooling potential using a 36" raised floor as the delivery path for cooling. Models were created with and without containment systems; the contained models required the installation of a drop ceiling structure to capture the hot air. With each model, the power density was increased until the model failed to resolve a successful cooling solution.

Observations

Observation #1
The first observation is that the containment systems used to create hot-aisle POD configurations showed no interaction or sensitivities from sub-model to full-model. That is to say, modeling at the sub-model level is as good as a full-model. It was as if the MGHPCC POD design could be viewed as 33 independent data centers across a common 30,100 square foot area (3 rows of 11 PODs). Each POD appears to operate autonomously from every other POD. As long as the in-row coolers can discharge 80°F cold air, the models were deemed a success.
This was true for every case.

Observation #2
Traditional data centers without containment systems represented the worst airflow dynamic of all CFD modeling tests performed. Hot air mixes with cold, thus spoiling the effectiveness of the entire cooling system. Warm air drifts randomly back to the hot air intakes of the CRAH units. Rack power densities of 8kw were observed before failure. Without the use of containment systems, 8kw per rack appears to be the limit of cooling for this configuration.

Observation #3
Separating hot and cold by using a containment system resulted in a significant improvement in cooling effectiveness over non-contained configurations. However, the traditional data center now requires the installation of a hot air return plenum in the form of a drop ceiling. The hot air must remain contained along the entire path back to the CRAH cooling units, and the above-ceiling approach is one way to accomplish this task. Rack electrical loads increased from 8kw per rack to 12kw per rack. This 12kw limit appears to be rooted in the lack of airflow provided by the CRAH cooling units. The models have as many CRAH units installed as possible per unit length of perimeter wall space.

Observation #4
A rectangular-shaped 30,100 sq. ft. data center room is too large a single space for CRAH-based cooling. The ratio of data center floor space to perimeter wall length is insufficient to install the required number of CRAH units. CFD modeling indicates a significant shortfall in airflow delivery and subsequent cooling. The size and geometry must be adjusted to maximize the cooling performance of CRAH-based data centers. Another series of models must be run to determine the maximum workable data center size and geometry. This observation was not seen with in-row cooling, as the POD designs appear to operate autonomously as self-contained solutions. POD-to-POD interaction was not observed.

Observation #5
The POD design incorporating in-row cooling exhibited the highest rack power levels of all models tested. Rack power levels were 18kw per rack with a successful model outcome. All in-row coolers provided air discharge at 80°F. This was deemed a success.

Observation #6
A 24-rack POD can consume 432kw of electrical power (24 racks x 18kw). This requires 123 tons of cooling and, depending on the Delta-T, airflow rates ranging from 34,500 to 69,100 CFM. Each in-row cooler can provide 6,900 CFM of total airflow delivery; therefore, all 10 units are needed for each POD operating at maximum airflow capacity. Airflow demands are lower if the IT equipment being cooled is designed for a higher Delta-T (ΔT).

Observation #7
The in-row coolers within each POD (10 per POD) were programmed to act as one cooling unit. The CFD models were built to mimic this programming model. All in-row coolers had the same fan speeds and cooling settings regardless of the locality of the heat load. This would result in all 10 in-row coolers responding equally to an asymmetric heat load at one end of a POD. This may not be the most economical programming model available. Additional modeling and in-row cooler reprogramming may expose additional cooling benefits and associated cost savings.

Observation #8
The CFD models were allowed to run unconstrained with respect to total site power and PUE. The CFD focus was on cooling. The CFD models include 308 in-row cooling units across 33 PODs, spanning the entire 30,100 sq. ft. data center floor (3 rows of 11 PODs).
The resulting capacity can cool an IT load of 21.677 MW. This greatly exceeds the 10MW total site power assumption, exceeds the cooling capacity of the cooling plant, and exceeds the 6.6 MW of IT power available under a PUE of 1.5. Figure-48 shows the in-row cooler count of 308 which was used in the full CFD model. This must be reduced to 94 units to keep within the stated 10MW total site power assumption (Figure 49). The resulting cost becomes 94 x $17,500 = $1,645,000 (down from $5,390,000). The total number of PODs is reduced from 33 to 9, and the required floor space is therefore 9,150 square feet.

Description / Model / QTY / Unit Price / In-Row Cooling Data Center ($$$) / Traditional Data Center ($$$) / Cooling Provided (Tons) / IT Load Supported (kW) / PUE / Total Site Power (MW)
In-Row Cooling Unit / ACRC501 / 308 / $17,500 / $5,390,000 / - / 6,160 / 21,677 / 1.5 / 32.52
CRAH Unit / CW181D / 28 / $56,000 / - / $1,568,000 / 1,400 / 4,927 / 1.5 / 7.39

Figure 48: In-Row Cooling Capacity and Costs (Unconstrained, > 10MW).

Description / Model / QTY / Unit Price / In-Row Cooling Data Center ($$$) / Traditional Data Center ($$$) / Cooling Provided (Tons) / IT Load Supported (kW) / PUE / Total Site Power (MW)
In-Row Cooling Unit / ACRC501 / 94 / $17,500 / $1,645,000 / - / 1,880 / 6,616 / 1.5 / 9.92
CRAH Unit / CW181D / 28 / $56,000 / - / $1,568,000 / 1,400 / 4,927 / 1.5 / 7.39

Figure 49: In-Row Cooling Capacity and Costs (Constrained to 10MW).

Observation #9
A cursory look at the data center design, searching for an optimum configuration that minimizes up-front CAPEX expenses, did not reveal an immediately obvious answer. This warrants further investigation. Table-3 lists the whole-unit capacities for several of the major components on the data center floor. The purpose of this review is to search for rows which exhibit the largest number of elements as close to an integer value as possible. This maximizes the capacities with the lowest parts count possible.

Table 3: Capacity Matrix (columns cover in-row cooler pairs, tons, 5/10/15 kw rack densities, CW114D and CW181D CRAH units, and 125/150/200/225 kVA electrical units across a range of IT loads from 10 to 260).

Example: Suppose a proposed cooling scenario requires 4.2 physical units of in-row cooling to properly cool an intended IT heat load.
The next greatest in-row cooling integer is five units, which does not conform to in-row cooling pairs and thus requires six in-row coolers to maintain the symmetry of a POD. Therefore, six in-row coolers for a 4.2-unit heat demand is highly inefficient and wasteful of CAPEX expenditures. This line of the table would be discounted as a cost-effective solution. The same decision condition exists for deploying IT racks in clusters of four with an in-row cooler between clusters. There will be a line within Table-3 that indicates a low-cost, high-yield solution, which becomes a design constraint. The intent is to minimize excess capacity, which remains as stranded capacity throughout the life of the data center. Stranded capacity requires resources but will never generate revenue. It is a deadweight loss condition.

Chapter 8 - Business Case Summary

In-Row Cooling (IRC) with containment PODs indicates a significant improvement in cooling efficiency and IT density over traditional data center designs. Power densities in excess of 700 watts per square foot rival the performance numbers of data center containers (Figure-50), which exhibit power densities of 1,000 watts per square foot.

Figure 50: Data center container29 exhibiting 1,000 watts/sq ft power densities.

The on-slab design does not require a raised floor for cold air delivery nor a drop ceiling to capture the hot return air. This has an immediate CAPEX cost savings of $1.3M, not including the yearly maintenance costs (OPEX) for below-floor and above-ceiling cleaning. IRC unit costs are also slightly lower than CRAH unit costs when normalized to dollars per ton of cooling.30

29 www.hp.com
30 IRC = $17,500 for 20T = $875/ton. CRAH = $56,000 for 51T = $1,100/ton.

Operationally, CFD modeling indicates that the in-row cooling was able to cool a heat load of 18kw per rack across the entire 30,100 sq ft data center floor. The CRAH-based solution using a traditional data center installation could not exceed 12kw per rack. The in-row POD design operated independently and autonomously from adjacent PODs. No interaction was observed. The traditional data center using PODs could not equal the in-row cooling configuration, as the raised floor did not provide adequate airflow at consistent rates across the POD for even cooling. Server inlet hotspots were evident, and Figure-44 shows the variability of airflow delivery through the floor tiles. While the CFD models indicated that a single 30,100 square foot data center floor may not be the optimum size or shape for raised floor cooling, the modeling did highlight the sensitivities of large-volume CRAH cooling units and of under-floor airflow delivery. Adaptive floor tiles would be required to regulate the cooling, at a significant increase in raised floor costs. See Figure-51.

Figure 51: Active floor tiles to regulate airflow.31

31 http://patentimages.storage.googleapis.com

It is paramount that a Data Center Engineer drive the design effort for any data center project. It is not an activity that many do, and even fewer do well. Commercial construction designs alone do not constitute a data center design. All of the independent activities need to be merged into one design that is cost-effective, efficient, and satisfies all of the stated requirements. It also merges the commercial disciplines of the Electrical and Mechanical teams while serving the needs of the CIO who drives the selection of IT.
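As a closing cross-check on the business-case figures above, the sketch below re-derives the avoided raised-floor and drop-ceiling CAPEX from the Chapter 3 estimates and the dollars-per-ton normalization from footnote 30. The input numbers are taken directly from this thesis; only the rounding is mine.

```python
# Cross-check of the business-case figures: avoided raised floor + drop
# ceiling CAPEX for the on-slab POD design, and footnote 30's dollars-per-ton
# normalization of the two cooling unit types.

raised_floor_capex = 938_963          # Figure 34 estimate
drop_ceiling_capex = 362_000          # drop ceiling ROM estimate
print(f"Avoided CAPEX: ${raised_floor_capex + drop_ceiling_capex:,}")  # ~$1.3M

irc_cost, irc_tons = 17_500, 20       # in-row cooler, per footnote 30
crah_cost, crah_tons = 56_000, 51     # downflow CRAH, per footnote 30
print(f"IRC:  ${irc_cost / irc_tons:,.0f} per ton of cooling")    # ~$875
print(f"CRAH: ${crah_cost / crah_tons:,.0f} per ton of cooling")  # ~$1,098
```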
Chapter 9 - Conclusions

In-Row cooling with containment PODs exhibits significant savings on initial Capital Expenses (CAPEX) and recurring monthly Operational Expenses (OPEX) over traditional data centers with a raised floor system. Spatial power densities exceeding 700 watts per square foot and rack densities exceeding 15kw represent an innovative approach to air-based cooling of IT equipment. These increased densities require less data center floor space and therefore fewer physical racks, fewer power strips, less metering and monitoring, etc., to achieve the same volume of computing. The CAPEX savings from these densities are attractive compared with less densely populated data center designs.

Operationally, the ability to micro-manage the cooling by independently adjusting and de-coupling the in-row coolers offers just-in-time cooling for the demanded heat load. This approach is not available with downflow CRAH-based cooling using a common sub-floor plenum for airflow delivery. Although each CRAH unit has variable speed fans and variable chilled water (CW) control valves (as do the in-row coolers), once the downflow air is delivered into the floor plenum, all directivity and control is lost. In fact, variable airspeed delivery into the floor plenum can have a negative impact on subfloor plenum airspeeds, pressures, and directivity. See Figures 31 and 45. The greater the heat load variability and heterogeneity, the greater the savings afforded by the in-row cooling solution.

Glossary

ASHRAE - American Society of Heating, Refrigerating and Air-Conditioning Engineers.
CAPEX - Capital Expenses.
CFD - Computational Fluid Dynamics (computer modeling).
CFM - Cubic Feet per Minute, airflow volume.
Cold-Aisle - A configuration of rack rows arranged with a common cold air inlet.
CRAC - Computer Room Air Conditioner (refrigerant).
CRAH - Computer Room Air Handler (chilled water).
CW - Chilled Water.
CxA - Commissioning Agent.
Delta-T - ΔT, change in temperature (in °F or °C).
Hot-Aisle - A configuration of rack rows arranged with a common hot air exhaust.
HPC - High Performance Computing.
HVAC - Heating, Ventilation and Air Conditioning.
IT - Information Technology.
IRC - In-Row Cooling.
kw - Kilowatt of electrical power = 1,000 Watts.
kwh - Kilowatt-hour, a measure of utility metering = 1,000 watt-hours.
MGHPCC - Massachusetts Green High Performance Computing Center.
NEC - National Electrical Code.
O&M - Operations and Maintenance.
OPEX - Operational Expenses.
PUE - Power Usage Effectiveness (The Green Grid), trademark.
ROM - Rough Order of Magnitude, estimate.
Tier - Tier System Rating (Tier-I thru Tier-IV), The Uptime Institute (TUI). Tier-I, Basic Design. Tier-II, Basic+ Design. Tier-III, Concurrently Maintainable Design. Tier-IV, Fault Tolerant Design.
TGG - The Green Grid, organization.
TUI - The Uptime Institute, organization.
U - Rack Unit, representing 1.75" of vertical space within a rack.