Modeling and Control for Cooling Management of Data Centers with Hot Aisle Containment

Rongliang Zhou, Zhikui Wang
HP Laboratories
HPL-2011-82

Abstract: In traditional raised-floor data center design with hot aisle and cold aisle separation, the cooling efficiency suffers from recirculation resulting from the mixing of cool air provided by the Computer Room Air Conditioning (CRAC) units and the hot exhaust air exiting from the back of the server racks. Hot aisle containment, which minimizes recirculation and hence increases cooling efficiency, has been employed by an increasing number of data centers. Based on the underlying heat transfer principles, we present in this paper a dynamic model for a data center with hot aisle containment and design decentralized model predictive controllers (MPC) for the multiple CRAC units. Each MPC controller adjusts its blower speed and supply air temperature (SAT) to regulate the rack inlet temperatures within its zone of influence below the specified temperature constraints, and at the same time minimizes the cooling power consumption. An approach to partitioning a data center into overlapping CRAC zones of influence is discussed and the controller design within each individual zone is presented. The proposed decentralized cooling control approach is validated in a production data center with hot aisles contained using plastic strips, and the experimental results demonstrate both its stability and its ability to reject various disturbances.

External Posting Date: June 21, 2011
Internal Posting Date: June 21, 2011

Copyright 2011 Hewlett-Packard Development Company, L.P.
Approved for External Publication

Proceedings of ASME 2011 International Mechanical Engineering Congress & Exposition
IMECE2011
November 11-17, 2011, Denver, Colorado, USA

DRAFT: IMECE2011-62506

MODELING AND CONTROL FOR COOLING MANAGEMENT OF DATA CENTERS WITH HOT AISLE CONTAINMENT

Rongliang Zhou, Zhikui Wang
Sustainable Ecosystems Research Group, HP Labs, Hewlett-Packard Company, 1501 Page Mill Road, Palo Alto, CA 94304-1126.
Email: {firstname.lastname}@hp.com

Copyright © 2011 by ASME

ABSTRACT
In traditional raised-floor data center design with hot aisle and cold aisle separation, the cooling efficiency suffers from recirculation resulting from the mixing of cool air provided by the Computer Room Air Conditioning (CRAC) units and the hot exhaust air exiting from the back of the server racks. Hot aisle containment, which minimizes recirculation and hence increases cooling efficiency, has been employed by an increasing number of data centers. Based on the underlying heat transfer principles, we present in this paper a dynamic model for a data center with hot aisle containment and design decentralized model predictive controllers (MPC) for the multiple CRAC units. Each MPC controller adjusts its blower speed and supply air temperature (SAT) to regulate the rack inlet temperatures within its zone of influence below the specified temperature constraints, and at the same time minimizes the cooling power consumption. An approach to partitioning a data center into overlapping CRAC zones of influence is discussed and the controller design within each individual zone is presented. The proposed decentralized cooling control approach is validated in a production data center with hot aisles contained using plastic strips, and the experimental results demonstrate both its stability and its ability to reject various disturbances.

1 INTRODUCTION
Due to the ever-increasing power density of IT equipment, today's data centers consume a tremendous amount of power. According to [1, 2], about 1/3 to 1/2 of the total data center power consumption goes to the cooling system, and hence highly efficient cooling systems are indispensable for reducing the total cost of ownership (TCO) and the environmental footprint of data centers.

Figure 1 shows a typical raised-floor air-cooled data center with hot aisles and cold aisles separated by rows of IT equipment racks. The thermal requirements of IT equipment are usually specified in terms of the inlet air temperatures of the equipment [3]. The blowers of the Computer Room Air Conditioning (CRAC) units pressurize the under-floor plenum with cool air, which in turn is drawn through the vent tiles located in front of the racks in the cold aisles. Hot air carrying the waste heat from the IT equipment is rejected into the hot aisles. In most cases, the internal CRAC control can regulate the chilled water valve opening to track the given reference of the Supply Air Temperature (SAT). The flow rate of the supplied air can also be tuned continuously if a Variable Frequency Drive (VFD) is available for each CRAC unit to vary the speed of its blowers.

FIGURE 1. TYPICAL RAISED FLOOR DATA CENTER

In an open environment without either hot aisle or cold aisle containment, air streams are free to mix. Most of the hot air in the hot aisles returns to the CRAC units, but a small portion might escape into the cold aisles from the top or the sides of the racks and cause recirculation. The inlet air flow of the IT equipment is thus a mixture of cool air from the vent tiles in its vicinity and the recirculated hot air [4]. While recirculation usually occurs at the top, bottom, or sides of the rows of IT equipment racks, where hot air can escape into the cold aisle more easily, it can also occur when some IT equipment (some network switches, for example) has to be installed in the hosting rack such that it faces the hot aisle and its internal fans blow the hot exhaust air into the cold aisle.

The recirculation of hot air in the cold aisle generates entropy and lowers the data center cooling efficiency [4]. In order to reduce the mixing of hot and cold air streams and hence improve the cooling efficiency, various modifications to the original hot and cold aisle separation scheme proposed in [5] have been proposed. Most of the improvement efforts are centered around building a mass and heat transfer boundary between the hot and cold air streams and eliminating the recirculation. Variations of the alternating cold and hot aisles configuration, such as cold aisle containment, in-row cooling with hot aisle containment, overhead cooling, and a ducted hot air return path, are studied through simulation in [6]. In an experimental study, Oracle demonstrates 40% CRAC unit blower power savings resulting from the ducted hot air return path between the IT equipment and the CRAC units in a production data center [7]. To the authors' knowledge, while hot aisle containment has been adopted by some data centers, its modeling, simulation, and experimental explorations have been scarce in the literature.

While improvements in the cooling infrastructure can help enhance the cooling efficiency, their effectiveness is limited if the CRAC units are controlled by the specified return temperature and the data center cooling is overprovisioned. In order to maximize the cooling power savings, real-time thermal status monitoring of the IT equipment and cooling actuation with fine time and space granularities are essential. Real-time sensing capability, such as the extensive temperature sensor network introduced in [8], ensures timely response to thermal anomalies of the IT equipment. Coordinated tuning of the CRAC unit blower speeds and SAT, on the other hand, seeks to minimize the power consumption of both the CRAC units and the chiller plant.

In this paper, we develop a dynamic model for a data center with hot aisle containment and design decentralized model predictive controllers (MPC) for the multiple CRAC units. Based on the influences of the CRAC units on the rack inlet temperatures, a data center is partitioned into different regions, each containing a certain number of CRAC units and IT equipment. The CRAC units of each region can significantly affect the inlet temperatures of the IT equipment within that region, but not the IT equipment outside the region. The MPC controller of each partitioned region coordinates the CRAC unit blower speed and supply air temperature (SAT) to regulate the rack inlet temperatures within the partition below the specified temperature constraints, and at the same time minimizes the cooling power consumption. The decentralized controller structure lowers the risk of controller failure and is scalable to very large data centers.

The other sections of this paper are organized as follows. Section 2 derives the dynamic cooling models using the energy and mass balance principles. In Section 3, we discuss the approach to partition a data center for modeling and control, and the detailed controller design for each partition. Section 4 focuses on the controller implementation and experimental results. The paper is summarized in Section 5 with a discussion of future work.

2 DYNAMIC COOLING SYSTEM MODELING
In this section, we derive simplified models from the basic mass and energy balance principles to characterize the complex mass and energy flows within raised-floor air-cooled data centers.

2.1 Cool and Recirculated Hot Air Mixing at Rack Inlet
In the open environment, the air flow coming into the IT equipment inlet is a mixture of the cool air from the CRAC units (through the vent tiles) and the recirculated hot (exhaust) air that escapes into the cold aisle. In a hot aisle contained environment, recirculation is significantly reduced but could still exist because of the reverse flow from some network switches.
The internal fans of these network switches draw hot air from the hot aisle for cooling, reject the even hotter air into the cold aisle, and cause the inlet temperatures of neighboring IT equipment to rise. For generality, we choose to include the effects of recirculation in the modeling; the corresponding term can simply be set to zero for a perfectly contained environment without reverse flow.

FIGURE 2. AIR MIXING AT THE RACK INLET

To determine the effects of both cool air and recirculated hot air on the rack inlet temperature, consider a small control volume in the proximity of the rack inlet with mass $m$ and temperature $T$, as shown in Fig. 2. Cool and recirculated hot air flows with mass and temperature $(m_c, T_c)$ and $(m_r, T_r)$ enter the control volume, mix well with the air $(m, T)$ already in the volume, then leave the control volume together and enter the rack inlet with total mass $m^*$ and temperature $T^*$. Based on the mass balance principle,

$$ m^* = m + m_c + m_r, \tag{1} $$

and from the energy balance principle,

$$ m^* h^* = m h + m_c h_c + m_r h_r. \tag{2} $$

Within the typical data center operating temperature range, air can be approximated as an ideal gas with $dh = c_p\,dT$, and the constant-pressure specific heat capacity $c_p$ can be assumed to be constant. Combining Eqn. (1) and (2), it can be found that the temperature change $\Delta T$ of the air within the control volume before and after the mixing is:

$$ \Delta T \triangleq T^* - T = \frac{m_c (T_c - T)}{m + m_c + m_r} + \frac{m_r (T_r - T)}{m + m_c + m_r}. \tag{3} $$

Equation (3) reveals that the influence of cool and recirculated hot air on the rack inlet temperature can be mainly captured by $m_c (T_c - T)$ and $m_r (T_r - T)$, respectively. This seemingly simple insight is consistent with physical intuition and also provides guidance for uniting the CRAC unit SAT and VFD control, as we will see later.

2.2 Cool Air Flow From CRAC Unit to Rack Inlets
While the temperature of the recirculated hot air is beyond direct control, the cool air delivered to the rack inlets can be adjusted through tuning of the SAT and VFD of the CRAC units. In raised-floor data centers, the pressure difference below and above the floor drives the cool air flow through the vent tiles to the space above the floor. Assuming that the air density change is negligible for normal CRAC unit operation, the total cool air flow $\dot{m}_{CRAC}$ delivered by the blower of each CRAC unit can be determined by the fan law:

$$ \dot{m}_{CRAC} = k_{CRAC} \cdot VFD. \tag{4} $$

The cool air flow, after leaving the CRAC unit blowers and traveling through the under-floor plenum, is distributed through the vent tiles. Each adaptive vent tile can be treated as an adjustable valve. In order to determine the total cool air flowing through a vent tile, we define the normalized tile opening $\tilde{U}_{tile}$ as:

$$ \tilde{U}_{tile} = U_{tile} \Big/ \sum_{N_{tile}} U_{tile}, $$

in which $U_{tile}$ is the vent tile mechanical opening ranging from 0 to 100 percent and $N_{tile}$ is the number of tiles in a cold aisle. For a cold aisle cooled exclusively by a single CRAC unit, if it is assumed that the cool air mass flow rate $\dot{m}_{tile}$ through each individual tile is proportional to its normalized opening $\tilde{U}_{tile}$, then we have:

$$ \dot{m}_{tile} = \dot{m}_{CRAC} \cdot \tilde{U}_{tile}. \tag{5} $$

It can be shown that the cool air flow distribution under this assumption is consistent with the mass balance principle, since

$$ \dot{m}_{CRAC} = \sum_{N_{tile}} \dot{m}_{tile}. \tag{6} $$

The cool air flows leaving the vent tiles are free to mix above the floor. As a result, the cool air flow $\dot{m}_c$ that reaches a rack inlet might come from several vent tiles in its vicinity:

$$ \dot{m}_c = \sum_{N_{tile}^{v}} b_{tile} \cdot \dot{m}_{tile}, \tag{7} $$

in which $N_{tile}^{v}$ stands for the number of nearby vent tiles that have significant influence over the cool air flowing into the rack, and the contribution of each vent tile is quantified by $b_{tile}$. Equations (4), (5) and (7) together give

$$ \dot{m}_c = k_{CRAC} \cdot VFD \sum_{N_{tile}^{v}} b_{tile} \cdot \tilde{U}_{tile}, \tag{8} $$

which describes how the rack inlet cool air flow is affected by one specific CRAC unit and the vent tiles near the rack.
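As an illustration of the flow distribution described above, the following Python sketch evaluates the cool air flow reaching one rack inlet from a single CRAC unit: the fan law gives the blower flow, the normalized tile openings split it among the tiles of the cold aisle, and the weights b_tile collect the contributions of the nearby tiles as in Eqn. (7) and (8). The function names and all numerical values (k_CRAC, tile openings, weights) are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
def tile_flows(k_crac, vfd, openings):
    """Split the blower flow of one CRAC unit among the vent tiles of a cold aisle."""
    total_opening = sum(openings)
    if total_opening <= 0:
        raise ValueError("at least one vent tile must be open")
    m_crac = k_crac * vfd  # fan law: flow proportional to blower speed
    # Flow through each tile is proportional to its normalized opening.
    return [m_crac * u / total_opening for u in openings]


def rack_cool_air_flow(k_crac, vfd, openings, weights):
    """Cool air flow reaching a rack inlet, following Eqn. (7) and (8)."""
    if len(openings) != len(weights):
        raise ValueError("one weight b_tile is required per vent tile")
    flows = tile_flows(k_crac, vfd, openings)
    # weights[i] plays the role of b_tile; it is zero for tiles
    # that have no significant influence on this rack.
    return sum(b * m for b, m in zip(weights, flows))


# Hypothetical values, for illustration only.
flows = tile_flows(k_crac=2.0, vfd=80.0, openings=[50.0, 100.0, 50.0])
m_c = rack_cool_air_flow(2.0, 80.0, [50.0, 100.0, 50.0], weights=[0.5, 0.25, 0.0])
```

Because the tile flows are built from normalized openings, their sum always equals the blower flow, which is the mass balance check of Eqn. (6).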
For deployments with multiple CRAC units, the total cool air flow into a rack inlet from all the CRAC units can be summed as follows:

$$ \dot{m}_c = \Big\{ \sum_{N_{CRAC}} k_{CRAC} \cdot VFD \Big\} \cdot \Big\{ \sum_{N_{tile}^{v}} b_{tile} \cdot \tilde{U}_{tile} \Big\}, \tag{9} $$

in which $N_{CRAC}$ is the number of CRAC units that significantly affect the cool air flow reaching the rack, and $VFD$ stands for the speed of the blower as a percentage of the maximum VFD setting. The coefficient $k_{CRAC}$ may vary with each CRAC unit and can be either provided by the manufacturer or determined through experiments.

2.3 Control Oriented Rack Inlet Temperature Model
In this section, we extend the models to capture the effects of SAT tuning on the rack inlet temperatures. Substituting Eqn. (9) into Eqn. (3) and replacing $T^*$ and $T$ in Eqn. (3) with the rack inlet temperatures at time steps $k+1$ and $k$ respectively, we have the following discrete model for rack inlet temperatures:

$$ T(k+1) = T(k) + \Big\{ \sum_{i=1}^{N_{CRAC}} g_i \cdot [SAT_i(k) - T(k)] \cdot VFD_i(k) \Big\} \cdot \Big\{ \sum_{j=1}^{N_{tile}} b_j \cdot \tilde{U}_j(k) \Big\} + C, \tag{10} $$

in which $g_i$ quantifies the influence of CRAC unit $i$ and $C$ represents the rack inlet temperature increase brought by recirculation. Notice that in Eqn. (10), $b_j \neq 0$ $(1 \le j \le N_{tile})$ only when vent tile $j$ is in the vicinity of the rack. For rack inlet temperature $T_i$ in a data center with fixed vent tile openings, let

$$ G_{i,j} = g_j \sum_{l=1}^{N_{tile}} b_l \cdot \tilde{U}_l(k), \quad 1 \le i \le N_T, \; 1 \le j \le N_{CRAC}. $$

The vector form of Eqn. (10) for multiple rack inlet temperatures is then:

$$ T(k+1) = T(k) + F + C, \tag{11} $$

in which

$$ T = [T_1, T_2, \cdots, T_{N_T}]^T, \quad F = [F_1, F_2, \cdots, F_{N_T}]^T, \quad C = [C_1, C_2, \cdots, C_{N_T}]^T, $$

$$ F_i = \sum_{j=1}^{N_{CRAC}} G_{i,j} \, [SAT_j(k) - T_i(k)] \, VFD_j(k), \quad 1 \le i \le N_T. $$

3 DECENTRALIZED CONTROLLER DESIGN
Data center cooling is in essence an optimal control problem, in which the total cooling power is minimized in response to the dynamic IT workload while the rack inlet temperatures are maintained at or below the specified thresholds. The temperature thresholds are not necessarily uniform across the entire data center but depend on the different functions, such as computing, storage, and networking, that the IT equipment serves. Service contracts of the IT workload hosted in the IT equipment also affect the temperature threshold.

3.1 Control System Structure
The multiple CRAC units within a data center, together with hundreds or even thousands of rack inlet temperatures to be regulated, compose a complex large-scale system. Cooling control of this complex system using a centralized controller is computationally intensive, and robustness and reliability are also serious concerns. If the central controller fails, the cooling system is left with the last settings it received from the controller and the entire data center is at risk.

With the observation that each CRAC unit within a data center has its own zone of influence, and only significantly affects the inlet temperatures of the nearby racks, decentralized control can be utilized to replace the commonly used centralized controller design. After associating each CRAC unit with the rack inlet temperatures on which it has significant effect, the entire data center can be partitioned into a number of zones. Each zone contains one CRAC unit and a number of rack inlet temperatures that the CRAC unit has control over.

In traditional decentralized control for large-scale industrial systems, the subsystems do not share inputs or outputs [9]. The inlet temperature of a specific rack within a data center, however, could be significantly affected by more than one CRAC unit. Naturally, the different zones of a data center after partition can have overlapping temperature outputs, but not overlapping control inputs if each subsystem contains only one CRAC unit.

FIGURE 3. DECENTRALIZED CONTROL SYSTEM STRUCTURE

Figure 3 illustrates the proposed decentralized control system structure for a data center with three CRAC units, in which $T^i$, $1 \le i \le 3$, are vectors containing rack inlet temperatures. Note that each decentralized controller regulates the blower speed $VFD$ and supply air temperature $SAT$ of a specific zone, and neighboring zones could have overlapping temperatures. By sharing rack inlet temperatures between neighboring zones, the decentralized cooling system has better robustness. In the event of a controller and CRAC failure, the temperature rise of that specific zone will be detected by the adjacent zones with inlet temperature sharing, and these neighboring zones will respond with increased cooling resource provisioning to maintain the shared inlet temperatures below the specified thresholds.

3.2 MPC Controller Design within Each Zone
Figure 4 shows the MPC controller structure for each zone within the partitioned data center. The two cooling knobs available to the controller are the CRAC unit SAT and VFD. The effects of these cooling actuators on the rack inlet temperatures are captured by the models in the previous section. The objective function of the MPC controller is set up to reflect the total power usage of the cooling system. By comparing the rack inlet temperature measurements $T$ with the temperature threshold $T_{ref}$, the MPC controller automatically seeks the optimal CRAC unit settings in response to the dynamic IT workload. The cooling resource provisioning and transport are coordinated since they are considered simultaneously in the same framework to minimize the cooling power.

FIGURE 4. MPC CONTROLLER WITHIN EACH ZONE

The MPC controller is formulated as a constrained minimization problem:

$$ J(VFD, SAT) = \sum_{i=0}^{h_u - 1} \Big( \{VFD^{3}(k+i)\}_{R_{VFD}} + \{-SAT(k+i)\}_{R_{SAT}} + |\Delta VFD(k+i)|_{W_{VFD}} + |\Delta SAT(k+i)|_{W_{SAT}} \Big) $$

subject to:

$$ VFD_{min} \le VFD(k+i) \le VFD_{max}, $$
$$ SAT_{min} \le SAT(k+i) \le SAT_{max}, $$
$$ T(k+j+1) \le T_{ref}, $$

for all $0 \le i \le h_u - 1$ and $0 \le j \le h_p - 1$.

In the constrained optimization above,

$$ \Delta VFD(k+i) = VFD(k+i) - VFD(k+i-1), \quad \Delta SAT(k+i) = SAT(k+i) - SAT(k+i-1). $$

$VFD$ and $SAT$ remain constant from time step $h_u - 1$ to time step $h_p - 1$. Here $h_u$ and $h_p$ are the control horizon and the prediction horizon respectively, with $h_u \le h_p$. The objective function $J$ penalizes both the total cooling power consumption and the rate of change of the cooling actuation. The CRAC unit blower power increases with $VFD^3$ according to the fan laws, and it is also assumed that the chiller power consumption increases linearly as the CRAC unit SAT decreases. $R_{VFD}$ and $R_{SAT}$ are appropriate weights on the blower power of the CRAC units and the thermodynamic work of the chiller plant. The rate of change of the control actions is penalized for the purpose of system stability. Among the optimization constraints, $T_{ref}$ is the rack inlet temperature threshold. The cooling control inputs, including the blower speeds $VFD$ and supply air temperatures $SAT$, are constrained by their respective physical or specification limitations. It is found through experiments, for example, that in most cases it is not desirable to turn a CRAC unit off even if its load is very low, since doing so will significantly change the air flows within the data center while resulting in insignificant power savings.

4 EXPERIMENTAL SETUP AND RESULTS
The proposed decentralized controller was implemented and evaluated through experiments in a production data center. The hot aisles of this data center are contained by plastic strips hanging from the ceiling. We present part of the experimental results in this section.

4.1 Testbed
Figure 5 shows the experimental data center with 10 rows of racks and 8 CRAC units. The red dotted lines delineate the hot aisles contained using plastic strips hanging from the ceiling. Within the data center walls, rows F and G are separated from rows Aext, Bext, and Cext by a wall above the floor and dampers along the wall in the underfloor plenum. Due to the ongoing IT equipment upgrade, row A currently does not host any racks. All of the hosted racks are fully instrumented with 5 temperature sensors in the front and another 5 on the back at different heights. Temperature sensors for some racks might not function properly because of the current IT infrastructure upgrade. Row C, for example, although it hosts 6 racks, does not have any functioning temperature sensors. Table 1 lists for each row the number of racks it hosts and the number of properly functioning inlet temperature sensors. In this paper, we only consider the total of 220 rack inlet temperature sensors. The five inlet temperature sensors of each rack are named T1 through T5 from the rack bottom up to the top. G9.T5, for example, refers to the inlet temperature at the top of rack G9.

TABLE 1. NUMBER OF RACKS AND INLET TEMPERATURE SENSORS FOR EACH ROW
Row of Racks    # of Racks    # of Inlet Temperature Sensors
A               0             0
Aext            10            35
B               7             25
Bext            9             5
C               6             0
Cext            10            20
D               6             15
E               7             35
F               8             40
G               9             45

FIGURE 5. LAYOUT OF THE EXPERIMENTAL DATA CENTER

4.2 Zone Partition for Decentralized Control
For decentralized cooling control design, the entire data center needs to be partitioned into a number of zones. Each zone contains both cooling actuators (CRAC units) and a number of rack inlet temperatures to be maintained at or below the specified thresholds. Targeting a completely decentralized design, we have chosen to allocate only one CRAC unit to each zone. The experimental data center is thus divided into 8 zones, with the ith CRAC unit in the ith zone.

In order to limit the interactions between individual zones and enhance stability, it is essential to group the inlet temperatures with the CRAC units that most effectively influence them. The influence of a CRAC unit on a specific inlet temperature can be captured by the thermal correlation index (TCI) value. As defined in Eqn. (12), $TCI_{i,j}$ quantifies the response of the ith rack inlet temperature to a step change in the SAT of the jth CRAC unit [10]. All the TCI values $[TCI_{i,j}]$, $1 \le i \le 220$, $1 \le j \le 8$, constitute a static gain array from the system inputs to the outputs.

$$ TCI_{i,j} = \frac{\Delta T_i}{\Delta SAT_{crac,j}}. \tag{12} $$

For inlet temperature $T_i$, define

$$ TCI_{i,max} = \max_{j}(TCI_{i,j}), \quad 1 \le j \le 8, $$

then $T_i$ is assigned to the jth CRAC unit if

$$ TCI_{i,j} \ge \lambda \, TCI_{i,max}, $$

in which $\lambda$ is a threshold to adjust the overlapping of inlet temperatures between individual zones. Choosing $\lambda = 1$, for example, results in a disjoint partition of the rack inlet temperatures among the 8 zones. Table 2 lists the number of inlet temperatures of each zone when $\lambda = 0.5$. This partition is validated using the relative gain array (RGA) analysis developed in [11] for non-square systems and found to be reasonable. The decentralized control experiments to be presented later are based on this partition.

TABLE 2. NUMBER OF INLET TEMPERATURE SENSORS FOR EACH ZONE

Zone                       1    2    3    4    5    6    7    8
# of inlet temperatures    8    19   26   69   72   59   35   50

4.3 Implementation of Decentralized Controller
Following the model structure of Eqn. (11), a multiple-input-multiple-output (MIMO) model was identified through system identification experiments for each of the 8 partitioned zones. The inputs of each MIMO model are the VFD and SAT of the CRAC unit in the zone, and the outputs are the rack inlet temperatures within the zone.

For initial development, the decentralized MPC controllers were implemented in Matlab running on a Windows server. Each of the 8 MPC controllers monitors the thermal status within its zone and adjusts the CRAC unit VFD and SAT accordingly.
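To make the per-zone control step more concrete, the sketch below shows one decision of a single-CRAC zone controller in Python. It is only an illustration under stated assumptions, not the Matlab implementation used in the experiments: the zone model follows the structure of Eqn. (11) with one CRAC unit, the control horizon is 1 with the inputs held constant over the prediction horizon, and a plain grid search over candidate settings stands in for a constrained nonlinear solver. No constraint relaxation is included. The function names, the weights r_vfd, r_sat, w_vfd, w_sat, and all numerical values are hypothetical.

```python
from itertools import product


def predict_next(temps, gains, recirc, sat, vfd):
    """One-step prediction for a single-CRAC zone, in the form of Eqn. (11):
    T_i(k+1) = T_i(k) + G_i * (SAT - T_i(k)) * VFD + C_i."""
    return [t + g * (sat - t) * vfd + c for t, g, c in zip(temps, gains, recirc)]


def mpc_step(temps, t_ref, gains, recirc, prev, vfd_grid, sat_grid,
             r_vfd=1e-6, r_sat=0.1, w_vfd=0.01, w_sat=0.1, hp=5):
    """Choose (VFD, SAT) for the next control interval by grid search.

    The candidate inputs are held constant over the hp prediction steps
    (control horizon of 1). Returns None if no candidate keeps every
    predicted inlet temperature at or below its threshold.
    """
    prev_vfd, prev_sat = prev
    best, best_cost = None, float("inf")
    for vfd, sat in product(vfd_grid, sat_grid):
        t = list(temps)
        feasible = True
        for _ in range(hp):
            t = predict_next(t, gains, recirc, sat, vfd)
            if any(ti > ri for ti, ri in zip(t, t_ref)):
                feasible = False
                break
        if not feasible:
            continue
        # Blower power grows with VFD^3; chiller work grows as SAT drops;
        # changes of the settings are penalized as well.
        cost = (r_vfd * vfd ** 3 - r_sat * sat
                + w_vfd * abs(vfd - prev_vfd) + w_sat * abs(sat - prev_sat))
        if cost < best_cost:
            best, best_cost = (vfd, sat), cost
    return best


# Hypothetical zone with a single monitored inlet temperature.
setting = mpc_step(temps=[24.0], t_ref=[24.5], gains=[0.005], recirc=[0.5],
                   prev=(50, 20), vfd_grid=[50, 100], sat_grid=[20, 23],
                   w_vfd=0.0, w_sat=0.0)
```

A grid search keeps the example dependency-free and easy to follow, at the price of a coarse input resolution; a gradient-based constrained solver would replace the double loop in a practical controller.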
Although currently running on the same machine, the decentralized MPC controllers can easily run on different servers. The Matlab Optimization Toolbox [12] function "fmincon" was used to solve the constrained optimization problem. In the objective function $J$, $R_{VFD}$ and $R_{SAT}$ were chosen to reflect the actual blower power and chiller power consumption, and the weights $W_{VFD}$, $W_{SAT}$, and $W_{tile}$ were chosen carefully to ensure satisfactory transient performance. The control interval was set to 25 seconds to allow sufficient time for computation. The control horizon $h_u$ and prediction horizon $h_p$ are 1 and 5, respectively. The constraint relaxation method [13] was applied to address the feasibility problem due to the temperature constraints. In the case of temperature threshold violations, the temperature constraints in the prediction horizon were relaxed to approach the thresholds asymptotically. It was observed in the experiments that the temperature constraint relaxation helped avoid aggressive actuation as well.

4.4 Experimental Results–Step Change of Rack Inlet Reference Temperatures
Within each zone, the decentralized MPC controller aims to minimize the cooling power required to maintain all the rack inlet temperatures at or below the specified thresholds. The experimental evaluation of the decentralized controller starts with step changes in the reference temperature $T_{ref}$ of row F. Among the 8 zones, zones 4, 5 and 6 have 9, 36 and 25 temperature sensors in row F, respectively. The other zones do not have any inlet temperatures in row F. Figures 6(a) and 6(b) show the SAT and VFD trajectories of CRAC units 3, 4, 5 and 6, respectively.

(a) CRAC UNITS SAT    (b) CRAC UNITS VFD
FIGURE 6. SAT AND VFD OF CRAC UNITS 3-6

Prior to the experiment, the data center has reached a steady thermal state, and the maximum rack inlet temperature violation $T_{vio,max} = \max(T - T_{ref})$ for row F is -0.9°C, occurring at the bottom inlet temperature of rack F8. The step change in $T_{ref}$ increases $T_{vio,max}$ from -0.9°C to 1.1°C for both zones 5 and 6, since both zones contain the bottom temperature of rack F8. The decentralized MPC controllers in these two zones respond to the $T_{vio,max}$ excursion by adjusting their respective CRAC units to increase the cooling resource provisioning. As a result, the SATs of CRAC units 5 and 6 are lowered, and the VFDs are increased, as shown in Fig. 6(a) and 6(b). Compared with zones 5 and 6, the lowered $T_{ref}$ for row F only increases $T_{vio,max}$ from -0.3°C to 0.2°C for zone 4, since the inlet temperatures in row F that also belong to zone 4 are at least 1.8°C below $T_{ref}$ before its reduction. The minor temperature violation of zone 4 leads to a slight decrease of its SAT and increase of its VFD. It is also shown in Fig. 6(a) and 6(b) that the effects of the $T_{ref}$ change in row F are insignificant for zone 3, which does not monitor any inlet temperatures in row F. Similar explanations apply when, shortly after two hours, $T_{ref}$ for row F is changed back to what it was before the experiment started, which causes the SATs of CRAC units 5 and 6 to increase and the VFDs to decrease. The cooling system reaches steady state after some initial transients. Zones 1, 2, 7, and 8 are not affected by the $T_{ref}$ change in row F, and their CRAC unit settings are hence not shown here.

4.5 Experimental Results–Step Change of Vent Tile Opening
In order to evaluate the ability of the decentralized controller to reject external disturbances, the opening of the vent tile in front of rack G9 is manually changed. The inlet temperatures of rack G9 belong to both zones 5 and 6, but not to other zones. Notice that rack G9 is also at the end of the row, and some exhaust air from the hot aisle can escape the plastic strip containment and cause recirculation. About one hour after the experiment started, the opening of the vent tile in front of G9 is increased from 50% to 100%. Prior to this step change in tile opening, the cooling system has reached a steady state and inlet temperature G9.T5 has the maximum temperature violation for both zones 5 and 6. With the larger tile opening, more cool air is directed to rack G9, which suppresses the recirculation and lowers the rack inlet temperature. Figure 8(a) shows that the inlet temperature G9.T5, which is subject to more recirculation than the other inlet temperatures, drops more than 3°C because of the completely opened tile in front of the rack. The quick decrease in G9.T5 drives $T_{vio,max}$ from around -0.2°C to -1°C, and the CRAC units of zones 5 and 6 respond with increased SAT and lowered VFD to save cooling power, as shown in Fig. 7(a) and 7(b).

(a) CRAC UNITS SAT    (b) CRAC UNITS VFD
FIGURE 7. SAT AND VFD OF CRAC UNITS 3-6

Another important fact worth mentioning is that as G9.T5 decreases to around 27°C, one °C below its reference temperature, it is no longer the inlet temperature with the highest temperature violation for zones 5 and 6. Inlet temperature G3.T3 takes the place of G9.T5 as the one with the maximum violation for both zones 5 and 6 during the transient process after the tile change. After the cooling system reaches the new steady state, the inlet temperature with the maximum violation eventually alternates between G3.T3 and F5.T5, as shown in Fig. 8(b). The reference temperature for both G3.T3 and F5.T5 is 25°C.

(a) CRAC UNITS SAT    (b) CRAC UNITS VFD
FIGURE 8. SAT AND VFD OF CRAC UNITS 3-6

Figures 7(a) and 7(b) show that the tile opening change does not have significant influence beyond zones 5 and 6. As a result, the SAT and VFD of CRAC units 3 and 4 remain constant during most of the experiment. The local effect of vent tile opening adjustment in the cold aisle, however, is desirable if adaptive vent tiles are to be incorporated into the framework presented in this paper.

5 CONCLUSIONS AND FUTURE WORK
In this paper, we derived a dynamic model for data center cooling management and proposed a decentralized MPC controller design. The decentralized control system structure lowers the risk of failure suffered by centralized controllers, and can be more easily scaled up to large-scale data centers. The experimental results in a production data center with hot aisle containment validated the ability of the decentralized controller to handle reference temperature changes of the rack inlet temperatures and to reject disturbances. To extend the work presented, the authors are working to compare the performance of the decentralized controller with that of a centralized controller, and to add adaptive vent tiles into the existing framework.

REFERENCES
[1] Greenberg, S., Mills, E., Tschudi, B., Rumsey, P., and Myatt, B. "Best practices for data centers: Results from benchmarking 22 data centers". In 2006 ACEEE Summer Study on Energy Efficiency in Buildings.
[2] Patel, C. D., Bash, C. E., Sharma, R., Beitelmal, M., and Friedrich, R. J. "Smart cooling of data centers". In IPACK03, The Pacific Rim/ASME International Electronic Packaging Technical Conference and Exhibitions.
[3] ASHRAE, 2005. "Datacom equipment power trends and cooling applications".
[4] Bash, C. E., Patel, C. D., and Sharma, R. K., 2003. "Efficient thermal management of data centers - immediate and long-term research needs". HVAC&R Research, 9(2), Apr.
[5] Sullivan, R., 2000. "Alternating cold and hot aisles provides more reliable cooling for server farms". Uptime Institute.
[6] Wu, K., 2008. "A comparative study of various high density data center cooling technologies".
[7] Martin, M., Khattar, M., and Germagian, M., 2007. "High-density heat containment". ASHRAE Journal, 49(12), pp. 38–43.
[8] Bash, C. E., Patel, C. D., and Sharma, R., 2006. "Dynamic thermal management of air-cooled data centers". In The 10th Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm '06).
[9] Scattolini, R., 2009. "Architectures for distributed and hierarchical model predictive control - a review". Journal of Process Control, 19(5), pp. 723–731.
[10] Bash, C., Patel, C., and Sharma, R., 2006. "Dynamic thermal management of air cooled data centers". In Thermal and Thermomechanical Phenomena in Electronics Systems, 2006. ITHERM'06. The Tenth Intersociety Conference on, IEEE, pp. 8–pp.
[11] Chang, J., and Yu, C., 1990. "The relative gain for non-square multivariable systems". Chemical Engineering Science, 45(5), pp. 1309–1323.
[12] Coleman, T. F., and Zhang, Y., 2003. "Optimization Toolbox For Use with MATLAB: User's Guide Version 2".
[13] de Oliveira, N., and Biegler, L., 1994. "Constraint handling and stability properties of model-predictive control". AIChE Journal, 40(7), pp. 1138–1155.
[14] Wang, Z., Bash, C., Hoover, C., McReynolds, A., Felix, C., and Shih, R., 2010. "Integrated management of cooling resources in air-cooled data centers". In 6th Annual IEEE Conference on Automation Science and Engineering.
[15] Beitelmal, M. H., and Bash, C. E., 2007. "Optimizing centralized chiller system performance for dynamic smart cooling". Technical Report No. HPL-2007-41, Hewlett Packard Laboratories, March 15.