Towards Optimal Sensor Placement
for Hot Server Detection
in Data Centers
Xiaodong Wang, Xiaorui Wang, Guoliang Xing, Jinzhu Chen,
Cheng-Xian Lin, and Yixin Chen
Outline
Introduction
Related work
Hot server detection problem
CFD-guided sensor placement
Evaluation
Summary
Introduction
Thermal monitoring is important in data center
operation:
 Overheating is harmful to the data center.
 Hardware components malfunction.
 Servers shut down.
 Excessive cooling energy is consumed.
 Cooling systems do not operate efficiently enough.
 Overcooling consumes excessive energy.
To have precise hot server detection:
 Precise hot server detection can guide the air conditioning
system.
 Thermal dynamics in data centers need to be better studied.
 Place more sensors to increase thermal visibility.
Related Work
Studies of thermal profile
 [Choi et al., HPCA '07] studied the thermal profile of a rack.
 [Patel et al., IPACK '01] studied the air temperature
specification of a data center in normal condition.
Not used to guide sensor placement.
Improve thermal monitoring with sensor networks:
 [Liang et al., SenSys '09] deployed sensor networks in a data
center to achieve high-fidelity thermal monitoring.
 [Moore et al., USENIX '05] and [Bash et al., USENIX '07]
proposed to allocate server jobs and workloads based on
thermal readings from sensor networks.
How to effectively place sensors?
Hot Server Detection Problem
Problem to solve:
 To intelligently place sensors for a maximum hot server
detection probability.
Problem formulation:
 Given M locations to monitor and N (N<M) sensors to use:
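$$\max \; \frac{1}{M} \sum_{1 \le i \le M} P_{Di}$$
Subject to the constraint:
$$P_{Fi} \le \alpha, \quad 1 \le i \le M$$
$P_{Di}$: Detection probability of overheating at monitored location i
$P_{Fi}$: False alarm rate of overheating at monitored location i
$\alpha$: Upper bound on the false alarm rate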
Problem Solving Architecture
Overheating data center analysis.
 Analyze the data center in overheating condition.
 Obtain the temperature distribution for overheating cases.
Find the sensor placement solution.
 Sensor readings are usually corrupted by noise.
 Sensors need to make the hot server detection decision
collaboratively (data fusion).
[Figure: architecture diagram. Overheating Analysis (CFD Modeling, Temperature Interpolation) feeds Sensor Placement (Data fusion & placement algorithm).]
Overheating Data Center Analysis
Computational Fluid Dynamics (CFD) model for
overheating data center
 A finite volume method.
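[Figure: CFD analysis pipeline. Inputs (data center physical model, power consumption, CRAC settings, ...) go through CFD Modeling and Temperature Interpolation to produce the temperature distribution.]
Example:
[Figure: example server room layout with A/C inlets and outlets labeled.]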
Overheating Data Center Analysis (cont’d)
Spatial temperature interpolation
 CFD results are available only at discrete locations.
 The granularity of CFD modeling is a tradeoff between accuracy
and computational complexity.
 Inverse Distance Weighting (IDW) interpolation:
 Weighted average of the available temperature data.
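The weighted average can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `idw_interpolate`, the argument layout, and the default exponent `power=2.0` are assumptions made for the example.

```python
import math

def idw_interpolate(known_points, known_temps, query, power=2.0):
    """Estimate the temperature at `query` as an inverse-distance-weighted
    average of temperatures known at discrete (x, y, z) points.

    `power` is the distance exponent; 2.0 is an assumed default.
    """
    weights = []
    for point, temp in zip(known_points, known_temps):
        d = math.dist(point, query)
        if d == 0.0:
            # The query coincides with a known sample: return it directly.
            return temp
        weights.append(1.0 / d ** power)
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, known_temps)) / total
```

A query point midway between two samples receives equal weights, so the estimate is their plain average. Closer samples dominate as `power` grows.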
Optimize sensor placement based on the
overheating analysis
 To achieve a maximized average overheating server
detection probability.
CFD Guided Sensor Placement
Sensor placement with existing solver:
 Decide the x, y, and z coordinates of each sensor location.
 Constrained Simulated Annealing (CSA)
 An existing solver with 3N variables.
 Computational time increases exponentially with the number of sensors.
Lightweight Sensor Placement (LSP):
 Only searches placement solution at areas with clustered
racks.
 A greedy algorithm, which adds sensors one by one.
 Search space and computational time are significantly
reduced.
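The one-by-one greedy idea can be sketched as follows. This is a generic greedy loop under stated assumptions, not the LSP algorithm itself: `greedy_placement` and `make_coverage_objective` are hypothetical names, the candidate list stands in for positions inside clustered rack areas, and the coverage objective is a toy substitute for the average detection probability.

```python
import math

def greedy_placement(candidates, num_sensors, objective):
    """Add sensors one by one, each time keeping the candidate that
    maximizes objective(current placement + [candidate])."""
    placed = []
    remaining = list(candidates)
    for _ in range(min(num_sensors, len(remaining))):
        best = max(remaining, key=lambda c: objective(placed + [c]))
        placed.append(best)
        remaining.remove(best)
    return placed

def make_coverage_objective(monitored, radius):
    """Toy stand-in objective (hypothetical): fraction of monitored
    locations within `radius` of at least one sensor."""
    def objective(sensors):
        covered = sum(
            1 for m in monitored
            if any(math.dist(m, s) <= radius for s in sensors)
        )
        return covered / len(monitored)
    return objective
```

Each step evaluates only the remaining candidates, so the number of objective evaluations grows roughly with the product of sensor count and candidate count instead of with all joint placements.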
Simulation Setup
Experiment environment setup
 CFD software packages: Gambit and Fluent.
 Server room size: 32 m x 7 m x 3 m.
 13 racks in the server room.
 4 monitored locations per rack (52 locations in total).
 14,400 W power consumption for each overheating rack.
 CRAC settings are collected by an external sensor.
Simulation Results
Different sensor numbers
 Baselines:
 Uniformly random (current practice).
 CFD+proportional.
 Using more sensors increases the detection probability.
 CFD+LSP (our solution) is closest to the optimal solution
Simulation Results (cont’d)
Different temperature
threshold
 Detection probability decreases
when temperature threshold
increases
Different fusion range:
 A proper fusion range can
increase the detection
probability.
Hardware Experiment in a Server Room
Setup:
 A small cluster of two racks
is used.
 Overheating is created by a
heater
Results:
Summary
We place sensors intelligently in data centers
 To reach a maximum hot server detection probability
Various overheating conditions are studied to guide
sensor placement
 CFD is used to analyze data centers under overheating
condition.
Future consideration:
 Integrate with thermal control approaches.
 More detailed CFD modeling.
Q&A
Thank You!
Acknowledgement
 NSF CAREER Award CNS-0845390
 NSF under Grants CNS-0720663, CNS-0915959, CCF-1017336, and CNS-0954039
 Microsoft Research under a Power-Aware Computing Award
Appendix A
Sensor readings are usually corrupted by noise.
$$T_m(x_i, y_i, z_i) = T_r(x_i, y_i, z_i) + N_i^2$$
An overheating scenario is detected when the measured
temperature is larger than the threshold $\eta$.
$$P_D = P\left( \frac{1}{n} \sum_{i=1}^{n} \left( T_r(x_i, y_i, z_i) + N_i^2 \right) > \eta \right)$$
A false alarm happens when the overheating detection
is triggered by noise only.
$$P_F = P\left( \frac{1}{n} \sum_{i=1}^{n} \left( N_i^2 + C \right) > \eta \right)$$
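The detection and false alarm probabilities can be estimated numerically with a small Monte Carlo sketch. Several things here are assumptions made for illustration and are not stated on the slides: the noise is drawn as zero-mean Gaussian with standard deviation `sigma`, it enters each reading as a squared term, the readings of the n sensors are fused by a plain average, and the function name `estimate_exceed_probability` is hypothetical.

```python
import random

def estimate_exceed_probability(base_temps, threshold, sigma,
                                trials=20000, seed=0):
    """Estimate P(mean(base_temp_i + N_i**2) > threshold) by simulation.

    Assumption: N_i ~ Normal(0, sigma), independent across sensors.
    Pass the real temperatures under overheating to estimate P_D, or a
    constant baseline for every sensor to estimate P_F.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        readings = [t + rng.gauss(0.0, sigma) ** 2 for t in base_temps]
        if sum(readings) / len(readings) > threshold:
            hits += 1
    return hits / trials
```

For example, calling it with overheated temperatures such as `[40.0, 42.0]` and a threshold of `35.0` estimates the detection probability, while calling it with a constant baseline such as `[25.0, 25.0]` estimates the false alarm rate for the same threshold.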
Appendix B
Rack clustering
 The minimum distance between two monitored locations in
different clusters is larger than 2R.
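One way to obtain clusters with this property is to link any two monitored locations that are at most 2R apart and take the connected components. The sketch below does that. It is an illustration under assumptions: the name `cluster_locations` is hypothetical, and reading R as the fusion range is an interpretation, since the slides do not define R here.

```python
import math

def cluster_locations(locations, r):
    """Group location indices so that any two locations in different
    clusters are more than 2*r apart (connected components of the
    graph linking locations whose distance is <= 2*r)."""
    limit = 2.0 * r
    unvisited = set(range(len(locations)))
    clusters = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack = [start]
        cluster = [start]
        while stack:
            cur = stack.pop()
            near = [j for j in unvisited
                    if math.dist(locations[cur], locations[j]) <= limit]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
                cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters
```

Locations chained together by short hops end up in one cluster even if the two ends of the chain are far apart, which is what the 2R separation condition requires.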
Inverse Distance Weighting (IDW) interpolation:
$$T(l_0) = \sum_{i=1}^{n} \lambda_i T(l_i), \qquad \lambda_i = \frac{1 / d_i^{p}}{\sum_{j=1}^{n} 1 / d_j^{p}}$$