PLAYBOOK FOR CHANGE
Cultivating a Crystal Ball for Data Center Availability and Performance
Once Upon a Time . . .
Data center managers had a crystal ball that could see into
servers; monitor power and cooling systems; and correct
problems before they occurred. They could see where every
watt of energy was going and how it was being used.
All of that real-time information was mined and analyzed and
integrated so that each person on the management team saw
exactly what they needed to see when they needed to see it.
As a result, downtime was a distant memory. Over-provisioning
was unnecessary. Stranded capacity didn’t exist. Workloads
were managed so that equipment utilization rates were always
high, optimizing efficiency and eliminating unnecessary
capital investments. They had a better understanding of costs
and could make smart decisions about when and how to use
virtual and cloud-based resources.
With consistently high availability, dynamic capacity and
superior efficiency, the data center management team was
always one step ahead of the business.
And, of course, they all lived happily ever after.

Turning Fantasy into Reality
The good news is that this particular fairy tale is quickly
becoming a reality. Technologies are emerging today that
give IT management unprecedented visibility into real-time
operations.
With that kind of visibility into the present, it is much easier
to plan for the future.
This playbook tells you how to cultivate your own crystal ball
for data center decision making. And, while we can’t promise
“happily ever after,” following the suggestions in this playbook
will deliver higher availability, better efficiency and improved
planning.

Why You Need a Data Center Crystal Ball
•Data center managers use four different software platforms on average to manage their physical infrastructure [1]
•The average data center utilizes only 62% of available rack space [1]
•The average cost of a data center downtime event is $505,000 [2]
•Only 20% of organizations have mechanisms in place to evaluate and justify cloud ROI [3]
•The average data center PUE is 1.8 [4]; state-of-the-art data centers have PUEs of 1.3 or less

[1] Emerson Network Power customer insight studies
[2] Emerson Network Power and Ponemon Institute Cost of Downtime Study
[3] Open Group 2012 Cloud Computing Survey
[4] Uptime Institute 2012 Data Center Survey
Assessing Current Systems and Practices
Does your data center look the way the design engineers
envisioned it would?
If it’s more than three years old, chances are it doesn’t. Even
new facilities go through an abrupt transition when designers
hand the reins over to operators. From that point forward,
the facility is constantly evolving as new hardware is added
or moved and new applications are deployed.
Assessments perform two valuable functions. First,
they identify vulnerabilities in critical systems that, if left
unaddressed, could lead to downtime. Second, they identify
opportunities to reduce costs.
Data Center Assessment Services
Electrical
Determines if the electrical system is adequate to support
the data center load both now and in the future. Evaluates
the integrity of a facility’s power system and isolates potential
problems and vulnerabilities.
Thermal
Enhances operational performance, exposes vulnerabilities
within the cooling system that could lead to equipment
failure, and ensures cooling systems can handle current and
anticipated loads. Computational Fluid Dynamics (CFD) can
enhance the assessment by documenting airflow patterns
and identifying underfloor obstructions and hot spots.
Efficiency assessment
Evaluates electrical and cooling infrastructure with CFD
modeling to identify opportunities to reduce energy costs.
Infrared scan
Identifies defective components, degraded electrical
connections and other conditions that could result in a
fire or electrical breakdown using a non-invasive method.
Short-circuit and coordination studies
Uncovers inadequately rated or uncoordinated protection
equipment to prevent damage to critical equipment and
eliminate nuisance maintenance trips.
One-line diagram
Creates a one-line diagram for the data center, which serves as
a blueprint for operational, maintenance and testing activities.
Arc Flash study
Analyzes electrical hazards and recommends ways to mitigate
hazards, including equipment labeling, personal protection
equipment and training.
Even the best crystal ball can’t prevent problems if basic
systems are inadequate, poorly configured or not maintained.
What Assessments Can Tell You
•Whether power and cooling systems have the capacity to support current and future loads
•Whether data center equipment is at risk of failure from overheating
•Whether there are flaws in the electrical system that could lead to failure
•Where there are opportunities to improve efficiency
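
Once an assessment has measured the current load, the capacity question above becomes simple arithmetic. The sketch below is illustrative only: the function name, the compound-growth assumption and the sample figures are hypothetical and do not come from any assessment tool.

```python
import math


def years_until_full(current_kw: float, capacity_kw: float, annual_growth: float) -> float:
    """Estimate years until load reaches capacity, assuming compound annual growth.

    annual_growth is a fraction (0.10 means 10% per year). It is an assumed
    planning input, not something an assessment measures directly.
    """
    if current_kw <= 0 or capacity_kw <= 0:
        raise ValueError("load and capacity must be positive")
    if current_kw >= capacity_kw:
        return 0.0  # already at or over capacity
    if annual_growth <= 0:
        return math.inf  # flat or shrinking load never reaches capacity
    return math.log(capacity_kw / current_kw) / math.log(1.0 + annual_growth)


# Hypothetical example: 300 kW measured load on a 400 kW power system,
# with load assumed to grow 10% per year.
print(round(years_until_full(300.0, 400.0, 0.10), 1))
```

The same calculation can be repeated for cooling capacity or rack space by substituting the relevant measured load and rated capacity.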
Configuring Your Crystal Ball
The data center is a complex ecosystem of interdependent
systems. Configuring a management platform to see across
those systems – and consolidate and analyze data to create
meaningful information – requires a three-tiered approach.
1. Local monitoring
The foundation for the data center crystal ball comes at the
device level with the ability to remotely monitor and access
various data center systems. Through monitoring, data
center personnel have visibility into equipment operating
status and receive real-time alerts and alarms to notify them
of potential problems. It also enables remote access for faster
response to equipment problems.
2. Global planning and aggregation
Pulling in data from devices across the data center creates
the ability to identify dependencies and optimize systems,
such as cooling or power. Aggregated data can also be used
to address key planning questions, such as:
•Is there enough space, power and cooling to meet
future needs?
•How can equipment be commissioned and
decommissioned more efficiently?
•Are systems working in concert to optimize efficiency?
3. Intelligence
At the enterprise level, data is turned into business intelligence
that provides data center personnel with the ability to
proactively address issues before they affect operations,
respond more quickly to changes in the infrastructure, and
make better decisions about future requirements. This “data
center intelligence” can be used to extend the life of the data
center, reduce mean-time-to-repair, synchronize infrastructure
virtualization, minimize capital expenditures and analyze
performance against SLAs.
With each level, the data center monitoring and management
system becomes more valuable by providing a more holistic
view and more meaningful information to IT and facilities
personnel, allowing them to optimize performance while
maintaining or improving availability.
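
To make the three tiers concrete, here is a minimal sketch of how device readings might flow from local monitoring through aggregation to a planning figure. Everything in it is an assumption made for illustration: the `Reading` record, the metric names and the headroom calculation are hypothetical, not part of any particular management platform.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    """Tier 1 (local monitoring): one value reported by one device."""
    device_id: str  # hypothetical device name
    system: str     # e.g. "power" or "cooling"
    metric: str     # e.g. "load_kw" or "supply_temp_f"
    value: float


def aggregate(readings):
    """Tier 2 (aggregation): summarize each metric across a whole system."""
    groups = defaultdict(list)
    for r in readings:
        groups[(r.system, r.metric)].append(r.value)
    return {
        key: {"mean": mean(vals), "max": max(vals), "total": sum(vals)}
        for key, vals in groups.items()
    }


def power_headroom_kw(aggregates, capacity_kw):
    """Tier 3 (intelligence): turn aggregated load into a planning figure."""
    return capacity_kw - aggregates[("power", "load_kw")]["total"]


readings = [
    Reading("pdu-1", "power", "load_kw", 4.0),
    Reading("pdu-2", "power", "load_kw", 6.0),
    Reading("crac-1", "cooling", "supply_temp_f", 62.0),
]
summary = aggregate(readings)
print(power_headroom_kw(summary, capacity_kw=15.0))
```

Each tier consumes only the output of the tier below it, which is why the value of the system grows as levels are added.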
[Figure: the three tiers, from top to bottom: Enterprise Intelligence, Data Aggregation, Local Monitoring]
What You’ll Learn from a Data Center Crystal Ball
•Device-level power consumption
•Real-time facility PUE
•Power, cooling and server utilization
•Alarms related to environmental conditions and power quality across the data center
•Data center physical configuration
•Location of stranded capacity
Plan Globally, Monitor Locally
A comprehensive approach to data center monitoring
includes IT equipment, power, cooling and space.
Server health monitoring
Service processors provide visibility into a server’s on-board
instrumentation to improve asset productivity, lower operating
costs and speed mean-time-to-repair. They provide insight
into server temperature, CPU status, fan speed and voltages
as well as remote reset or power-cycle capabilities. Service
processors also store event logs related to server hardware
and can be programmed to trigger alerts if operating
thresholds are exceeded.
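
As a rough illustration of threshold-based alerting, the sketch below checks a set of server sensor values against low and high limits. The sensor names and limit values are hypothetical placeholders; real service processors expose their own sensor names and vendor-specific thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float


# Hypothetical limits for illustration only, not vendor recommendations.
THRESHOLDS = {
    "cpu_temp_c": Threshold(low=5.0, high=85.0),
    "fan_rpm": Threshold(low=2000.0, high=12000.0),
    "voltage_12v": Threshold(low=11.4, high=12.6),
}


def check_server(server_id, sensors, thresholds=THRESHOLDS):
    """Return one alert string per sensor value outside its operating limits."""
    alerts = []
    for name, value in sensors.items():
        limit = thresholds.get(name)
        if limit is None:
            continue  # no threshold configured for this sensor
        if value < limit.low:
            alerts.append(f"{server_id}: {name}={value} below {limit.low}")
        elif value > limit.high:
            alerts.append(f"{server_id}: {name}={value} above {limit.high}")
    return alerts


for alert in check_server(
    "srv-01", {"cpu_temp_c": 91.0, "fan_rpm": 1500.0, "voltage_12v": 12.0}
):
    print(alert)
```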
Temperature monitoring
Uneven temperatures across the data center can damage
equipment and waste energy. Installing a network of
temperature sensors helps ensure that all equipment is operating within the ASHRAE recommended temperature range
(64.4° F to 80.6° F). By sensing temperatures at multiple locations, the airflow and cooling capacity of precision cooling
units can be more precisely controlled, resulting in more efficient
operation. Additionally, cooling costs can be cut by allowing safe
operation closer to the upper end of the temperature range.
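
A short sketch of how sensor readings might be checked against the recommended range cited above. The range limits come from the text; the sensor locations, the sample temperatures and the idea of reporting "headroom" to the upper limit are assumptions made for illustration, and any setpoint change should be validated against actual equipment requirements.

```python
ASHRAE_LOW_F = 64.4   # lower end of the recommended range cited in the text
ASHRAE_HIGH_F = 80.6  # upper end of the recommended range cited in the text


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def out_of_range(inlet_temps_f):
    """Return the sensor locations whose readings fall outside the range."""
    return {
        location: temp
        for location, temp in inlet_temps_f.items()
        if not ASHRAE_LOW_F <= temp <= ASHRAE_HIGH_F
    }


def headroom_f(inlet_temps_f):
    """Degrees between the hottest reading and the upper limit.

    A positive value suggests there may be room to run warmer; a negative
    value means at least one location is already too hot.
    """
    return ASHRAE_HIGH_F - max(inlet_temps_f.values())


# Hypothetical sensor locations and readings in degrees Fahrenheit.
readings = {"rack-A1": 72.0, "rack-B4": 83.5, "rack-C2": 61.0}
print(out_of_range(readings))
print(round(headroom_f({"rack-A1": 72.0, "rack-A2": 75.5}), 1))
```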
Power monitoring
Power monitoring can prevent overloading and help improve
efficiency. Power should be monitored at the Uninterruptible
Power Supply (UPS), the room Power Distribution Unit (PDU)
and within the rack. The best view of IT power consumption
comes from the power distribution units inside racks, which
enable continuous monitoring of volts, amps, kilowatts (kW)
and kilowatt-hours (kWh). In addition to more effective power
management, rack PDUs support more accurate chargeback
of IT services and identify stranded capacity. UPS batteries
should also be monitored, as battery failure is the leading
cause of UPS system loss of power.
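
The sketch below shows one way rack PDU power samples could be turned into energy, chargeback and stranded-capacity figures. It is a simplified illustration: the function names, the sample data and the electricity rate are hypothetical, and treating "provisioned power minus observed peak" as stranded capacity is a crude proxy rather than a formal definition.

```python
def energy_kwh(samples):
    """Integrate (hours, kW) samples into kWh using the trapezoidal rule."""
    total = 0.0
    for (t0, kw0), (t1, kw1) in zip(samples, samples[1:]):
        total += (t1 - t0) * (kw0 + kw1) / 2.0
    return total


def chargeback(samples, rate_per_kwh):
    """Energy cost attributable to the rack over the sampled period."""
    return energy_kwh(samples) * rate_per_kwh


def stranded_kw(provisioned_kw, samples):
    """Provisioned rack power that the observed peak load never used."""
    peak = max(kw for _, kw in samples)
    return max(provisioned_kw - peak, 0.0)


# Hypothetical one-hour window: (elapsed hours, kW drawn by the rack).
samples = [(0.0, 4.0), (0.5, 6.0), (1.0, 4.0)]
print(energy_kwh(samples))                     # energy used, in kWh
print(chargeback(samples, rate_per_kwh=0.10))  # cost at an assumed rate
print(stranded_kw(8.0, samples))               # unused share of an 8 kW allocation
```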
Rack monitoring
Visibility into conditions in the rack can prevent many of
the most common threats to rack-based equipment. A rack
monitoring unit can be configured to trigger alarms when
rack doors are opened, when water or smoke is detected,
or when temperature or humidity thresholds are exceeded.
Leak detection
Fluid leaks can cause immediate and permanent damage to
IT equipment. Leak detection systems use strategically located
sensors to detect leaks across the data center and trigger
alarms to prevent damage. Sensors should be positioned at
every point fluids are present in the data center, including
around water and glycol piping, humidifier supply and drain
lines, condensate drains and unit drip pans.

Data That Fuels the Crystal Ball
•Cooling system supply and return air temperatures
•Server inlet temperatures
•UPS system status, capacity and battery health
•Power consumption at the facility, rack and device level
•Server CPU status and temperature
•Environmental conditions within the rack
•Alarms across all systems
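
The leak detection guidance in this section, a sensor at every point where fluids are present, lends itself to a simple coverage check. The sketch below is an assumed illustration; the location names are hypothetical and would come from the facility's own records.

```python
def uncovered_sources(fluid_sources, sensor_locations):
    """Return fluid locations that have no leak sensor, in sorted order."""
    return sorted(set(fluid_sources) - set(sensor_locations))


# Hypothetical inventory of places where fluids are present.
fluid_sources = {
    "glycol-piping-row3",
    "humidifier-supply-crac2",
    "condensate-drain-crac2",
    "drip-pan-crac1",
}
# Hypothetical list of locations where leak sensors are installed.
sensor_locations = {"glycol-piping-row3", "drip-pan-crac1"}

for location in uncovered_sources(fluid_sources, sensor_locations):
    print(f"No leak sensor at {location}")
```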
Aggregating Data Across Systems
The data center doesn’t just power the digital world; it mirrors
its challenges. There is so much data being produced today
that it can prove difficult to capture and aggregate it all.

System-Level Management
There are multiple opportunities to use aggregated data to
get a more holistic view of data center systems. For example,
when temperature-sensor and cooling-unit data from across
the data center are brought together they can be used to
manage all cooling units as one system, improving system
efficiency and performance.
On a rack- or aisle-basis, environmental data can be integrated
with power data from the rack PDUs to get a more complete
view of what is happening in certain areas of the data center.
Finally, power usage data can be brought together from across
the power system to measure Power Usage Effectiveness (PUE)
or support other efficiency initiatives.

Rack-Level Data Collection
A new type of data center appliance has emerged to address
the challenge of collecting the huge volumes of operating
data being generated in any given data center rack. These
appliances consolidate KVM, serial console, rack PDU, embedded
service processor and environmental management in a single
device. This saves rack space and reduces power consumption
compared to deploying multiple devices for alerts, telemetry,
environmental sensors, and device access and control; however,
the biggest benefit is the ability to scale data collection and
aggregate the stream of real-time operating data generated
within a particular rack.
[Figure: a rack appliance providing scalable data aggregation of device-level power consumption, rack environment data, rack alarms and server operating data]

Who Benefits from the Data Center Crystal Ball?
•Data center management
•Data center personnel responsible for server deployment and capacity management
•Facility management
•Data center personnel responsible for infrastructure systems, such as power and cooling
•IT executive management responsible for IT strategy and business alignment
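
PUE, mentioned in this section as one use of aggregated power data, is the ratio of total facility power to IT equipment power. The sketch below assumes the IT load is taken as the sum of rack PDU readings and the facility total comes from a utility or main-switchgear meter; the variable names and sample values are hypothetical.

```python
def pue(total_facility_kw, it_load_kw):
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


# Hypothetical readings: IT load aggregated from four rack PDUs,
# total facility power from a meter at the utility feed.
rack_pdu_kw = [4.0, 6.0, 5.0, 5.0]
it_load_kw = sum(rack_pdu_kw)
total_facility_kw = 36.0

print(f"PUE = {pue(total_facility_kw, it_load_kw):.2f}")
```

A result closer to 1.0 means less of the facility's power is going to overhead such as cooling and power conversion.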
Transforming Data into Business Intelligence
When real-time operating data from across the data center
is analyzed, integrated with ITSM application data, and
presented in meaningful ways to facilities and IT management,
real-time data center optimization becomes a reality.
ITSM maps the relationships between applications and the IT
resources that support them, while DCIM does the same for
IT resources and the facility assets that support them. Together,
the two deliver a holistic view of the application support system.
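
One way to picture the combined view is as two lookup tables joined on the server. In the sketch below, the first table stands in for ITSM data (applications to servers) and the second for DCIM data (servers to facility assets). All names and mappings are hypothetical examples, not the data model of any specific ITSM or DCIM product.

```python
# ITSM side (hypothetical): which servers support each application.
APP_TO_SERVERS = {
    "billing": ["srv-01", "srv-02"],
    "email": ["srv-03"],
}

# DCIM side (hypothetical): which facility assets support each server.
SERVER_TO_ASSETS = {
    "srv-01": {"rack-A1", "pdu-A1", "ups-1"},
    "srv-02": {"rack-A2", "pdu-A2", "ups-1"},
    "srv-03": {"rack-B1", "pdu-B1", "ups-2"},
}


def assets_supporting(app):
    """All facility assets an application ultimately depends on."""
    assets = set()
    for server in APP_TO_SERVERS.get(app, []):
        assets |= SERVER_TO_ASSETS.get(server, set())
    return assets


def apps_affected_by(asset):
    """Applications with at least one server that depends on the asset."""
    return sorted(
        app
        for app, servers in APP_TO_SERVERS.items()
        if any(asset in SERVER_TO_ASSETS.get(s, set()) for s in servers)
    )


print(sorted(assets_supporting("billing")))
print(apps_affected_by("ups-1"))
```

Queries like the second one are what allow a facilities alarm to be translated into its potential business impact.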
When presented visually to experienced personnel capable of
interpreting that information and projecting it into the future,
the integrated system enables data center personnel to:
•Collaborate, plan and control changes
•Proactively prevent downtime
•Discover and use hidden capacity
•Calculate actual costs to support applications or users
With the emergence of cloud computing, analytics, the mobile
workforce and socially connected markets, businesses will
increasingly demand efficient, highly available and agile data
centers. That is what integrated, holistic data center
management makes possible.
In summary, you don’t need a “crystal ball” just to help you
see the future of your data center; you need it to help you
create the future of your business.
[Figure: roadmap with a "We Are Here" marker and the stages ASSESS (current state: electrical, thermal, efficiency), MONITOR (power, cooling, server health, environment), MANAGE (aggregate systems, efficiency, environment, alarms), CONTROL and INTEGRATE (DCIM + ITSM), leading to real-time optimization]

How the Crystal Ball Drives Efficiency and Availability
•Increase collaboration across IT and facilities in planning and controlling changes
•Proactively manage capacity based on real-time visibility into IT and facilities infrastructures
•Identify and rectify data center issues before they affect operations
•Increase equipment utilization and asset productivity
•Eliminate stranded capacity
•Accurately project the ROI of cloud-based resources
About Emerson Network Power
Emerson Network Power provides efficient, reliable critical infrastructure
solutions for data centers, communications networks, healthcare and
industrial facilities around the world. With proven innovations in power,
thermal management, IT solutions and a global network of service experts
covering more than 150 countries, we make the future of communications
and information technology possible.
We understand how data center infrastructure is becoming more complex
at almost every level, and more essential to the success of the business than
ever before. Get the insight and resources you need to lead your organization
into the future at EmersonNetworkPower.com/CIOtopics.
EmersonNetworkPower.com
Emerson Network Power and the Emerson Network Power logo are trademarks and service marks of Emerson
Electric Co. All other trademarks are the property of their respective owners. ©2013 Emerson Electric Co.
PB 00002 (04/13)