Dr. Natheer Khasawneh. Sara Ismail.



The chapter specifies what Data Center-related information
should be documented and maintained and how such data is
helpful for managing rooms, troubleshooting during
emergencies, and planning future Data Center expansions.
The chapter also suggests inexpensive tools that can be used
to monitor a server environment and recognize problems
before they affect the systems contained within.





To help simplify your management of these rooms, document as
much information about them as possible. The more details you
collect and maintain about a Data Center, the fewer mysteries that
can arise and trigger unanticipated problems or delays.
Cabinet locations, electrical and data infrastructure, server names,
and installed applications are all key details worthy of keeping track
of.
There are several choices for how to archive this data. One option is
a maintained Data Center handbook, filled with reference materials
pertaining to the room. Even more effective is posting the information
on a company intranet site.
Whenever alterations are made to your server environment, have
those changes reflected in the documentation for the room.
The Data Center map must be kept current at all times. For Data Center
details that change frequently, update the information on a regular
basis, such as monthly or quarterly.
Floor Plan: One of the more
powerful documents to have
for a Data Center is a map of
the room. At a minimum,
an accurate map shows physical
clearances, cabinet locations,
the placement of major
infrastructure, and the Data
Center's numbering scheme.
This information is helpful
when allocating space for
incoming servers and crucial
when the time comes to expand
the server environment.






As-Built: As part of the design package issued for the construction
of your Data Center, require the respective cabling and electrical
contractors to provide as-built blueprints of the room.
An as-built is just what it sounds like—a document showing specific
Data Center infrastructure as it was built.
A cabling as-built shows the physical paths of all structured cabling
and provides termination details.
An electrical as-built shows the equivalent information for electrical
infrastructure—conduit paths, how many and what types of
receptacles, and which circuits specifically terminate where.
Many changes, big and small, often happen during the construction
of a server environment. As-built documents incorporate all of these
and show how a room truly is.






Server Inventory:
Once a Data Center is operational, inventory its servers, networking
devices, and other equipment on a regular basis. Include the name,
make and model of machine, and corresponding cabinet location in
the room. Follow the same Data Center numbering scheme that you
use for cable runs and electrical schedules.
Inventorying Data Center equipment keeps you in touch with what
items are flowing in and out of the room over time. This can help
you identify equipment trends, alerting you to changes that need to
occur to your existing infrastructure.
Consider recording additional physical details about your Data
Center equipment as well.
Inventorying servers might even save your company money.
Store inventory information in an online database.
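As a rough illustration of the inventory record described above, the following minimal Python sketch stores the name, make, model, and cabinet location of each device and counts devices per model to surface equipment trends. The class name, field names, and sample entries are hypothetical and not taken from any particular inventory tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryItem:
    name: str     # device name, for example a hostname
    make: str
    model: str
    cabinet: str  # cabinet location, using the room's numbering scheme


def count_by_model(items):
    """Count devices per (make, model) to expose equipment trends."""
    return Counter((item.make, item.model) for item in items)


# Hypothetical sample data, for illustration only.
inventory = [
    InventoryItem("web01", "ExampleCo", "R100", "A-01"),
    InventoryItem("web02", "ExampleCo", "R100", "A-02"),
    InventoryItem("db01", "ExampleCo", "S900", "B-04"),
]
model_counts = count_by_model(inventory)
```

Comparing such counts from one inventory pass to the next is one simple way to notice which equipment types are flowing in and out of the room.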






Applications:
Other valuable data to inventory are the applications running on
each server within the Data Center. This information is useful for
two reasons:
First, if you are going to perform work on a machine that hosts a
particular application, in your change request you can accurately
define all servers that are going to be affected by the scheduled
downtime.
Second, if an application fails unexpectedly, you can quickly determine
the scope of the problem and what specific servers are affected.
Because applications typically span multiple machines, it is frequently
impossible to isolate them to a particular section of the room.
Be aware that application information can be more difficult to obtain
and keep current than a physical inventory of servers. That's
because applications are added to, upgraded on, or removed from
machines more frequently than devices are physically relocated.
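The two uses of an application inventory can be sketched with a simple mapping from application to hosting servers. This is a minimal sketch under stated assumptions: the application names, server names, and function names are all hypothetical.

```python
# Hypothetical mapping of application name -> servers that host it.
app_hosts = {
    "payroll": {"app01", "app02", "db01"},
    "intranet": {"web01", "app02"},
}


def servers_for_app(app_hosts, app):
    """Servers in scope when the given application fails unexpectedly."""
    return sorted(app_hosts.get(app, set()))


def apps_on_server(app_hosts, server):
    """Applications hosted, at least in part, on one server."""
    return sorted(app for app, hosts in app_hosts.items() if server in hosts)


def servers_affected_by_work(app_hosts, server):
    """All servers sharing an application with the machine being worked on."""
    affected = set()
    for hosts in app_hosts.values():
        if server in hosts:
            affected |= hosts
    return sorted(affected)
```

For a change request covering work on `app02`, `servers_affected_by_work(app_hosts, "app02")` lists every server that shares an application with that machine.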






Processes: Useful processes to document include:
Access and change management policies: Instructions for how to gain
access to the Data Center.
Service level agreements (SLAs): Agreements involving Data Center-related
clients, support organizations, and vendors. An SLA is a contract between
someone who is hired to perform a task or service and a customer,
specifying the measurable functions and services they are to provide.
Server installation guidelines: Spell out for Data Center users how they
can most effectively install their incoming equipment.
Equipment move procedures: If your business is prone to relocating
servers from one Data Center to another, perhaps due to acquiring
another company, it is helpful to have some basic instructions on hand.
Features and Philosophies: Last, consider documenting and publishing
details about your Data Center's infrastructure as well as the design
philosophies behind it.







For real-time information, you need tools that actively monitor the
room. The greater your ability to "see" into your Data Center
without having to physically be there, the easier it is to manage.
Web cameras:
A great way to tell what's happening in your Data Center is to deploy
web cameras that leverage the room's network.
For the small expense of one or two web cameras per Data Center, you
can instantly see the condition of the room and know the status of its
most vital infrastructure systems, all from any computer connected to
your company's internal network.
Amperage Meters: An additional method of keeping an eye on your Data
Center is having your server cabinet power strips equipped with
amperage meters.
These devices display the amount of electrical load placed on
them.
This tells a Data Center user how close they are to reaching the
maximum electrical capacity of a power strip.







It also helps with efforts to balance power within a server cabinet.
If someone is installing a server with a single power feed, they can check
which of a server cabinet's two power strips is carrying the lesser
electrical load and plug into that one.
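The choice between two metered power strips can be sketched as follows. This is a minimal sketch: the strip names, readings, and capacity figure are hypothetical placeholders, and real values would come from the meters and the strip's rating.

```python
def pick_power_strip(strip_loads_amps):
    """Return the name of the strip carrying the lowest metered load."""
    return min(strip_loads_amps, key=strip_loads_amps.get)


def headroom_amps(load_amps, capacity_amps):
    """Remaining capacity before the strip reaches its maximum load."""
    return capacity_amps - load_amps


# Hypothetical meter readings for one cabinet's two power strips.
readings = {"strip_a": 11.5, "strip_b": 8.0}
target = pick_power_strip(readings)

# Hypothetical capacity; use the actual rating of the strip in practice.
remaining = headroom_amps(readings[target], capacity_amps=16.0)
```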
Temperature Sensors: A useful thing to know about your Data Center is
how hot or cold it is.
Monitoring the temperature of the room can alert you to a
malfunctioning air handler, air flow problems, or hot spots that are
forming due to increased server density at a particular cabinet location.
Many servers and networking devices also enable you to check their
internal temperature by entering a certain command.
Humidity Sensors: Humidity is generally monitored and controlled by
Data Center air handlers.
If a server environment is having problems with humidity—condensation
or corrosion from too much moisture in the air or static from not
enough—humidity sensors can help diagnose the problem.
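A simple range check shows how temperature or humidity readings might be screened for problems. This is only a sketch: the sensor locations, readings, and limits are hypothetical placeholders, not recommended operating ranges.

```python
def out_of_range(readings, low, high):
    """Return the sensor locations whose reading falls outside [low, high]."""
    return {
        location: value
        for location, value in readings.items()
        if value < low or value > high
    }


# Hypothetical temperature readings (degrees Celsius) keyed by cabinet location.
temperatures = {"A-01": 22.0, "A-02": 23.5, "B-04": 31.5}

# Placeholder limits; substitute the limits appropriate to your room.
hot_or_cold_spots = out_of_range(temperatures, low=18.0, high=27.0)
```

The same function can be reused for humidity readings by passing a different pair of limits.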





Other information useful to have about a Data Center is metrics—
measurements taken regularly to determine how the room functions
over time. There are a lot of data points that can be collected about
a server environment:
Maintaining an Incident Log: To get some perspective on the
performance of your server environment and the incidents that
happen in and around it, keep a log of Data Center-related events.
Record the time, date, and major details of notable occurrences.
Also note incidents in which things go right and downtime or a
catastrophic event is avoided, such as when utility power fails but the
Data Center runs uninterrupted thanks to its standby generator.
An incident log that thoroughly tracks Data Center events can be
extremely valuable for upper management. Such a log provides them
with real-world information about the threats posed to company
servers and what infrastructure and processes are (or aren't) in place
to protect that equipment.







Here are several useful categories to separate Data Center-related
incidents into:
Commercial Power (CP): An interruption in the power that is normally
provided to the Data Center by a utility source.
Connectivity (CO): A disruption in data connections, either in the
external structured cabling that feeds the company site or those within
the Data Center.
Mechanical - HVAC (AC): An incident related to the Data Center's
cooling system.
Mechanical - Power (MP): An incident related to the Data Center's
primary or standby electrical infrastructure.
Miscellaneous (MI): Events that are worth noting but don't fall into any
other categories. Perhaps a false alarm in the fire suppression system or
a problem with the room's physical access controls, for example.
Water Leak (WL): An incident in which unwanted moisture enters the
server environment.






Even more important than knowing what happened in a server
environment is understanding the cause of the incident.
Here are some typical causes of Data Center-related incidents:
External (EX): External causes are those that originate away from your
company site, such as utility power failures, damage to the structured
cabling, or an earthquake.
Human Error (HU): Human error applies to incidents that occur because
a person made a mistake rather than because a physical component
failed, such as powering down the wrong electrical circuits or
inappropriately pressing an emergency power off button.
Mechanical (ME): A mechanical cause is the malfunction of
infrastructure at the company site, such as a belt breaking within an air
handler or a standby generator not engaging when it is supposed to.
Structural (ST): The rarest of causes are those related to a building's
structural integrity. Examples of this are a roof leak or the buckling of a
Data Center floor.
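One way to structure an incident log around the category and cause codes above is sketched here in Python. The two-letter codes come from the lists above. The class names, field names, dates, and sample entries are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Category(Enum):
    CP = "Commercial Power"
    CO = "Connectivity"
    AC = "Mechanical - HVAC"
    MP = "Mechanical - Power"
    MI = "Miscellaneous"
    WL = "Water Leak"


class Cause(Enum):
    EX = "External"
    HU = "Human Error"
    ME = "Mechanical"
    ST = "Structural"


@dataclass
class Incident:
    when: datetime
    category: Category
    cause: Cause
    details: str
    outage_minutes: int = 0  # 0 when downtime was avoided


# Hypothetical log entries, for illustration only.
log = [
    Incident(datetime(2024, 3, 2, 14, 5), Category.CP, Cause.EX,
             "Utility power failed; standby generator carried the load."),
    Incident(datetime(2024, 3, 9, 9, 30), Category.MP, Cause.HU,
             "Wrong circuit powered down during maintenance.", 20),
]

incidents_by_cause = Counter(entry.cause for entry in log)
total_outage_minutes = sum(entry.outage_minutes for entry in log)
```

Recording the outage length with each entry means the same log can later feed availability calculations.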
Availability Metrics:
Availability: the degree to which a Data Center is online.
Measuring your Data Center's availability goes a long way
toward evaluating its contribution to the success of your business.
Availability metrics can also justify the expense of additional Data
Center infrastructure, either when designing a new room or when
upgrading an existing one.
Example: if your server environment was designed and built with the
goal of achieving 99.99 percent availability, track the outages
that occur over a significant time period, perhaps annually,
to determine what its availability has turned out to be.
You can calculate your Data Center's availability by using the
following formula:
(TIME - OUTAGES) ÷ TIME = Percentage of Availability

TIME is the total number of minutes in a defined time period and
OUTAGES is the cumulative number of minutes that a Data Center was
offline during that period.
For instance, say a Data Center was offline for 20 minutes over the
course of a 30-day month. There are 43,200 minutes in that month (30
days x 24 hours in a day x 60 minutes in an hour = 43,200 minutes).
Being online for all but 20 minutes translates to:
(43,200 min. - 20 min.) ÷ 43,200 min. = 99.95 percent availability.
By keeping track of the lengths of outages throughout the year, you can
calculate availability for any time period: monthly, quarterly, or
annually.
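The availability calculation can be expressed as a short Python function. This is a minimal sketch: the function and variable names are illustrative, and the figures are those of the 30-day month with 20 minutes of downtime described above.

```python
def availability_percent(total_minutes, outage_minutes):
    """(TIME - OUTAGES) / TIME, expressed as a percentage."""
    return (total_minutes - outage_minutes) / total_minutes * 100


minutes_in_month = 30 * 24 * 60  # 43,200 minutes in a 30-day month
monthly = availability_percent(minutes_in_month, 20)
print(f"{monthly:.2f} percent")  # prints "99.95 percent"
```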
Example: Say that your company has four Data Centers: two that are
5000 square feet in size, one that is 10,000 square feet, and one that is
30,000 square feet, for a total of 50,000 square feet. Say that the Data
Center with a 20-minute outage and 99.95 percent availability is one of
the small rooms (5000 square feet). If the other three rooms all stayed
online for the entire month, what's the cumulative availability for all
50,000 square feet of Data Center space?

The formula then becomes:
((SIZE1 * (TIME - OUTAGES1)) + (SIZE2 * (TIME - OUTAGES2)) + (SIZE3 *
(TIME - OUTAGES3)) + (SIZE4 * (TIME - OUTAGES4))) ÷ (TOTAL SIZE *
TIME) = Percentage of Availability
Plugging in the monthly statistics for the four Data Centers, with one of
the smallest having 20 minutes of downtime, you get the following:
((5000 sq. ft. * (43,200 min. - 20 min.)) + (5000 sq. ft. * 43,200 min.) +
(10,000 sq. ft. * 43,200 min.) + (30,000 sq. ft. * 43,200 min.)) ÷ (50,000
sq. ft. * 43,200 min.) = 99.995 percent availability.
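The size-weighted calculation generalizes to any number of rooms. The following Python sketch uses the four-room figures from the example. The function name and the (size, outage) tuple layout are assumptions made for illustration.

```python
def weighted_availability_percent(rooms, total_minutes):
    """Size-weighted availability across rooms.

    rooms: iterable of (size_sq_ft, outage_minutes) pairs.
    """
    rooms = list(rooms)
    online = sum(size * (total_minutes - outage) for size, outage in rooms)
    total_size = sum(size for size, _ in rooms)
    return online / (total_size * total_minutes) * 100


minutes_in_month = 30 * 24 * 60  # 43,200 minutes in a 30-day month
rooms = [(5000, 20), (5000, 0), (10_000, 0), (30_000, 0)]
combined = weighted_availability_percent(rooms, minutes_in_month)
print(f"{combined:.3f} percent")  # prints "99.995 percent"
```

Weighting by floor space means an outage in a small room lowers the combined figure less than the same outage in a large room would.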







Other Useful Data:
Cabinet occupancy: How quickly are Data Center cabinet locations
filling up?
Consumable usage: How many server cabinets, cabinet shelves, and
patch cables are used each quarter? This information is helpful for
maintaining proper inventory amounts and future budgeting.
Supplies and vendors: Document the items you stock in your server
environment and the vendors who provide them. Include both everyday
consumables (e.g., patch cords and server cabinets) and those items
needed to complete a Data Center when it is first built (e.g., storage bins,
signage materials, and floor tile pullers).
Major infrastructure changes: "Then and now" comparisons can be very
illustrative. The information can also be useful when future retrofit
projects are planned.
Data Center trivia: What's the biggest piece of equipment in the Data
Center? The smallest? How long does it take to install a typical server?
Such trivia might not help you manage the room, but it can be powerful
when explaining Data Center challenges.