Optimizing Infrastructure Performance

Going Beyond the Network with Remote Monitoring and Management
Introduction
There is disagreement over who first said, “If it ain’t broke, don’t fix it.” While we might be safe in
assuming it was no one with responsibility for an IT infrastructure, given the current mindset in many data
centers, it very easily could have been.
There is nothing simple about managing complex infrastructures. Yet some organizations treat them with
a reactive, almost laissez-faire mindset – with eyes-on from 8 to 5, and crossed fingers after hours –
hoping that no problems lurking beneath the surface emerge when no one is watching.
Many companies deploy the disciplines and tools necessary to provide basic availability management, and
the hardware and software monitoring that supports it. Unfortunately, the IT staff often lacks the skills,
tools and capacity to reactively monitor – and proactively manage – the infrastructure’s networks, servers,
storage, security, applications and devices – a situation that appears unlikely to change in the near future.
[Figure 1]
Gartner forecasts economic austerity through 2015 that
will continue to constrain overall IT spending. As a result,
companies are expected to ramp up their pursuit of sourcing
alternatives as part of a continuing shift away from capital
expense spending in favor of operational expenditures.1
The pressure to do more with less forces managers to
make critical, yet creative decisions about how to take out
costs while improving IT service delivery. This includes
the increased use of managed IT service providers to
help manage infrastructures, which can instill operational
efficiencies, reduce costs and create tangible business
value.
Figure 1. It’s more than the network. The status of the
infrastructure and its component devices should be visible to
the IT staff at all times, regardless of location.
We will examine some of the challenges companies face,
and how remote infrastructure monitoring and management
can address them by enabling improved performance and
profitability.
Is the solution just more sleep, or changing the way we do business?
Cisco relies heavily upon its infrastructure, IT systems and applications to run its business, but the
company knew it had to make a change. Its engineers and IT staff were spending an inordinate amount
of time on routine network operations, at the expense of new product development – an opportunity
cost the company decided was too high a price to pay, in an industry where continuous innovation drives
shareholder value.
IT staff took turns being on-call 24x7, once every six to eight weeks, which meant repeatedly being paged
to check on system alerts in the middle of the night. Engineers would respond, log tickets, and then go
back to bed. The next morning, they were expected to be fully engaged in their “real jobs,” but
they were, in fact, exhausted. Over time, nighttime pager duty resulted in not only fatigue, but
significantly reduced productivity and job satisfaction. The company’s engineering bench strength –
essential to support the creation of customer value – was being eroded.
The solution came when senior managers developed a shared model to increase agility by out-tasking
routine network management activities. This allowed engineers to focus on higher-value work, such
as planning customer deployments of new technologies. The company then
outsourced monitoring and management of thousands of devices in its global LANs and WAN,
its VPN connectivity, and voice service for 300 company sites and 160 extranet sites. This shift
in operational paradigm liberated dozens of engineers to focus on strategic priorities and core
activities that create customer and shareholder value.2
The challenges of enterprise infrastructure management
While Cisco was primarily concerned with opportunity cost tradeoffs associated with 24x7 network
management, the key challenges facing most senior IT managers extend beyond resource
allocation issues. The primary drivers behind offsite infrastructure management over the past 20
years have been the need for improved agility, productivity and performance, and for reduced costs.
Situational Awareness. Managers commonly cite a lack of visibility into the infrastructure, and
an inability to assess its overall health and security, hampering their ability to quantify and mitigate
risk. This lack of confidence in the stability and performance of the environment – not just the
network – is often the result of not having conducted formal, end to end assessments to audit,
benchmark and evaluate hardware and software performance.
Companies often lack the specialized tools necessary to conduct periodic assessments that can
expose performance and security gaps, preventive maintenance needs, and cost optimization
opportunities. The expression, “You don’t know what you don’t know,” is appropriate given the lack
of situational awareness in the data center.
Operational Maturity. As organizations become more sophisticated in managing the infrastructure
– incorporating ITIL best practices, 24x7 support, and process optimization and integration –
system availability and performance improve, while operating costs go down. Most enterprises,
however, find themselves effectively stalled at the Chaotic and Reactive maturity levels. And,
any management vision of increased internal sophistication seems unattainable given prevailing
economic constraints. [Figure 2]
Further IT maturation may require an outside change
agent with the knowledge, tools and commitment to vault
IT operations to levels that will improve performance and
impact the bottom line.
Figure 2. IT Maturity Scale
Detection and Response to Incidents and Problems.
According to IBM, up to 80% of system outage time is
spent just finding the cause and nature of a problem.3
What makes this situation especially difficult is that most
incidents occur after hours, when many organizations rely
on limited onsite, or on-call, staff to respond to alerts,
incidents and outages. We cannot eliminate all incidents
and problems, but with the right combination of resources, processes and tools, we can greatly
reduce their number and impact.
Establishment of Standard Processes. When was the last time a band-aid or workaround was
applied to restore system or device availability to buy time until a problem could be analyzed and
resolved? Reactive problem management may be commonplace, but it is a symptom of a short
term mindset that leads to temporary, suboptimal solutions. Using standard ITIL processes and
tools, every component of the infrastructure can be proactively monitored, and problems quickly
diagnosed and elevated, if appropriate, to assure the right knowledge and tools are applied to
deploy long-term, sustainable solutions.
A Top 25 financial services institution was experiencing rapid growth,
overwhelming its infrastructure, and resulting in unacceptable IT service delivery.
An assessment disclosed numerous issues and significant regulatory exposure:
• Less than 33% of 2,500 servers were being monitored.
• Tools failed to trigger alerts and issue reports.
• Processes were not standardized, aligned or documented.
• Escalation paths were ill-defined and frequently ignored.
• Process improvement was undisciplined and ineffective.
The solution standardized remote monitoring and management processes,
reducing escalation costs by 85%, while improving performance and mitigating
risk.
Metric                  Before    After           % Improvement
Client Satisfaction     49%       86%             70%
Servers Monitored       503       2,517 (100%)    400%+
% Servers Virtualized   15%       39%             160%
Escalations             >90%      <10%            >80%
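The "% Improvement" column appears to express the relative change from the Before value to the After value. The short Python sketch below works through that arithmetic for two rows of the table above. Reading the column this way is an assumption, and the function name `percent_change` is illustrative rather than taken from the original.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed as a percentage."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


# Before/After values taken from the table above.
servers_monitored = percent_change(503, 2517)
servers_virtualized = percent_change(15, 39)

print(f"Servers monitored: {servers_monitored:.1f}%")
print(f"Servers virtualized: {servers_virtualized:.1f}%")
```

Under this reading, the two rows come out at roughly 400% and 160%, in line with the figures reported in the table.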
Infrastructure monitoring and management that creates, not consumes,
business value
“Without having someone who is responsible for designing, creating,
monitoring and improving services end to end, there is little chance that
those services will consistently achieve the levels of value, relevance,
stability and reliability that service recipients expect.”4
The 21st century has been a challenge for CIOs who must balance delivery of high quality IT
services against cost improvement pressures. Leveraging out-tasking and managed infrastructure
support services to either augment or supplant internal resources can be an effective way to drive
infrastructure optimization, and create measurable business value.
CapEx versus OpEx. Capital investment as a percent of total IT spending has declined to its
lowest percentage since 2003. The trend is likely driven by multiple factors, but it reflects a general
reluctance to make capital investments. Meanwhile, IT operational spending on “grow the business”
initiatives has increased its share of the total IT budget.5
Furthermore, in an era of generally flat IT budgets, fixed costs often are viewed as a zero-sum
game. Managers look to recast budgets in creative ways to free up and reallocate funds to support
business growth through product development, innovative business processes, and new business
models. To cite one industry analyst: “Many organizations are targeting reductions in their run
spending to leave more resources for grow and transform initiatives.”6 [Figure 3]
Whether reining in capital investment, or reallocating budget dollars on the operations side to fund
growth initiatives, there is a valuable role to be performed by the managed IT services provider to
help take out costs. From a TCO perspective, there is relief for the enterprise in that the provider
reduces the need for capital expenditures by supplying the needed hardware, software and facilities
as a service – on a pay-as-you-go basis. Similarly, operating dollars can be reallocated by reducing
costs through resource sharing, or completely out-tasking 24x7 management responsibility, whether
for the entire infrastructure, or one or more individual areas, e.g., network monitoring, print
management, etc.
Figure 3. Fixed costs to run the business can be reduced and/or reallocated by out-tasking
certain infrastructure management functions in order to fund ‘grow the business’ initiatives.
By some estimates, remote infrastructure management services may reduce labor costs as much
as 50%, or up to 30% of infrastructure operating costs, excluding switching costs, which some
providers will amortize to sweeten a deal, or to facilitate the service transition.
Paradigm Shift: From Reactive to Proactive. This is the difference between staffing a control
room and waiting for something to happen – which many organizations still do – versus knowing a
specific problem will occur, and preventing it through predictive analysis and incident management.
The utilization of remote engineering to predict and remediate problems before they occur is what
differentiates remote monitoring and management, thus adding measurable business value.
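One simple way to picture predictive analysis is to extrapolate a resource trend and estimate when it will cross a threshold. The Python sketch below is an illustration only. It assumes daily disk-usage samples and a straight-line trend; the sample values, the 90% threshold and the function name `days_until_threshold` are hypothetical, and actual remote monitoring tools may use quite different methods.

```python
from typing import Optional, Sequence


def days_until_threshold(samples: Sequence[float], threshold: float) -> Optional[float]:
    """Fit a least-squares line to daily samples and estimate how many days
    after the last sample the trend reaches `threshold`.

    Returns None if there are too few samples or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Report time remaining relative to the most recent sample (day n - 1).
    return max(0.0, crossing_day - (n - 1))


# Hypothetical disk usage (% full) over five days, growing 2 points per day.
usage = [70.0, 72.0, 74.0, 76.0, 78.0]
remaining = days_until_threshold(usage, threshold=90.0)
print(f"Estimated days until 90% full: {remaining:.1f}")
```

With an estimate like this, a ticket can be raised days before the volume fills, rather than after an outage alert fires.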
[Figure 4]
Diagram labels: Start of Process; New Service – I.M.A.C.; Request for Service; Incident Awareness;
L1 – Incident Resolver; L2 – Incident Resolver; L3 – Incident Resolver; L4 – Dispatch External
Resolver; L5 – Problem Mgt; L6 – Change Mgt; End of Process. Process steps are labeled P100 through
P800. Supporting layer: Tracking, CMDB, Known Errors, Process, Procedure, Service Manual, K-Base,
Forum.
Figure 4. Beyond Reactive Monitoring: The addition of Remote Engineering combines proactive ITIL
processes with predictive data to resolve incidents and problems with sustainable solutions.
An organization empowered in this manner, through standard ITIL processes and a commitment to
continual service improvement, gains significant advantages through the ability to execute change
management effectively and optimize infrastructure performance.
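The tiers named in Figure 4 can be pictured as an ordered escalation path. The Python sketch below assumes a strictly linear hand-off from one tier to the next, which is a simplification of the diagram. The `escalate` function and the one-tier-at-a-time rule are assumptions for illustration, not a documented process.

```python
from typing import List, Optional

# Tier names follow the labels in Figure 4 (written with ASCII hyphens).
TIERS = [
    "L1 - Incident Resolver",
    "L2 - Incident Resolver",
    "L3 - Incident Resolver",
    "L4 - Dispatch External Resolver",
    "L5 - Problem Mgt",
    "L6 - Change Mgt",
]


def escalate(current_tier: str) -> Optional[str]:
    """Return the next tier in the path, or None if already at the last tier."""
    index = TIERS.index(current_tier)  # raises ValueError for an unknown tier
    if index + 1 < len(TIERS):
        return TIERS[index + 1]
    return None


# Walk an unresolved ticket up the full path.
tier: Optional[str] = TIERS[0]
path: List[str] = []
while tier is not None:
    path.append(tier)
    tier = escalate(tier)
print(" -> ".join(path))
```

Keeping the path in a single ordered list makes the escalation rule explicit and easy to audit, which is the point of a standardized process.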
Innovation and Leadership. Organizations increasingly look outside for objective insight and
expertise to drive continual improvement. To that end, SLAs should always set clear expectations
for managed IT service providers to proactively identify issues, process improvements and cost
reduction opportunities. Increasingly, service providers are expected to submit, as part of regular
business reviews, thoughtful business case analyses and recommendations for improvements that
will measurably impact the bottom line in the near term.
A Top 25 U.S. airport faced tough economic realities in the wake of
unprecedented airline industry consolidation. It needed to “right-size” itself
quickly by consolidating IT facilities and operations, increasing productivity, and
establishing 24x7 infrastructure monitoring.
A flawless two-phase build-out and relocation to interim and permanent facilities
were required, without incurring significant downtime. Project staff was subject to
rigorous federal security clearances. The solution set and results included:
• First-time 24x7 remote monitoring and management of all key devices was
established.
• “Total Call Ownership” of all incidents was provided, including data capture to
facilitate root cause analysis and continual service improvement.
• Real-time online KPI dashboards were provided and made accessible to all
stakeholders.
• Over 97% of all incidents were now being remediated remotely.
Qualifying the managed services provider to take you to the next level
As we have seen, out-tasking some level of infrastructure support can effectively reduce operating
costs and capital expenditures, while measurably improving performance and bottom line results.
Identifying a managed services provider who will elevate infrastructure performance to the level
it deserves is a challenge, but there are a few important “qualifiers” to help evaluate provider
capabilities and commitment to your goals.
Current Technology. Many companies have legacy infrastructure monitoring, capable of little
more than nominal alert monitoring. As a result, they experience the major challenges discussed
earlier – an inability to understand – and have confidence in – the health of the infrastructure.
Conversely, the introduction of specialized analytical tools and integrated, online dashboards enables
the enterprise to proactively manage and optimize its infrastructure. A few salient questions to ask
include:
• Can they manage virtual as well as physical devices?
• Can they monitor beyond the device level into the application layer?
• Are they prepared to support you with VDI (virtual desktop infrastructure)?
• Can they support enterprise mobility management for your mobile devices?
Real-time Information. In addition to frequent reports and regular business reviews to discuss
trends, short term issues and long-range plans, real time data is critical. The infrastructure is a
“living creature” whose vital signs must be constantly monitored 24x7. IT managers need access
to real-time, online and mobile dashboards regarding current activity and incidents, including
continuous resolution status updates. Key stakeholder groups require different views of
infrastructure performance, for example:
• Executive Summary: Overall health, trends and direction
• Management View: Key indicators, functional operations and focus areas
• Engineering View: Interactive web console, detailed performance and capacity data
Figure 5 represents actual data revealing a significant spike in ticket volume after 8 pm, when the
onsite staff has gone off duty. When this occurred, offsite staff noticed the spike and immediately
began remediation. All issues were subsequently resolved, and the system fully restored by the
time the onsite staff returned the next morning.
Figure 5. Online and mobile dashboards provide real-time access to key performance measures,
trends and incident resolution status.
Data-driven Continual Improvement. Remote monitoring and management (RMM) tools generate data
the Remote Engineering team uses as a basis for continual improvement. The data helps pinpoint
system weak points requiring remediation and adjustments that will optimize the infrastructure.
This proactive approach – enabled by ongoing data analysis – is what distinguishes RMM from
ordinary monitoring and management. For that reason, it is uniquely capable of
delivering cost and performance improvement no reactive model can.
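An after-hours spike like the one described for Figure 5 can be flagged automatically from hourly ticket counts. The following Python sketch shows one way to do so under stated assumptions. The hourly counts are invented for illustration and are not the data behind Figure 5, and flagging any hour above the mean plus two standard deviations is one simple rule among many.

```python
from statistics import mean, pstdev
from typing import Dict, List


def find_spike_hours(tickets_by_hour: Dict[int, int], sigmas: float = 2.0) -> List[int]:
    """Return hours whose ticket count exceeds mean + sigmas * std deviation."""
    counts = list(tickets_by_hour.values())
    cutoff = mean(counts) + sigmas * pstdev(counts)
    return sorted(hour for hour, count in tickets_by_hour.items() if count > cutoff)


# Hypothetical ticket counts per hour of day (0-23); quiet except at 21:00.
hourly = {hour: 4 for hour in range(24)}
hourly[21] = 40

print(find_spike_hours(hourly))
```

A check like this, run continuously, lets offsite staff begin remediation while the onsite team is off duty.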
Adherence to Standard Processes. Providers should adhere to major industry standards, for
example, COBIT and SOX, as well as IT industry best practices, such as Six Sigma and ITIL, which
establishes effective processes for incident, problem and service level management.
Collaboration. Can the provider’s team work, integrate and cooperate with your team? Their
ability and willingness to adapt to your needs, and to ongoing changes are key considerations.
Less common – but no less important – is the provider’s willingness to actively share its knowledge
and tools with the client. And since no relationship lasts forever, set an expectation upfront that
knowledge transfers occur whenever changes are made to the infrastructure, as well as before you
sever the relationship.
Flexibility. Of all the attributes you might associate with a provider, flexibility is one of the most
highly valued. Flexibility, along with adaptability, scalability and agility, represents a professional
capacity to understand issues and address them quickly, without unnecessary delay or an annoying
litany of change orders. The willingness of the provider to accommodate reasonable changes is the
hallmark of a true partner who knows it will succeed when you do.
Not every company requires end to end infrastructure support, so the flexibility to step in quickly
to monitor and manage an individual area is important – data center, servers, network, printer
environment, cloud environment, virtual desktops, mobile devices or VoIP.
Location. From what location(s) do you need support? Do all the resources need to be onsite at
your location? Can they all be located offsite, or would you consider a blended solution?
Should you augment the IT staff with that of the provider, for example, sharing or dividing level 1
and level 2 incident management? Whatever you decide, your locations should be ITIL compliant,
and provide industry standard redundancy across security, multiple networks and telephone
systems.
A mining and manufacturing company was experiencing significant deterioration
in infrastructure performance. A failing legacy supplier relationship resulted in
poor communication, slow response times, missed deadlines, and insufficient
onsite resources. Solution steps included:
• Replacing a reactive model with proactive 24/7 monitoring and management.
• Allocating sufficient onsite resources to standardize operations, resolve
recurring incidents and improve communication.
The client environment was stabilized for the first time in months. Critical
incidents and resolution times were reduced 40%. Real-time performance
dashboards were established and made available to all stakeholders.
Final thoughts
Many companies struggle to balance the need to improve IT service delivery against increasing
pressure to manage costs. This dilemma is unlikely to change in the near future given current
economic conditions, or at least until an organization determines which resources and processes it
must “own,” often at the expense of ‘grow the business’ opportunities.
IT managers know their infrastructures could operate more effectively if provided the resources,
skills, technology, and continual service improvement disciplines to sustain them.
An effective alternative service model exists that will reduce the need for capital expenditures
and take out operational costs by 20% or more. Increasingly, companies come to the conclusion
that improved service delivery and bottom line results are not mutually exclusive – both can be
achieved by relying upon the right partner, onsite or offsite, to make it happen.
About Pomeroy
Pomeroy provides high quality managed IT infrastructure services, professional and staffing
services and procurement and logistics services to Fortune 500 corporations, global outsourcers
and the public sector throughout the U.S., Canada and Europe. A recognized leader in the
service desk and managed desktop services markets, Pomeroy’s ITIL, CSI and HDI certified
professionals employ a process-centric approach to working with clients, either remotely or
on-premise, to plan, design, deploy, manage and ultimately optimize each client’s IT infrastructure,
leading to the creation of tangible business value and return on IT investments. Learn more at
www.pomeroy.com.
REFERENCES
1 “Forecast Analysis: IT Outsourcing, Worldwide, 2010-2016, 2Q12 Update.” Gartner. July, 2012.
2 “How Cisco IT uses Cisco Remote Management Services to Enhance Network Operations.” Cisco. February, 2012.
3 “Gaining efficiency and business value through better management of your IT infrastructure.” IBM. August, 2011.
4 “Evolving Roles in the IT Organization: The IT Product Manager.” Gartner. March, 2010.
5 “IT Key Metrics Data 2012.” Gartner. December, 2011.
6 “Outsourcing Trends, 2011-12: Exploit Changes in Infrastructure Services.” Gartner. January, 2012.
© Pomeroy, 2012. All rights reserved. All trademarks, trade names, service marks referenced herein are the property of
their respective companies. V1912.