Optimizing Infrastructure Performance
Going Beyond the Network with Remote Monitoring and Management

Introduction

There is disagreement over who first said, “If it ain’t broke, don’t fix it.” We might safely assume it was no one with responsibility for an IT infrastructure, yet given the current mindset in many data centers, it very easily could have been. There is nothing simple about managing complex infrastructures. Yet some organizations treat them with a reactive, almost laissez-faire mindset, with eyes on the systems from 8 to 5 and crossed fingers after hours, hoping that no problem lurking beneath the surface strikes when no one is watching.

Many companies deploy the disciplines and tools necessary to provide basic availability management, and the hardware and software monitoring that supports it. Unfortunately, the IT staff often lacks the skills, tools and capacity to reactively monitor, and proactively manage, the infrastructure’s networks, servers, storage, security, applications and devices, a situation that appears unlikely to change in the near future.

Gartner forecasts economic austerity through 2015 that will continue to constrain overall IT spending. As a result, companies are expected to ramp up their pursuit of sourcing alternatives as part of a continuing shift away from capital expense spending in favor of operational expenditures.[1] The pressure to do more with less forces managers to make critical, yet creative, decisions about how to take out costs while improving IT service delivery. This includes the increased use of managed IT service providers to help manage infrastructures, which can instill operational efficiencies, reduce costs and create tangible business value.

Figure 1. It’s more than the network. The status of the infrastructure and its component devices should be visible to the IT staff at all times, regardless of location.
We will examine some of the challenges companies face, and how remote infrastructure monitoring and management can address them by enabling improved performance and profitability.

Is the solution just more sleep, or changing the way we do business?

Cisco relies heavily upon its infrastructure, IT systems and applications to run its business, but the company knew it had to make a change. Its engineers and IT staff were spending an inordinate amount of time on routine network operations at the expense of new product development, an opportunity cost the company decided was too high a price to pay in an industry where continuous innovation drives shareholder value.

IT staff took turns being on call 24x7, once every six to eight weeks, which meant repeatedly being paged to check on system alerts in the middle of the night. Engineers would respond, log tickets, and then go back to bed. The next morning, they were expected to be fully engaged in their “real jobs,” but they were, in fact, exhausted. Over time, nighttime pager duty resulted not only in fatigue, but also in significantly reduced productivity and job satisfaction. The company’s engineering bench strength, essential to support the creation of customer value, was being eroded.

The solution came when senior managers developed a shared model to increase agility by out-tasking routine network management activities. This allowed engineers to focus on higher-value work, such as planning customer deployments of new technologies. The company then outsourced monitoring and management of thousands of devices in its global LANs and WAN, its VPN connectivity, and voice service for 300 company sites and 160 extranet sites.
This shift in operational paradigm liberated dozens of engineers to focus on strategic priorities and core activities that create customer and shareholder value.[2]

The challenges of enterprise infrastructure management

While Cisco was primarily concerned with the opportunity cost tradeoffs associated with 24x7 network management, the key challenges facing most senior IT managers extend beyond resource allocation. The primary drivers behind offsite infrastructure management over the past 20 years have been the need for improved agility, productivity and performance, and for reduced costs.

Situational Awareness. Managers commonly cite a lack of visibility into the infrastructure, and an inability to assess its overall health and security, which hampers their ability to quantify and mitigate risk. This lack of confidence in the stability and performance of the environment, not just the network, often results from never having conducted formal, end-to-end assessments to audit, benchmark and evaluate hardware and software performance. Companies often lack the specialized tools necessary to conduct periodic assessments that can expose performance and security gaps, preventive maintenance needs, and cost optimization opportunities. The expression, “You don’t know what you don’t know,” is appropriate given the lack of situational awareness in the data center.

Operational Maturity. As organizations become more sophisticated in managing the infrastructure, incorporating ITIL best practices, 24x7 support, and process optimization and integration, system availability and performance improve while operating costs go down. Most enterprises, however, find themselves effectively stalled at the Chaotic and Reactive maturity levels, and any management vision of increased internal sophistication seems unattainable given prevailing economic constraints.
Further IT maturation may require an outside change agent with the knowledge, tools and commitment to vault IT operations to levels that will improve performance and impact the bottom line.

Figure 2. IT Maturity Scale

Detection and Response to Incidents and Problems. According to IBM, up to 80% of system outage time is spent just finding the cause and nature of a problem.[3] What makes this situation especially difficult is that most incidents occur after hours, when many organizations rely on limited onsite or on-call staff to respond to alerts, incidents and outages. We cannot eliminate all incidents and problems, but with the right combination of resources, processes and tools, we can greatly reduce both their number and their impact.

Establishment of Standard Processes. When was the last time a band-aid or workaround was applied to restore system or device availability, buying time until a problem could be analyzed and resolved? Reactive problem management may be commonplace, but it is a symptom of a short-term mindset that leads to temporary, suboptimal solutions. Using standard ITIL processes and tools, every component of the infrastructure can be proactively monitored, and problems quickly diagnosed and escalated where appropriate, so that the right knowledge and tools are applied to deploy long-term, sustainable solutions.

A Top 25 financial services institution was experiencing rapid growth that overwhelmed its infrastructure and resulted in unacceptable IT service delivery. An assessment disclosed numerous issues and significant regulatory exposure:

• Less than 33% of 2,500 servers were being monitored.
• Tools failed to trigger alerts and issue reports.
• Processes were not standardized, aligned or documented.
• Escalation paths were ill-defined and frequently ignored.
• Process improvement was undisciplined and ineffective.
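A coverage gap like the first finding can be checked mechanically by comparing the asset inventory against the hosts a monitoring tool actually reports on. The Python sketch below is illustrative only: the host names, the 800-server monitored set and the function name `monitoring_coverage` are assumptions made for the example, not details of the engagement or of any particular monitoring product.

```python
def monitoring_coverage(inventory, monitored):
    """Return (fraction of inventory that is monitored, sorted unmonitored hosts)."""
    inventory = set(inventory)
    if not inventory:
        raise ValueError("inventory is empty")
    # Only count monitored hosts that actually exist in the inventory.
    covered = inventory & set(monitored)
    unmonitored = sorted(inventory - covered)
    return len(covered) / len(inventory), unmonitored


# Hypothetical data: 2,500 servers, of which 800 report to a monitoring tool.
inventory = [f"srv-{i:04d}" for i in range(2500)]
monitored = inventory[:800]

coverage, unmonitored = monitoring_coverage(inventory, monitored)
print(f"{coverage:.0%} monitored, {len(unmonitored)} servers without monitoring")
```

Running such a check on a schedule, and treating any non-empty unmonitored list as an incident in its own right, is one simple way to keep coverage from eroding as the server estate grows.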
The solution standardized remote monitoring and management processes, reducing escalation costs by 85% while improving performance and mitigating risk.

Metric                   Before    After           % Improvement
Client Satisfaction      49%       86%             70%
Servers Monitored        503       2,517 (100%)    400%+
% Servers Virtualized    15%       39%             160%
Escalations              >90%      <10%            >80%

Infrastructure monitoring and management that creates, not consumes, business value

“Without having someone who is responsible for designing, creating, monitoring and improving services end to end, there is little chance that those services will consistently achieve the levels of value, relevance, stability and reliability that service recipients expect.”[4]

The 21st century has been a challenge for CIOs, who must balance delivery of high-quality IT services against cost improvement pressures. Leveraging out-tasking and managed infrastructure support services, to either augment or supplant internal resources, can be an effective way to drive infrastructure optimization and create measurable business value.

CapEx versus OpEx. Capital investment as a percentage of total IT spending has declined to its lowest level since 2003. The trend is likely driven by multiple factors, but it reflects a general reluctance to make capital investments. Meanwhile, IT operational spending on “grow the business” initiatives has increased its share of the total IT budget.[5] Furthermore, in an era of generally flat IT budgets, fixed costs are often viewed as a zero-sum game. Managers look to recast budgets in creative ways to free up and reallocate funds to support business growth through product development, innovative business processes, and new business models.
To cite one industry analyst: “Many organizations are targeting reductions in their run spending to leave more resources for grow and transform initiatives.”[6]

Whether reining in capital investment, or reallocating budget dollars on the operations side to fund growth initiatives, the managed IT services provider has a valuable role to play in taking out costs. From a TCO perspective, the enterprise gains relief because the provider reduces the need for capital expenditures by supplying the needed hardware, software and facilities as a service, on a pay-as-you-go basis. Similarly, operating dollars can be reallocated by reducing costs through resource sharing, or by completely out-tasking 24x7 management responsibility, whether for the entire infrastructure or for one or more individual areas, e.g., network monitoring or print management. By some estimates, remote infrastructure management services may reduce labor costs by as much as 50%, or up to 30% of infrastructure operating costs, excluding switching costs, which some providers will amortize to sweeten a deal or to facilitate the service transition.

Figure 3. Fixed costs to run the business can be reduced and/or reallocated by out-tasking certain infrastructure management functions in order to fund ‘grow the business’ initiatives.

Paradigm Shift: From Reactive to Proactive. This is the difference between staffing a control room and waiting for something to happen, which many organizations still do, and knowing that a specific problem will occur and preventing it through predictive analysis and incident management. The use of remote engineering to predict and remediate problems before they occur is what differentiates remote monitoring and management, and what allows it to add measurable business value.

[Figure 4 flowchart. Elements shown: Start of Process; New Service (I.M.A.C.); Request for Service; Incident Awareness; L1, L2 and L3 Incident Resolver; L4 Dispatch External Resolver; L5 Problem Mgt; L6 Change Mgt; supporting resources: Tracking, CMDB, Known Errors, Process, Procedure, Service Manual, K-Base, Forum; End of Process.]

Figure 4. Beyond Reactive Monitoring: The addition of Remote Engineering combines proactive ITIL processes with predictive data to resolve incidents and problems with sustainable solutions.

An organization empowered in this manner, through standard ITIL processes and a commitment to continual service improvement, gains significant advantages in its ability to execute change management effectively and to optimize infrastructure performance.

Innovation and Leadership. Organizations increasingly look outside for objective insight and expertise to drive continual improvement. To that end, SLAs should always set clear expectations for managed IT service providers to proactively identify issues, process improvements and cost reduction opportunities. Increasingly, service providers are expected to submit, as part of regular business reviews, thoughtful business case analyses and recommendations for improvements that will measurably impact the bottom line in the near term.

A Top 25 U.S. airport faced tough economic realities in the wake of unprecedented airline industry consolidation. It needed to “right-size” itself quickly by consolidating IT facilities and operations, increasing productivity, and establishing 24x7 infrastructure monitoring. A flawless two-phase build-out and relocation to interim and permanent facilities was required, without incurring significant downtime. Project staff were subject to rigorous federal security clearances. The solution set and results included:

• First-time 24x7 remote monitoring and management of all key devices was established.
• “Total Call Ownership” of all incidents was provided, including data capture to facilitate root cause analysis and continual service improvement.
• Real-time online KPI dashboards were provided and made accessible to all stakeholders.
• Over 97% of all incidents were now being remediated remotely.

Qualifying the managed services provider to take you to the next level

As we have seen, out-tasking some level of infrastructure support can effectively reduce operating costs and capital expenditures while measurably improving performance and bottom-line results. Identifying a managed services provider who will elevate infrastructure performance to the level it deserves is a challenge, but a few important “qualifiers” can help evaluate a provider’s capabilities and its commitment to your goals.

Current Technology. Many companies have legacy infrastructure monitoring that is capable of little more than nominal alert monitoring. As a result, they experience the major challenge discussed earlier: an inability to understand, and have confidence in, the health of the infrastructure. Conversely, the introduction of specialized analytical tools and integrated, online dashboards enables the enterprise to proactively manage and optimize its infrastructure. A few salient questions to ask include:

• Can they manage virtual as well as physical devices?
• Can they monitor beyond the device level into the application layer?
• Are they prepared to support you with VDI (virtual desktop infrastructure)?
• Can they support enterprise mobility management for your mobile devices?

Real-time Information. In addition to frequent reports and regular business reviews to discuss trends, short-term issues and long-range plans, real-time data is critical. The infrastructure is a “living creature” whose vital signs must be monitored constantly, 24x7. IT managers need access
to real-time, online and mobile dashboards covering current activity and incidents, including continuous resolution status updates. Key stakeholder groups require different views of infrastructure performance, for example:

• Executive Summary: Overall health, trends and direction
• Management View: Key indicators, functional operations and focus areas
• Engineering View: Interactive web console, detailed performance and capacity data

Figure 5 represents actual data revealing a significant spike in ticket volume after 8 pm, when the onsite staff had gone off duty. When this occurred, offsite staff noticed the spike and immediately began remediation. All issues were subsequently resolved, and the system was fully restored by the time the onsite staff returned the next morning.

Figure 5. Online and mobile dashboards provide real-time access to key performance measures, trends and incident resolution status.

Data-driven Continual Improvement. Remote monitoring and management (RMM) tools generate data that the Remote Engineering team uses as a basis for continual improvement. The data helps pinpoint system weak points requiring remediation, and adjustments that will optimize the infrastructure. This proactive approach, enabled by ongoing data analysis, is what distinguishes RMM from ordinary monitoring and management. For that reason, it is uniquely capable of delivering cost and performance improvements that no reactive model can.

Adherence to Standard Processes. Providers should adhere to major industry standards and regulatory requirements, for example COBIT and SOX, as well as IT industry best practices such as Six Sigma and ITIL, which establishes effective processes for incident, problem and service level management.

Collaboration. Can the provider’s team work, integrate and cooperate with your team? Its ability and willingness to adapt to your needs, and to ongoing changes, are key considerations.
Less common, but no less important, is the provider’s willingness to actively share its knowledge and tools with the client. And since no relationship lasts forever, set an expectation upfront that knowledge transfers occur whenever changes are made to the infrastructure, as well as before you sever the relationship.

Flexibility. Of all the attributes you might associate with a provider, flexibility is one of the most highly valued. Flexibility, along with adaptability, scalability and agility, represents a professional capacity to understand issues and address them quickly, without unnecessary delay or an annoying litany of change orders. The willingness of the provider to accommodate reasonable changes is the benchmark of a true partner, one who knows it will be successful when you are. Not every company requires end-to-end infrastructure support, so the flexibility to step in quickly to monitor and manage an individual area is important, whether that area is the data center, servers, network, printer environment, cloud environment, virtual desktops, mobile devices or VoIP.

Location. From what location(s) do you need support? Do all the resources need to be onsite at your location? Can they all be located offsite, or would you consider a blended solution? Should you augment your IT staff with that of the provider, for example by sharing or dividing level 1 and level 2 incident management? Whatever you decide, your locations should be ITIL compliant, and provide industry-standard redundancy across security, multiple networks and telephone systems.

A mining and manufacturing company was experiencing significant deterioration in infrastructure performance. A failing legacy supplier relationship resulted in poor communication, slow response times, missed deadlines, and insufficient onsite resources. Solution steps included:

• Replacing a reactive model with proactive 24x7 monitoring and management.
• Allocating sufficient onsite resources to standardize operations, resolve recurring incidents and improve communication.

The client environment was stabilized for the first time in months. Critical incidents and resolution times were reduced by 40%. Real-time performance dashboards were established and made available to all stakeholders.

Final thoughts

Many companies struggle to balance the need to improve IT service delivery against increasing pressure to manage costs. This dilemma is unlikely to change in the near future given current economic conditions, or at least until an organization determines which resources and processes it must “own,” often at the expense of ‘grow the business’ opportunities. IT managers know their infrastructures could operate more effectively if provided the resources, skills, technology, and continual service improvement disciplines to sustain them. An effective alternative service model exists that can reduce the need for capital expenditures and take out operational costs by 20% or more. Increasingly, companies come to the conclusion that improved service delivery and bottom-line results are not mutually exclusive: both can be achieved by relying upon the right partner, onsite or offsite, to make it happen.

About Pomeroy

Pomeroy provides high-quality managed IT infrastructure services, professional and staffing services, and procurement and logistics services to Fortune 500 corporations, global outsourcers and the public sector throughout the U.S., Canada and Europe. A recognized leader in the service desk and managed desktop services markets, Pomeroy’s ITIL, CSI and HDI certified professionals employ a process-centric approach to working with clients, either remotely or on-premises, to plan, design, deploy, manage and ultimately optimize each client’s IT infrastructure, leading to the creation of tangible business value and return on IT investments. Learn more at www.pomeroy.com.
REFERENCES

[1] “Forecast Analysis: IT Outsourcing, Worldwide, 2010-2016, 2Q12 Update.” Gartner. July, 2012.
[2] “How Cisco IT uses Cisco Remote Management Services to Enhance Network Operations.” Cisco. February, 2012.
[3] “Gaining efficiency and business value through better management of your IT infrastructure.” IBM. August, 2011.
[4] “Evolving Roles in the IT Organization: The IT Product Manager.” Gartner. March, 2010.
[5] “IT Key Metrics Data 2012.” Gartner. December, 2011.
[6] “Outsourcing Trends, 2011-12: Exploit Changes in Infrastructure Services.” Gartner. January, 2012.

© Pomeroy, 2012. All rights reserved. All trademarks, trade names, service marks referenced herein are the property of their respective companies. V1912.