Security Patch Management Evolution for Data-Center Servers at Microsoft

Published August 2013

The following content may no longer reflect Microsoft’s current position or infrastructure. This content should be viewed as reference documentation only, to inform IT business decisions within your own company or organization.

Assessing and maintaining the integrity of software in a networked environment through a well-defined patch management program is a key first step toward successful information security. By focusing on policies, technologies, and processes, Microsoft Information Technology (MSIT) was able to reduce risk, improve performance, and improve availability of software resources at Microsoft.

Situation

Without a standardized tool or process, Microsoft IT was challenged to manage data center server patching. This left Microsoft's server environment unacceptably vulnerable.

Solution

Microsoft IT chose a multi-pronged approach to address this situation. Focusing on policy changes, technology solutions, and well-defined processes enabled MSIT to achieve its goals.

Security patch management is a process that gives organizations control over the deployment and maintenance of interim software patches into their production environments. It helps organizations maintain the security and stability of the production environment.

At Microsoft, the configuration management program of today evolved from a program that initially used Microsoft Systems Management Server (SMS) to address only security patch management. When System Center Configuration Manager was released, MSIT began to use the product as a discovery mechanism for asset inventory information and for security patch management. From the perspective of managing security patches, the core activities have changed little from earlier efforts. The number of servers for which MSIT manages the configuration continues to grow, up from 24,000 servers in 2010 to 34,000 servers in 2013.
The integration and enhancement of the features available in Configuration Manager have helped MSIT keep up with the ever-increasing number of threats and the volume of security patches now regularly released.

Benefits

Patch compliance increased from 70% to 96%
Patch variability decreased from 40% to 5%
The patch cycle improved from 30 to 19 days

2 | Technical Case Study

Situation

In 2010, MSIT continued, with renewed rigor, its journey to improve the security of data center servers. The primary driver of this effort was server security. Server security is as important as network security because servers often hold a great deal of an organization's vital information. If a server is compromised, all of its contents may become available to steal or manipulate at will. Applying security patches in a timely fashion greatly reduces the risk of a security breach and the problems that come with it, such as data theft, data loss, or even legal penalties.

Patches were being applied to Microsoft servers on average 30 days after patch release, leaving servers exposed to zero-day attacks, which occur in the vulnerability window between the time a vulnerability is first exploited and the time developers develop and publish a patch to counter that threat. Contributing factors included having many instances of Microsoft System Center Configuration Manager spread across IT, which added cost and complexity to operational management of the environment. In addition, patch compliance was running at 60% with variability of 20-40%. This resulted in compounding vulnerabilities month over month as patches lagged.

Long patching cycles, low patch compliance, and high variability left Microsoft vulnerable to well-publicized hacks as well as emergency situations. These necessitated emergency scrambles and out-of-band patching, which increased costs as large teams of people rallied to address each issue.
An additional negative outcome was outages for users, as patching took line-of-business applications and operations offline.

Solution

MSIT's solution approach included policy changes, use of new technologies, and process changes. This multi-pronged approach supported Microsoft's increasing need to ensure a secure environment.

Policies

One of the foundational policies required to improve patching at Microsoft was the implementation of compliance deadlines. The organization was serious about limiting risk and meeting its risk obligation to the board of directors, which required senior leadership to uphold compliance deadlines. MSIT adopted the policy that a patch not installed by a server owner prior to a compliance deadline would be installed for them. Executive sponsorship was key to getting server owners to participate and adhere to deadlines.

Technologies

In addition to policy implementation, technology was adopted to support the goals. For 2010, the focus was on Configuration Manager server agent health. Instead of continually reviewing issues server by server, MSIT started grouping issues by symptom and doing root cause analysis on the largest buckets of issues. Once the root cause was determined and the fix implemented, a new baseline would be measured and the process repeated until that bucket of symptoms was at zero. This focus was responsible for the jump in patch compliance in 2011 from 70% to 90%.

In 2013, MSIT expanded its automation tool set to include System Center Orchestrator, a component of the System Center suite. The first scenario targeted was patching servers in a clustered environment. In this complex scenario, the goal is to patch and reboot each server participating in a cluster in sequence, ensuring the end-user experience is not compromised. Traditionally, an operator running scripts and validation steps tailored to an application would perform these steps until each server in the cluster was compliant.
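
The sequenced cluster patching described above can be sketched in a few lines of Python. This is only an illustration of the idea, not MSIT's scripts or an Orchestrator runbook: the function name and the four hooks (drain, patch_and_reboot, is_healthy, resume) are hypothetical placeholders that a caller would have to supply for a given application.

```python
from typing import Callable, Iterable, List


def patch_cluster_in_sequence(
    nodes: Iterable[str],
    drain: Callable[[str], None],
    patch_and_reboot: Callable[[str], bool],
    is_healthy: Callable[[str], bool],
    resume: Callable[[str], None],
) -> List[str]:
    """Patch one node at a time and stop at the first node that fails.

    Returns the nodes that were patched and returned to service.
    All four callables are hypothetical hooks supplied by the caller.
    """
    patched: List[str] = []
    for node in nodes:
        drain(node)  # move workload off the node before touching it
        ok = patch_and_reboot(node) and is_healthy(node)
        if not ok:
            # Leave the failed node drained and halt, so the remaining
            # nodes keep serving users while an operator investigates.
            break
        resume(node)  # node rejoins before the next one is patched
        patched.append(node)
    return patched
```

Halting at the first failure is one possible design choice: it trades patch coverage for availability, because no further node is taken out of service while one is already down.
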
Using Orchestrator, these scripts and business logic were transformed into a workflow and programmatically executed across the entire cluster. The result was improved predictability by reducing error-prone manual activities.

Orchestrator is also used as the “suspenders” to the “belt” provided by System Center Configuration Manager. In situations where Configuration Manager logs a failed attempt to patch a server, a signal is passed to Orchestrator to initiate a standard patch workflow. The workflow repeats until the server is successfully patched or the service window expires. In this scenario, transient infrastructure issues and SCCM health issues are mitigated. With the addition of System Center Orchestrator, MSIT has improved patch compliance from 90% to 96%, and done so with a smaller labor footprint.

Processes

Along with policy and technology efforts, there was a significant focus on processes. This included reengineering current processes as well as implementing new ones.

One of the initial wins in 2010 was consolidating System Center Configuration Manager server instances into one operational group in MSIT. Approximately 150 instances were consolidated into a handful of centrally managed servers. This decreased operational and maintenance costs because the footprint to manage became much smaller.

Also in 2010, MSIT implemented a new role called Service Transition Manager to be the interface between IT operations and the needs of internal IT groups. This provided an opportunity to onboard internal clients to more automated processes and tools, further decreasing variability and the need for manual patching across the company. The priority was on driving adoption of the automated patching service with internal MSIT groups. Service Transition Managers collected requirements for further service features to increase adoption.

In 2012, MSIT instituted the Patch Cycle Triage process.
This included weekly instead of monthly reviews of agent health issues and the publishing of metrics and reports to all patching stakeholders. This process change increased the visibility of the patching efforts, and clear accountability resulted in more complete and rapid resolution of issues.

Below is a list of example metrics that MSIT gathers data on for review and to ensure visibility into the overall performance of the area.

Table 1. Patch Management Metrics

Metric | Description
Number of patches released | Number of released patches per month; provides a baseline for month-over-month comparison.
Overall compliance per patch cycle | Overall compliance metric for all patched servers in the environment against the successful deployment of all updates during a patch cycle.
Patch success ratio (per patch) | This metric can be used to determine whether a single patch failure negatively impacted overall compliance metrics.
Patch success ratio (per server) | Can be used to determine whether a specific type of server or configuration is the common factor in patch success or failure.
Number of support incidents (per patch) | Number of support engagements that are initiated during a patch deployment, per patch.
Agent health – 98% healthy (daily measurement) | Number of systems with a CM agent installed that have successfully returned inventory data and patch results within the configured refresh schedule.
Time from smoke test success to 60% saturation deployment | This measurement establishes an ongoing baseline comparison that helps validate each milestone success of the patch process in meeting overall compliance goals for each patch cycle.

MSIT has formalized the security patch process. Patches are released the second Tuesday of every month. MSIT has adopted a 19-day cycle to complete patch and software updates.
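
As a small worked example of the schedule arithmetic, the Python sketch below computes the second Tuesday of a month and the date 19 days later. The function names are hypothetical, and counting the 19 days as calendar days starting on the release date is an assumption; the case study does not say how MSIT counts the cycle.

```python
import calendar
from datetime import date, timedelta

CYCLE_DAYS = 19  # cycle length stated in the case study


def second_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month."""
    first = date(year, month, 1)
    # Days from the 1st to the first Tuesday (0 if the 1st is a Tuesday).
    offset = (calendar.TUESDAY - first.weekday()) % 7
    return first + timedelta(days=offset + 7)


def cycle_deadline(year: int, month: int) -> date:
    """Assumed deadline: release date plus CYCLE_DAYS calendar days."""
    return second_tuesday(year, month) + timedelta(days=CYCLE_DAYS)


print(second_tuesday(2013, 8))  # 2013-08-13
print(cycle_deadline(2013, 8))  # 2013-09-01
```
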
The 19-day cycle, developed in cooperation with executive leadership, operations, server and application owners, and Information Security, balances the desire to reduce risk with the need to give the business time to prepare and orchestrate updates across test and production servers. The process drives the activities of the teams that are accountable for security patching. This 19-day cycle is a significant improvement over the 30-day cycle followed in 2010.

To provide context for this process, the diagram below outlines the architecture that Microsoft uses for server configuration management.

Architecture for server configuration management at Microsoft

Conclusion

The biggest change that has occurred since Microsoft first employed a patch management process has been the cadence and consistency with which patches are applied. The established process of patch management gives server and application owners predictability, which enables them to meet compliance expectations. Patching compliance increased from 70% in 2010 to 96% in 2013, and patch variability decreased from 20-40% in 2010 to 3-5% in 2013.

Improved processes and the use of System Center Configuration Manager and System Center Orchestrator have reduced the patch cycle from 30 to 19 days, despite a steady increase in the number of released patches, the inclusion of non-security software updates and software distributions, and growth in the number of servers in the environment. Successfully deploying System Center Configuration Manager and Orchestrator-based solutions has automated patching and significantly reduced manual patching efforts.

The security patch management service was designed to proactively narrow risk by shortening the amount of time that a security or configuration vulnerability can affect servers on the network.
This has been achieved through the creation of a predictable global process, centralized reporting and administration, and policy support to ensure compliance.

Resources

Server Configuration Management at Microsoft
Microsoft IT was able to improve performance and server availability and reduce risks by shortening the cycle time to deliver security and non-security updates. Desired configuration management has enabled IT administrators to identify configuration drift across platforms, services, and line-of-business applications.
Technical White Paper

Related videos

Delivering Results - Using System Center Orchestrator to Patch Complex Data Center Scenarios (Level 200)
Learn how Microsoft achieves immediate and greater than 95% patch management compliance, including remediation within maintenance windows, for complex automation scenarios using System Center 2012 - Orchestrator.

How Microsoft IT Implements Server Patch Management
Minimizing the threat of vulnerabilities requires organizations to have properly configured systems, to use the latest software, and to install the recommended software updates. Assessing and maintaining the integrity of software in a networked environment through a well-defined patch management program is a key first step toward successful information security. Microsoft IT uses the System Center suite as the primary solution in its server patch management process.

For More Information

For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Order Centre at (800) 933-4750. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:
http://www.microsoft.com
http://www.microsoft.com/microsoft-IT

© 2013 Microsoft Corporation. All rights reserved.
Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.