Project Closeout Report

Project Name: Enterprise Operations Business Continuity – Disaster Recovery Assessment and Feasibility Study
Executive Sponsor: DoIT Secretary Marlin Mackey; DoIT Deputy Secretary Bob Mayer
Project Manager: DoIT Mary Wanda Anaya
Date: 7/20/2010
Lead Agency: DEPARTMENT OF INFORMATION TECHNOLOGY (DOIT)
Agency Code: 361

PROJECT DESCRIPTION (PROVIDE A BRIEF DESCRIPTION AND PURPOSE FOR THIS PROJECT)

The Department of Information Technology received funding for a Disaster Recovery (DR) Assessment and Feasibility Study to determine the best approach for redundancy for its most critical Information Technology (IT) based services and applications. The purpose of this project was to determine the most cost-effective means of providing this service.

The Initiation Phase of this project included three visits, to the states of Colorado, Arizona, and Oregon, to see their operating solutions for providing redundancy within their data centers, including business continuity and consolidation efforts. In conjunction with the out-of-state site visits, the project team toured selected vendors that operate commercial data centers providing cold, warm, and hot recovery services. In-state commercial data centers were also toured. The site visits included an assessment of each site’s offerings for Business Continuity and DR management services.

The Planning Phase of the project included meeting with business partners to define the needs for the DR Assessment and Feasibility Study. This portion of the project was developed in house. A high-level Business Impact Analysis (BIA) was conducted. The scope of work for the study was developed, and the requirements for researching DR services were established.

The Implementation Phase of this project included a DR Assessment and Feasibility Study. The DR Assessment included a Threat and Risk Assessment.
The Feasibility Study determined the top critical applications for the State of New Mexico and provided the recommended DR model for each critical application. The top twelve (12) critical applications spanned ten (10) agencies. Each agency was interviewed by the project team to assess its Business Continuity and Disaster Recovery preparedness. The study also includes recommendations for possible DR efforts to consolidate applications and platforms for greater cost savings and operating efficiency.

For Business Continuity to succeed within the state, each agency needs at least one staff member who is knowledgeable in Business Continuity so that the agency can evaluate the recommendations brought forth by the Feasibility Study. Two (2) Business Continuity Awareness classes and one Business Continuity Planning class were provided to the agencies. Some agencies took advantage of the training and sent staff to all classes. The Office of Business Continuity staff also attended formal training and received the Certified Business Resilience Manager (CBRM) certification.

Schedule and Budget

Planned Start Date: December 1, 2008
Actual Start Date: December 1, 2008
Planned End Date: June 30, 2009
Actual End Date: June 30, 2010

Category                 Planned Cost (Budget)    Actual Cost (Total)
Professional Services    $ 200,000.00             $ 199,974.80
Hardware                 $ 0.00                   $ 0.00
Software                 $ 0.00                   $ 0.00
Network                  $ 0.00                   $ 0.00
Other                    $ 50,000.00              $ 49,818.51
Total                    $ 250,000.00             $ 249,793.31

This is a controlled document; this version supersedes all other versions. Revision: 3/16/10 Page 1 of 11

Appropriation History (Include all Funding sources, e.g. Federal, State, County, Municipal laws or grants)

Fiscal Year: 2009
Amount: $250,000.00
Funding Source(s): Laws 2008, Ch. 3, Section 7(13). For an assessment and feasibility study for redundancy of the most critical information technology-based services and applications.
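As a cross-check on the Schedule and Budget figures, the short Python sketch below sums the reported line items and computes the unspent balance. The dollar amounts are copied from this report; the names PLANNED, ACTUAL, total, and unspent are hypothetical and used only for illustration, and the report itself does not prescribe any calculation method.

```python
from decimal import Decimal

# Line items copied from the Schedule and Budget section of this report.
# The dictionary and function names are illustrative assumptions.
PLANNED = {
    "Professional Services": Decimal("200000.00"),
    "Hardware": Decimal("0.00"),
    "Software": Decimal("0.00"),
    "Network": Decimal("0.00"),
    "Other": Decimal("50000.00"),
}

ACTUAL = {
    "Professional Services": Decimal("199974.80"),
    "Hardware": Decimal("0.00"),
    "Software": Decimal("0.00"),
    "Network": Decimal("0.00"),
    "Other": Decimal("49818.51"),
}


def total(costs):
    """Sum a category -> amount mapping using exact decimal arithmetic."""
    return sum(costs.values(), Decimal("0.00"))


def unspent(planned, actual):
    """Return the planned total minus the actual total."""
    return total(planned) - total(actual)


if __name__ == "__main__":
    print(f"Planned total: ${total(PLANNED):,.2f}")
    print(f"Actual total:  ${total(ACTUAL):,.2f}")
    print(f"Unspent:       ${unspent(PLANNED, ACTUAL):,.2f}")
```

The line items sum to the reported Planned Cost of $250,000.00 and Actual Cost of $249,793.31, leaving an unspent balance of $206.69. Decimal is used rather than float so that currency amounts are added exactly.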
Scope Verification

Requirements Review (Yes / No, with Explanation/Notes)

Were the project objectives (expected outcomes) accomplished? Yes
Were all Deliverables submitted and accepted? Yes
Did the IV&V vendor verify that all deliverables met the requirements? No. IV&V exception for study. DoIT Quality Assurance provided.
Have all contracts been closed? Yes
Have all final payments been made (i.e., invoices paid)? Yes
Has adequate knowledge transfer been completed? Yes

TRANSITION TO OPERATIONS: (DESCRIBE AGENCY PLAN TO MIGRATE PROJECT SOLUTION TO PRODUCTION. INCLUDE DOIT IMPACT IF DIFFERENT THAN PREVIOUS REPORT)

The information gathered within the study provided the requirements that formulated the mandatory specifications for the scope of work for the DoIT Disaster Recovery and Data Resilience Data Center Site(s) RFP.

Maintenance/Operations

Are there recurring maintenance/operational costs for the product/service ($ per Year)? No
Are there any recommended enhancements or updates? X (Attach comments)
Funding source for maintenance/operational costs (Describe): N/A

BUSINESS PERFORMANCE MEASURES (COMPLETE FOR ALL PHASES)

COMMENTS:

Phase: Initiation
Completion Date: May 11, 2009
Amount: $49,818.51
Goals/Objectives:
Business Objective 3 – Ability to resume critical business functions, i.e. business continuity
Business Objective 4 – Identify cold, warm, hot sites and manage DR services.
Business Objective 6 – Enable an individual from each agency to work directly with the OBC who will be responsible for departmental business continuity and recovery.
Technical Objective 1 – Evaluate the impact to DoIT business / operational functions resulting from a disaster
Project Goals
1. Other State Government Site Visits
2. Out-of-State Commercial Site Visits
3. In-State Commercial Site Visits
4. Education Site Visits
5. Business Continuity Agency/DoIT Staff Training
6. Business Continuity Staff Training **
** Training was moved to April 2010 (because budgets were placed on hold)
Results:
1. January/February 2009 – New Mexico: Northrop Grumman BigByte Oso Grande Qwest
2. March 2009 – Colorado: State of Colorado eFort DR DC IBM BC Resilience Center Qwest DR Center Cisco Cyber Center State of Michigan (ph)
3. May 2009 – Arizona: State of Arizona I/O DC SunGard DR Site AZ State Univ 3DC
4. August 2009 – Oregon: State of Oregon CIO Advisory Committee Intel – Lights Out Opus Interactive Infinity Internet
5. April 2009 – Washington: Business Continuity Training Business Continuity Certification
6. April/May 2010 – New Mexico: Business Continuity Awareness Training Business Continuity Plan Training

Phase: Planning
Completion Date: March 12, 2009
Amount: $ 0.00
Goals/Objectives:
Project Goals
1. Project Plan
2. Define Needs
3. Develop Scope of Work for DR Assessment and Feasibility Study
Results:
March 12, 2009 POD Inc. contract
Activities:
1. Threat and Risk Assessment
2. Critical Applications Assessment
3. Overall Disaster Recovery Recommendations

Phase: Implementation
Completion Date: March 30, 2010
Amount: $ 199,974.80
Goals/Objectives:
Business Objective 1 – Identify the state’s mission critical systems.
Technical Objective 2 – Define the amount of sustainable time from outage to recovery of IT infrastructure
Technical Objective 3 – IT Recoverability Assessment / Strategy recommendations – Evaluate the recovery capability of DoIT’s data center using current processes and procedures for the services above. Recommended improvements will be made to meet the Recovery Point and Recovery Time Objectives.
Project Goals
POD Inc. Contract DR Assessment and Feasibility Study
1. Conduct a Threat and Risk Assessment
2. Conduct a Critical Application Assessment
3. Provide Overall Disaster Recovery Recommendations.
Results:
Del #1 Discovery Document
Del #2 Threat Analysis Report
Del #3 Risk Analysis Report
Del #4 Threat & Risk Recommendations Document
Del #5 Critical Applications Determination
Del #6 Evaluate Selected agencies BC & DR Plans
Del #7 Evaluate Twelve Critical Applications Architecture
Del #8 Critical Applications Recommendations
Del #9 Determine Disaster Recovery efforts for Developing Linkage of Like Applications and Platforms Report
Del #10 Overall Disaster Recovery Recommendations

Phase: Closeout
Completion Date: June 30, 2010
Amount: $0.00
Goals/Objectives:
Business Objective 2 – Make accessible the critical and vital computer production environments for each agency within the timeframe specified by each agency
Business Objective 5 – Provide business systems that support and enhance the efficiency of State Agencies and sustain their ability to deliver services to the citizens of New Mexico
Technical Objective 4 – Continue to implement Redundant Network Recovery strategies and develop documentation to support the switching of systems to the backup networks that will meet Business/Operational recovery requirements.
Results:
RFP# 00-361-00-01416 DoIT Disaster Recovery and Data Resilience Data Center Site(s)
Category-1 Resilience Data Center Site (GOLD): Production/Failover. Available 24x7x365
Category-2 Hot Data Center Site (SILVER): Host Equipment, Operating System, Application Software, Copy of Data – test. Available within 8 hrs
Category-3 Warm Data Center Site (BRONZE): Racks, Power, Data. Available within 24 hrs
Category-4 Cold Data Center Site (PAPER): only floor space. Available within 72 hrs

LESSONS LEARNED

1. When the product of the project is a study, it is critical to include a requirement in the contract for a technical writer. The reports that were initially received from the contractor were not written well.
During the first half of the contract, the Office of Business Continuity exhausted resources and time working with the contractor’s project manager to clean up the reports. This issue with the quality of the reports was brought to the contractor’s attention. The contractor responded by restructuring the process to route reports to a more qualified individual with technical writing skills before delivering them as a final product.

2. When critical information is gathered through an interview process, note taking should be supplemented with voice recording to ensure that all important information is documented. The contractor assigned one of its staff members to take notes at each agency interview; however, when the notes were reviewed on the contractor’s SharePoint site, they were minimal and did not record all the information discussed.

3. Even in small projects it is difficult for the Project Manager and the Project Team Leader to be the same individual. When one person holds both roles, it becomes increasingly hard to direct the project and still meet all the project management requirements. The project’s resources were also impacted by the reassignment of a key team member to another project.

4. Continuous Business Continuity training is required within the State of New Mexico to educate the agencies that Business Continuity is not only Disaster Recovery and an IT responsibility, but an ongoing business process to continue providing critical services.

IT System Analysis

On this document, or as an attachment, provide a summary response, including changes, to the following IT infrastructure topics relating to this project:

This project was a study and did not impact the following:

Describe or estimate this project’s impact on the State Datacenter infrastructure.
o Hardware (List type of hardware anticipated. Keep in mind the State Datacenter may have pre-built hardware stacks available):
o Network (Include Diagram):
o Software / Applications (Provide application schematic if available):

Hosting Considerations (If not hosted at the State Datacenter describe your strategy to host at the State Datacenter):

Business Continuity Strategy

On this document, or as an attachment, provide a summary response, including changes, of your Business Continuity Strategy.

Emergency and Disaster Management

Business Continuity Management

Purpose

Business Continuity Management will ensure that the appropriate level of administrative management of responsibility is in place to sustain the operation of Information Technology critical business services following a major disaster or emergency. Its aim is to ensure information technology support services to State government agencies with minimal disruption from disasters or unforeseen events that would impact the State’s ability to serve the citizens of New Mexico.

Policy Statement

The Office of Business Continuity and Disaster Recovery, under the direction of the Department, shall maintain and test a Business Continuity Plan. The plan will support the continuity of operation of the Department’s information technology, including operations that the Department supports on behalf of other departments or external entities.

Procedure

The Office of Business Continuity has the primary leadership responsibility to identify risks and to determine what impact these risks have on business operations. The Department’s Management Team shall plan for business continuity based on these risks and document recovery strategies and procedures in a defined business recovery plan that is reviewed, approved, and updated on an annual basis. The plan includes all divisions: business, technology and operational support. All divisions perform functions critical to sustaining service delivery.
Responsibilities of Division Directors, IT Managers, and IT Supervisors (information owners) include but are not limited to:

Identification and prioritization of critical business processes.
Regular assessment of the potential impact of various types of unforeseen events/disasters.
Definition of responsibilities and emergency arrangements.
Documentation of all procedures and responsibilities.
Communication of business continuity and recovery plans to all necessary individuals.
Regular testing of business continuity and recovery plans.
Regular review of business continuity and recovery plans to ensure they are correct, complete and up-to-date.

The specific rules and procedures guiding the responsibilities and the actions to be taken in the event of a disaster are specified in the Business Continuity Plan. The plan is assembled from the individual section plans under the direction of the Office of Business Continuity.

The Business Continuity Plan shall include the following types of activities:

Procedures and criteria for Disaster Declaration that will activate the Disaster Recovery Plan. Include the process for activating the hot site, which will restore computer systems and the statewide network within forty-eight (48) hours.
Define a notification process of all responsible individuals to include: Incident Manager; Office of Business Continuity; Office of Security; Division Directors; Public Information Officer; Line Managers.
Define a Damage Assessment Team with procedures for making recommendations to Executive Management regarding the extent of the damage and whether the facilities can be used safely in a reasonable amount of time or whether the hot site should be notified.
Include measures to ensure the health and safety of all employees.
Define a team of individuals with the necessary skills in evacuation plans and emergency aid centers (such as the Red Cross). Assign coordinators for insurance claims and any other concerns that employees feel are important to them and their families.
Procedures to recover the information systems once the employees’ needs are satisfied. The following recovery teams must be identified:

Business Recovery Teams:
o Communications Recovery Team
o Finance Recovery Team
o Facility Recovery Team
o Records Recovery Team
o Staff Recovery Team
o Logistics Recovery Team

Technical Recovery Teams:
o Infrastructure Recovery Team
o Applications Recovery Team
o Data Recovery Team
o User Access Recovery Team

Annual Review

The Business Continuity Plan shall be reviewed based on a defined review process. Division Directors and IT Managers shall review the plan annually and submit their updates and modifications to the plans in June to the Office of Business Continuity. The Office of Business Continuity shall submit the entire plan to the Executive Management for approval.

Business Resumption

Information Technology changes rapidly and unpredictably. Some changes bring opportunities for the State, while others bring challenges and may present threats and risks. The State has to be responsive yet seamless when providing services and while implementing new technologies that embrace these opportunities.

However, one of the fastest and most certain ways to seriously impact the State today is to cut off its flow of information. Data is the means that supports the State’s business. An interruption of hours to normal data access can result in enormous cost to the State. If the interruption is a major incident, the impact may be severe and recovery may take months. Planning for just disaster recovery is not enough.
What is needed is to ensure that the data is always accessible. By building resilience into the systems that provide the State’s services, business value can be increased, operational efficiency improved and potential points of failure removed.

How can the State build resilience? Opportunities are presented when new systems are planned or systems require replacement. At that point systems may be designed with a resilience architecture that provides continuity of operations. The new SHARE system is designed with such resilience in a grid computing architecture. Grid computing enables groups of networked computers to be pooled and provisioned on demand to meet the changing needs of the State. Instead of dedicating servers and storage to each specific application, grid computing enables multiple applications to share computing infrastructure, resulting in much greater flexibility, cost and power efficiency, performance, scalability and availability, all at the same time. Grid computing may also scale out capacity on demand in smaller units, instead of buying oversized systems for peak periods or uncertain growth. A failed or unneeded machine may be removed without interruptions in service, which saves cost and provides continuity of operations. Half the grid for SHARE will reside at the State Data Center housed at the Simms Building and the other half will be hosted at the State Resilience Data Center.

Operational Recovery Planning

Disaster Recovery for Equipment

The contract with Mainline Disaster Recovery Services, LLC for replacement of equipment remains in place. Under the agreement, Mainline will replace equipment through its own inventory or through other sources, such as an equipment reseller, a distributor, or the equipment manufacturer directly, by selling, leasing, or renting IBM and OEM equipment to DoIT in the event of a disaster.
DoIT must submit an accurate description of the equipment configuration for equipment within the Simms Data Center within thirty (30) days of the Mainline agreement and within sixty (60) days of any modification of the configuration. Mainline agrees to include all non-IBM hardware, including servers, PCs, LANs, networks, etc., in the agreement for no additional monthly subscription, provided DoIT makes the equipment configurations available.

Disaster Recovery Testing

DoIT has upgraded the hardware for the Mainframe system. The upgrade of the operating system (OS) is in process and is scheduled for completion in FY11. DoIT has completed Phase II of the State’s Data Center Upgrade. Because of these upgrades and the BC DR Study that provided the Disaster Recovery Site for the State, testing will be scheduled when the resources become available. Currently we have proven backups (restored systems and data from tape within the past six months). We move backup tapes off-site on regular rotation schedules, and within a few weeks the new Disaster Recovery and Data Resilience Site Services contract will be in place.

The tentative plan for DR testing for the Mainframe was to develop an agreement between the Department of Workforce Solutions (DWS) and DoIT to share Mainframe systems for DR testing. DWS was to move its Mainframe to the new DR site, and there would be a shared cost to bring that system up to the level needed for DR testing. DWS would perform its testing on the Mainframe at the Simms Building. Since DWS has plans to move off the Mainframe in a couple of years, the plan may change to a managed service. However, with the recent changes in the DWS administration, DWS and DoIT must meet and define a new plan of action. When the DR Mainframe is defined the testing will be done in phases.
The first phase of DR testing will be the connectivity to the new site. The mainframe operating system files will be recovered next. The following phase of the recovery will be the client portion, which will involve scheduling staff from DoIT and the agencies that have systems on the mainframe. DoIT will work closely with those agencies to finalize the schedules for testing.

In the plan, the next systems to test will be Enterprise SHARE and Enterprise Email. However, the resilience design of these systems may move their testing ahead of the Mainframe, to be done during the implementation of the systems. These systems will also involve coordinating efforts with key agencies.

The final objective will be to conduct regular DR testing that will assure DoIT will provide seamless and uninterruptible Enterprise Services.

Disaster Recovery Communication Efforts

The TIWA Core Communication Hub for the Rio Grande Corridor has been relocated to 505 Marquette. This is the core location for Qwest, NMSU, UNM, NMTech, TriState, SuperComputer, termination site for South-East Quadrant, etc. This location positions DoIT to provide a path for DR Enterprise Communications. DoIT is also working with Qwest on establishing state mission critical communications links as follows.

1. TSP (Telecommunications Service Priority) - In the event of a disaster this program provides the legal means for the telecommunication industry to provide preferential treatment for recovery of services for our public safety related sites.
2. Q-Routing Services - Qwest will provide a rerouting of mission critical predefined voice services in the event of an outage or an emergency.
3. G-Card Service - Priority calling for key staff in the event of a disaster, when communication is restricted to emergency response only.
4. The state enterprise telecommunications infrastructure is currently being reviewed for future MPLS service, which will transparently redirect all major communication links to continue state services.

Security Strategy (Application and Data Security Process, Plan, or Standard)

Physical Security Systems

The Department of Information Technology has installed and implemented a state-of-the-art security access control and video surveillance system. The security system consists of biometric and proximity card readers and video surveillance throughout the agency.

Improving Sentry Functions

Security Technicians uncover many incidents every year. Sentry functions are overly dependent on human conditions. The following initiatives will improve the process.

Install Security Information Event Management - One of the essential elements of security is logging events of various intrusions and anomalies. In Fiscal Year 2010, expanding into 2011, security information event management will be implemented to provide a minimum of thirty days of retention and to include all core security, network and server devices. This will provide greater visibility of information events. The hardware was installed in 2010 and a professional services contract is in process. The estimated completion date is September 1, 2010.

Firewall Upgrades - DoIT currently manages several firewalls for itself and various agencies. Most firewalls on the state core network are outdated. In Fiscal Year 2011 DoIT will upgrade the core Internet firewall with high availability. DoIT also has plans to upgrade several Intranet firewalls. In 2010 the ISP firewall was replaced with a new, faster model.

Install a Core Intrusion Detection and Prevention System (IDP) - Developing an enterprise IDP solution greatly improved the level of security of state data communication.
IDP systems can automatically recognize the signatures of attacks. The IDP solution was implemented in May of 2009.

Annual Vulnerability Assessment - Annual network security assessments will be conducted by a reputable third-party vendor. This will verify that appropriate security configurations, patch levels, device vulnerabilities, hot fixes, unused services, open ports, share permissions and restricted groups are in place. The security assessments are planned and waiting on a funding source.

Security Scans - DoIT will perform vulnerability assessments for all agencies/customers on the state Intranet network with a new network vulnerability appliance. Devices in the data center will also be scanned for security vulnerabilities quarterly. The plan for moving forward with the scans requires funding as well as staffing resources.

Project Sign Off

The signatures below certify that this project has been completed in accordance with the specified budget, schedule, and scope, and achieved the intended outcome.

STAKEHOLDERS                       NAME          SIGNATURE          DATE
Executive Sponsor (or Designee)
Lead Agency Head (or Designee)
CIO IT Lead
Project Manager