DISASTER RECOVERY DATA NETWORKING FOR A FINANCIAL INSTITUTION

Michael A Soldwisch
B.S., California State University, Sacramento, 2005

PROJECT

Submitted in partial satisfaction of the requirements for the degree of

MASTER OF SCIENCE
in
BUSINESS ADMINISTRATION
(Management Information Systems)
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO

FALL 2011

DISASTER RECOVERY DATA NETWORKING FOR A FINANCIAL INSTITUTION

A Project
by
Michael A Soldwisch

Approved by: ____________________________________, Committee Chair
Beom-Jin Choi, Ph.D.

________________________________
Date

Student: Michael A Soldwisch

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and that credit is to be awarded for the project.

___________________________________________   _________________________
Monica Lam, Ph.D.                              Date
Associate Dean for Graduate and External Programs
College of Business Administration

Abstract
of
DISASTER RECOVERY DATA NETWORKING FOR A FINANCIAL INSTITUTION
by
Michael A Soldwisch

This project report, Disaster Recovery Data Networking for a Financial Institution, details the advanced data networking challenges that a Credit Union faced during a project to improve its disaster recovery capabilities. By examining this technically interesting project, one can gain a better understanding of the practical application of advanced data networking technologies and the reasoning behind their use in an organizational setting. These technologies were used to build a geographically dispersed site-to-site bridged network segment to support the disaster recovery capabilities of an active/passive core host cluster. This project report shows that a Credit Union employed these technologies to meet its strategic organizational goals of providing business continuity capabilities in the event of a disaster that destroys or otherwise makes its centralized data center core resources unavailable. The successful application of these advanced data networking technologies was instrumental in meeting the organization's strategic objectives in this area. Despite the complexity and technical challenges of the advanced data networking technologies employed to solve the Credit Union's business continuity problems, the success was driven as much by solid project management practices, such as detailed planning, organizational support, and expert advice, as by technical expertise.

________________________________________, Committee Chair
Beom-Jin Choi, Ph.D.

__________________________________
Date

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter
1. EXECUTIVE SUMMARY
2. ORGANIZATIONAL BACKGROUND
3. PROJECT METHODOLOGY
4. PROJECT PLANNING
5. NETWORK SEGMENTATION PHASE
   Network Segmentation Analysis
   Network Segmentation Design
      Layer Two Design Issues
      Layer Three Design Issues
   Network Segmentation Implementation
      Post Segmentation Cutover Business Unit Testing
   Secondary Site Network Segmentation
6. NETWORK BRIDGE PHASE
   Network Bridging Scope Analysis and Design
   Network Bridging Implementation
   Bridge Network Stress Testing
7. SYSTEMS INTEGRATION IMPLEMENTATION
8. NETWORK BRIDGE TESTING REVISITED
9. FAILOVER TESTING
10. MANAGERIAL CONSIDERATIONS
   Detailed Planning
   Expert Advice
   Organizational Support
   Risk Management Techniques
11. CONCLUSIONS AND RECOMMENDATIONS
Appendices
Appendix A Primary Site Core Switch Configuration Fragments
Appendix B Primary Site WAN Router Configuration Fragments
Bibliography

LIST OF TABLES
1. Headquarters Network Segments
2. Network Segmentation Pre-Cutover Tasks
3. Network Segmentation Cutover Schedule
4. Non-Bridged Traffic MTU Calculations
5. Bridged Segment MTU Calculations

LIST OF FIGURES
1. Organizational Chart (IT)
Chapter 1
EXECUTIVE SUMMARY

The disaster recovery project involves several major phases to develop the ability of the Credit Union to respond to a temporary or permanent loss of the ability to operate mission-critical information technology systems that are normally supported by the primary headquarters facility data center. The project will develop capabilities for operating these critical systems using resources at a secondary disaster recovery facility data center. The first phases of the project are the activities associated with developing and implementing a geographically dispersed multisite bridged network segment for the core host systems. The ability to locate a system at either site will be developed during these phases of the project.

The Credit Union will use a bridging strategy instead of a naming strategy to support disaster recovery for the core transaction system, which is an active/passive solution. This solution depends on the Credit Union's Network Engineering department developing a multisite bridged network segment for use by the primary and secondary host systems. In order to meet the Credit Union's recovery objectives, the solution also depends on the successful implementation and testing of the Jack Henry & Associates disaster recovery software in the next phase of the project.

During the planning, analysis, design, and implementation activities of the disaster recovery project, the relative successes of the networking phase versus the systems integration and testing phase were highly dependent on the level of detailed planning that was performed for the implementation of each phase. While there was a great deal of detailed planning for the implementation activities of the network phase of the project, there was very little formal planning for the implementation of the systems integration and testing phase. As a result, the network phase was fairly successful and was completed on time, while the systems integration and testing phases have been significantly less successful and have faced extended delays, even though the networking phase operated under greater time pressure and elevated technical and organizational complexity.

Chapter 2
ORGANIZATIONAL BACKGROUND

The Credit Union serves more than 119,000 members and holds more than $1.2 billion in assets. The Credit Union was chartered over 75 years ago, in 1933. As a credit union, the organization is non-profit and member owned. There are 12 branch locations in a metropolitan area with a population of 2.15 million persons.

The Credit Union's information systems management is headed by the Vice President of Information Systems, who reports directly to the Chief Executive Officer. Figure 1, Organizational Chart (IT), shows the organizational structure of the Information Systems departments. The Application Development Manager, the Security Manager, and the Project Management Office Manager each report directly to the Vice President. The Director of Information Systems also reports to the Vice President. The Service Desk Manager, the Network Manager, and the Systems Administration Manager all report to the Director. The overall headcount for the Information Systems departments is 26 full-time employees. The Credit Union employs about 300 full-time equivalent employees.

Figure 1 Organizational Chart (IT)

The Credit Union's operations are highly dependent on the transactional financial information that is maintained in the organization's information system resources.
Information systems in the banking industry do not merely fulfill support functions; rather, they are considered production systems (Arduini & Morabito, 2010). The Credit Union must be able to access its information systems resources in order to provide services to its members.

The Credit Union has recognized the need for a disaster recovery plan for some time, and some efforts have been made to provide capabilities to restore the information systems' operational capabilities in the event of a disaster. These efforts have included maintaining offsite data backups, purchasing redundant equipment, and even establishing remote disaster recovery data center facilities. So, at the surface level, there appears to be a commitment to preparing for disaster recovery; however, these preparations did not include any real consideration for testing and verifying the Credit Union's ability to effectively use its disaster recovery resources. Tammineedi (2010) emphasizes that testing and exercising the disaster recovery plan is a key element of a standards-based approach to business continuity.

Furthermore, considerations such as a realistic Recovery Time Objective (RTO) and Recovery Point Objective (RPO) had not been analyzed. The RTO is the amount of time between service disruption and restoration of services. While the business had made efforts to ensure that recovery would be possible, there was no agreed-upon expectation as to what the RTO would be. No one knew whether the RTO was minutes, hours, days, or even weeks. This is indicative of the lack of clear recovery objectives and limitations from senior management that are required for effective disaster recovery planning (Lindstrom, Samuelsson, & Hagerfors, 2010). This stemmed from two factors. Senior management did not communicate its RTO expectations to the Information Systems staff. Also, the IS staff had not developed detailed plans on how to engage the disaster recovery capabilities in the event of a disruption of services. Rather, disaster recovery planning was limited to ensuring that redundant equipment and systems were purchased for every new solution.

For example, when the Credit Union implemented an SSL VPN solution with two-factor authentication for remote access, disaster recovery was only marginally considered. This remote access solution was deployed into the headquarters location. The two solutions were sufficiently integrated and the solution was accepted by the users. The users present a crypto-key device as well as a password to gain remote access to internal systems resources from remote locations on the public Internet. When the equipment for the solution was purchased, gear for both the headquarters location and the disaster recovery location was acquired, but only the gear at the headquarters location was integrated and tested by the users. The disaster recovery gear was installed at the disaster recovery location, but it was never integrated. So the Credit Union had enough commitment to the idea of disaster recovery to purchase a second set of equipment, but not enough to thoroughly analyze exactly how to use this gear in the event of a disaster. This is important because, without knowing how the gear will be used in the event of a disaster, it could take a considerable amount of engineering resources to put the gear into production at the disaster recovery site. It is expected that engineering resources would likely be very limited during a disaster situation.
The purpose of this gear is to provide remote access to the information system resources in the event that the remote access solution deployed to the headquarters site is unavailable. This objective is much more likely to be met if the analysis of how the gear is to be used is done prior to a disaster event. It is even more likely to be met if the solution is deployed ahead of time, with at least a significant portion of the integration work done before a disaster event.

This is just one example of one solution where the Credit Union purchased additional systems for disaster recovery purposes without understanding how they would be used. There are many more examples, but this one is indicative of the disaster recovery preparation made by the Credit Union. So, while the Credit Union could claim that it engaged in disaster recovery efforts, its disaster recovery preparedness was not very mature, and its ability to effectively restore services in the event of a disaster was questionable. This is despite the fact that the Credit Union devoted substantial financial resources to the purchase and maintenance of redundant systems in support of its disaster readiness activities.

Recently, the Credit Union's executive management has recognized a need to improve its preparedness and to put systems into production with disaster recovery as a primary feature consideration. One catalyst for this commitment to preparedness was an increased awareness of the potential for the loss of access to the information systems resources provided by the headquarters data center facility, which is located in a 100-year floodplain with an aging and ineffective levee system (Cabanatuan, 1998). This raised concerns about a fairly probable potential flood event and its impact on the ability of the Credit Union to provide services to its members. Credit unions recognize that they have contractual obligations to provide services whether or not they have the physical capability to do so (Totty, 2006).

It is with this commitment to disaster recovery capabilities that the management team considered an overhaul of the Credit Union's aging core system platform. The Credit Union's core applications are hosted by the Episys® solution from Symitar™, a Jack Henry & Associates company. This solution is considered an industry-leading solution, with a 30 percent market share among credit unions with more than $1 billion in assets (Core Solutions, 2011). The core host platform was a seven-year-old IBM AIX host system that was scheduled for a hardware refresh in 2010. The Credit Union's executive management decided that it would continue to rely on the Episys® processing solution, but that it would look for a configuration that would meet the requirement of providing continuity of the system in the event of a disaster. The Episys® solution provides disaster recovery capabilities with rapid recovery and continuous protection through the implementation of a secondary host. Management went to the Board of Directors and obtained approval to proceed with a failover solution project for the Episys® core processing system.

Chapter 3
PROJECT METHODOLOGY

This project uses the case study methodology to examine the application of advanced data networking technologies to the strategic disaster recovery goals of a financial institution. A case study is useful when an in-depth investigation of a problem is needed (Yin, 2002).
Since this project focuses on the detailed application of technologies, the case study is appropriate for analysis of the critical success factors for the application of these technologies. It also explains in detail exactly how these technologies can be employed and what the likely outcomes of these applications might be. Other methodologies can hide these details that may be of particular interest (Stake, 1995). While this case study allows a good look at the details of the application of these data network technologies, it does have some limitations. A limitation of concern is that this case study is limited to examining a single organization. This raises some issues as to whether the findings of the case study can be generalized to organizations as a whole. Nevertheless, this case will provide the practitioner better insight into the expected results of applying these technologies to similar problems in other organizations.

Chapter 4
PROJECT PLANNING

A primary requirement of the failover solution was that the secondary host needed to be physically located in the Credit Union's existing disaster recovery data center facility so that it would be reasonably isolated from most anticipated disaster scenarios. This presented the first network engineering discovery issue for the proposed solution. There are two basic methods for providing high availability of the Episys host.

The first method relies on naming services. Using this method, client computers connect to the Episys host using a host name and rely on naming services to resolve the host name to an actual IP address. The secondary host is kept up to date and is free to have any IP address following conventional IP addressing schemes. In the event that the primary host is unavailable, the name record for the primary host can be modified to refer to the IP address of the secondary host as part of the recovery process. This is generally handled by employing private DNS servers with the TTL set to a very low value, which requires client computers to update their naming cache from the DNS server on a frequent basis. This helps accomplish the goal of high availability and helps the organization meet its RTO. By doing this, the client computers can quickly access the secondary host in the event of a disaster as though it were the primary host, and operations can continue even though the primary host is unavailable. This method is supported by Jack Henry & Associates, and it has the advantage that it could be employed without much change to the existing data communications network infrastructure and supporting technologies. It is also a widely used and mature method for providing active/passive clusters; however, it does require that all systems accessing the Episys host use host names, and at the time the client computers, including third-party systems, were configured to access the Episys host using its actual IP address instead of its host name.

An alternate method of providing high availability of the core host does not rely on naming services, but it requires that both the primary and secondary hosts be directly connected to the same network segment. This method allows clients to refer to the core host using its actual IP address instead of a host name. In the event that the primary host is not available, the secondary host assumes the IP address of the primary host and the goal of high availability is met.
This method is also supported by Jack Henry & Associates, and it allows the goal of high availability to be met without modification of the clients and, more importantly, without modification of the third-party applications that access the core host. This was especially important to upper management, since modifying all of the third-party client applications to use host names instead of IP addresses would require more resources than were available to this project. A high-availability system where the cluster spans multiple sites is a cornerstone of disaster recovery (Lumpp, Schneider, & Mueller, 2008).

Even though the Systems Administration group was primarily responsible for the host system installations, the Network Engineering group was consulted early in the discovery phase of the host system installation project. During this phase of the project, various aspects of the network configuration for the host systems were discussed, and the feasibility of the designed network configuration for the various logical partitions was determined. This was jointly discussed by the Jack Henry presales engineering group and the Credit Union Network Engineering group.

It was the desire of the Credit Union to be able to support VLAN trunks on the physical network interfaces used by the IBM PowerVM™ systems and map VLANs to the virtual network interfaces created for each logical partition. This would allow the flexibility to assign any VLAN to any logical partition on the systems while still maintaining the resilience that link aggregation provides. This would be useful for supporting Production, Staging, Development, and Training logical partitions using overlapping IP address spaces, each separated within a single networking infrastructure using VRFs. However, the Credit Union learned that even though this configuration is supported by PowerVM™, it was not supported by the Jack Henry & Associates support group. Jack Henry & Associates would support link aggregation, but they would not support VLAN tagging.

Because of this, the Credit Union specified that the PowerVM™ host system be configured with two multiport add-in network interface adapters in addition to the four LAN-on-Motherboard (LOM) interfaces included on the system board of the PowerVM™ host. The supported configuration was that each LOM could be assigned to multiple logical partitions, but an entire add-in card had to be assigned to one and only one logical partition. Within this limitation, Network Engineering was able to determine a network configuration that supported all required logical partitions, but some active roles would need to be distributed between the two hosts. The networking for each logical partition was classified into six VLANs: the production network, the management network, the training network, the development network, the staging network, and the release testing network. The production network and the management network must be supported on both hosts. This left support for two more VLANs on each host, so the remaining four non-production roles were split between the two hosts. This configuration allowed an add-in network interface card to be dedicated to the active production logical partition that would be used to support core system processing. The other add-in interface was dedicated to the VIOS logical partition. Other production logical partitions included a vaulting partition and a legacy production testing partition.
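To make the supported and unsupported configurations concrete, the following is a minimal sketch, in the style of the switch configuration fragments in Appendix A, of the switch side of one host-facing link. The interface numbers, channel-group number, and aggregation mode are illustrative assumptions rather than the Credit Union's actual configuration.

    ! Supported: static link aggregation of untagged access ports,
    ! carrying a single VLAN to the production LPAR interfaces
    interface range GigabitEthernet1/0/11 - 12
     description Aggregated links to production LPAR
     switchport mode access
     switchport access vlan 204
     channel-group 11 mode on
    !
    ! Unsupported by the JHA support group: an 802.1Q trunk that would
    ! have carried several partition VLANs over one aggregated link
    ! interface Port-channel11
    !  switchport trunk encapsulation dot1q
    !  switchport mode trunk
    !  switchport trunk allowed vlan 203,204,205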
Other project planning and discovery considerations for the host system included issues such as system sizing, storage requirements, and memory allocations. Once these designs were agreed upon and funding for the project was approved by the Credit Union Board of Directors and Supervisory Committee, agreements were signed and the installation was scheduled. In the meantime, Network Engineering was tasked with designing and implementing a bridged network to support the primary and secondary hosts installed in the primary and secondary data centers.

During the initial planning phase for the project, it was determined that a network segmentation phase would be the first phase of the project. This would be followed by a multisite network bridge segment phase. These first two phases would be directed by the Network Engineering department. The last two phases, the systems integration implementation phase followed by failover testing, would be directed by the Systems Administration department. All of the project management activities were coordinated by the Director of Information Systems. This provided a level of organizational support that is needed for multi-departmental projects (Gray & Larson, 2008).

Chapter 5
NETWORK SEGMENTATION PHASE

Before the bridge environment could be designed and implemented, the Network Engineering department undertook a segmentation phase of the host systems disaster recovery project. While the planning for this phase was included in the overall project planning phase, this phase had its own analysis, design, implementation, and test phases, as well as its own support requirements. This phase deserved special attention because of the considerable organizational and operational risks associated with it. It carried increased organizational risk because it required the coordination of multiple organizational departments, each with its own agenda and operational priorities. It was also considered operationally complex, since it impacted every information systems solution used by the Credit Union. Project success amid organizational and operational complexity is improved with the risk management techniques associated with formal project planning (Yeo & Ren, 2008).

Network Segmentation Analysis

Network Engineering was tasked with finding ways to implement data networking infrastructures that would allow the primary and secondary core host systems to be located in geographically dispersed locations while still being connected to the same network segment. The Credit Union already had a secondary data center established with a high-speed routed WAN connection to the main site. This data connection between the main site and the secondary site is a one-gigabit Metro Ethernet connection provided by a regional telecommunications provider.

The first challenge that the Credit Union faced was its existing IP address scheme. Each site was segmented into a single class-B IP network, including the headquarters site. The headquarters site's class-B segment supports over 500 nodes on a single broadcast domain. This is the same segment that supports the core host that needs to be bridged to the recovery site. The concern was that if the entire segment were to be bridged to the secondary site, a considerable portion of the WAN link would be consumed by the broadcasts associated with having 500 nodes on a single broadcast domain.
The latency associated with a high number of nodes in a single broadcast domain can be reduced by separating Ethernet segments into separate VLANs (Heaton, 2000). The engineering staff sought the advice of the networking staff of another financial institution that was experienced with bridging core network traffic. This expert recommended against bridging such a large single broadcast domain, since it would generate far too much broadcast overhead on the WAN link connecting the two sites. The expert's prior experience was that failing to choose correctly which network traffic to bridge was a particular pain point for a multisite bridged segment. It was recommended that the Credit Union establish a new network segment just for the bridged traffic and that the core host system be configured to reside in that new dedicated bridged segment. This would keep the amount of broadcast traffic sent across the WAN link to a minimum and have a much lower impact on the availability of WAN bandwidth between the two sites.

The Credit Union management considered this recommendation infeasible, since moving the core host system to a new dedicated bridged segment would require a change in the core host system's primary IP address. The number of systems that had hard-coded references to this IP address made changing the IP infeasible, for the same reason that changing to named references to the host was considered infeasible. It was also understood that excessive broadcasts going across the WAN link would not be acceptable. As an alternative compromise solution, it was proposed that the existing class-B segment at the headquarters site be divided into smaller segments, with the existing IP addresses remaining configured on each node. The workstations, servers, and network devices in the headquarters location were configured on the network switch infrastructure using VLAN 1 (the default VLAN). This alternative would divide this address space into smaller segments, among them one or more segments dedicated to systems that require the capability of bridging to the disaster recovery site.

It was expected that this would solve a number of problems. First, it would reduce the network broadcasts. By increasing the number of segments on the network and decreasing the size of each segment, broadcasts would interrupt the network communications of fewer systems. This would leave more network bandwidth available to do useful work. And while this would increase the amount of traffic that needs to be routed, it was anticipated that the benefit of reduced broadcasts would outweigh this drawback.

Furthermore, it would give the Credit Union the ability to be selective as to which traffic to bridge to the secondary site, allowing a subset of the current network to be bridged to the disaster recovery site. The disaster recovery plan for the host system relies on bridging, which would allow network systems installed at the disaster recovery location to use IP addresses used by systems at the headquarters site. This was required because the disaster recovery plan for the host system required that the backup system assume the IP address and hardware MAC address of the primary host system in the event of a disaster. Since bridging sends all broadcast traffic across the bridged WAN link, the Credit Union needed to be very careful about what traffic gets bridged, as this bridged traffic can use a significant amount of the WAN bandwidth.
Bridging happens for an entire network segment, so by creating smaller segments, the Credit Union could be more selective about which traffic got bridged. Lastly, creating additional network segments would lend itself to better security controls on the network. While the Credit Union had assigned IP addresses to devices in logical ranges that can be used for network access lists, this control would be much more flexible when done on subnet boundaries instead of simple IP ranges.

The timeliest of these issues was the bridging issue, because the newly purchased virtualized host system would be installed within just a few months. To avoid costly redesigns of the new virtualized host system, the bridging and subnet project needed to be completed before the new system was installed. Management decided that the headquarters network segmentation activities needed to be completed two months prior to the scheduled install date of the new virtualized host system. This would allow two weeks for the disaster recovery bridging activities in the middle of the month prior to the new host installation, and it would allow the virtualization of the AIX environment to proceed without a costly network redesign after the new core host system implementation.

It is important to note that the Credit Union planned to use a cutover strategy instead of a migration strategy for this segmentation. In a migration strategy, new parallel network segments would be established and the Credit Union would then move systems from the existing large segment to the newer, smaller segments. Normally this can be supported when applications rely on host names instead of IP addresses. And while having applications rely on names instead of IP addresses is definitely a best practice, there was widespread reliance on IP addresses by many applications. It was not considered feasible to remove these applications' reliance on IP addresses in the timeframe required by the business. The migration strategy was the strategy recommended by the consulted networking staff from the financial institution that had previously completed a similar project. But because of the Credit Union's special circumstances, the additional complexities and risks associated with a cutover strategy would have to be overcome with a choreographed effort during the segmentation activities. Nevertheless, the experience of the expert helped shape the segmentation analysis.

So, instead of a migration strategy, the Credit Union decided on a cutover strategy. A cutover strategy would allow the systems to maintain their already configured IP addresses. And while significant reconfiguration of the campus switching equipment would be required, the updates to each individual node's network interface would be trivial: only the subnet mask and gateway information would need to be modified on each individual system to support the smaller subnets. However, immediately after the headquarters campus switches were updated, all of the systems and network devices would need to be immediately reconfigured to be available on the network. During the time it takes to perform these activities, none of the data communications for the entire campus would be functional. This made the cutover strategy highly disruptive, and thus the operational risks for this phase were particularly high. Simplifying the cutover as much as possible was considered the key to making it as non-disruptive and as successful as possible.
This meant reducing the scope of the changes to as few as possible while still meeting the requirements for the bridging activities. It also meant that certain changes would be tabled for future iterations of the subnet segmentation project. By reducing the number of changes, the impact of these changes could be reduced and the post-cutover tests could be more effective. Also, reducing the number of changes would lead to fewer unanticipated problems needing to be addressed after the cutover, which would also help manage the disruption. Finally, the experience gained during this first iteration would make future cutovers even less disruptive. While making these changes in multiple phases would ultimately require multiple touches to the same systems and would increase the overall time required to complete the project, it would ultimately reduce the disruption of the planned cutovers.

Since the new host system being installed was the most time-critical driver for the segmentation project, the first phase of the project would be aimed at satisfying the requirements of the bridging project that depended on it. It was considered that the best way to simplify the first phase of the project was to reduce the number of segments to those needed to support the bridging project. At minimum, to support the bridging project, the AIX host system segment needed to be separated from the rest of the network segments. The network segments detailed in Table 1 would be created for the first phase of the LAN segmentation project. These are not the actual IP addresses used by the Credit Union and are for illustration purposes only. The VLAN identifiers are what are implemented in the layer two switching devices to separate network segments into separate broadcast domains. The IP address subnets are then aligned to these VLAN identifiers. Each IP subnet is defined by a network IP address and subnet mask.

Because the time constraints for the segmentation and multisite bridge segment phases were non-negotiable, classic waterfall methodologies did not appear to be appropriate for these phases. Instead, the team used the time-boxing techniques associated with agile methodologies such as Scrum. This meant doing only what was required: no more and no less (Milunsky, 2009). This scope management minimizes project timeline risk, but it also helps manage the operational risk associated with this phase of the project.
Table 1
Headquarters Network Segments

VLAN ID:   Description:                    Network IP & Subnet Mask:       Usable IP Address Range:
VLAN 1     Routers and Firewalls           143.16.1.0   255.255.255.0     143.16.1.1 – 143.16.1.254
VLAN 2     Switches                        143.16.2.0   255.255.255.0     143.16.2.1 – 143.16.2.254
VLAN 3     Servers                         143.16.3.0   255.255.255.0     143.16.3.1 – 143.16.3.254
VLAN 204   Bridged AIX Systems             143.16.4.0   255.255.255.0     143.16.4.1 – 143.16.4.254
VLAN 5     Staging & Dev Servers           143.16.5.0   255.255.255.0     143.16.5.1 – 143.16.5.254
VLAN 6     Workstations                    143.16.6.0   255.255.255.0     143.16.6.1 – 143.16.6.254
VLAN 8     Virtualization Servers          143.16.8.0   255.255.255.0     143.16.8.1 – 143.16.8.254
VLAN 10    Printers                        143.16.10.0  255.255.255.0     143.16.10.1 – 143.16.10.254
VLAN 20    Terminal Servers                143.16.20.0  255.255.255.0     143.16.20.1 – 143.16.20.254
VLAN 128   Workstations                    143.16.128.0 255.255.240.0     143.16.128.1 – 143.16.143.254
VLAN 203   Bridged Servers                 143.16.203.0 255.255.255.0     143.16.203.1 – 143.16.203.254
VLAN 205   Bridged Staging & Dev Servers   143.16.205.0 255.255.255.0     143.16.205.1 – 143.16.205.254

Because of this, segmenting the workstations into separate network segments based on department and function would be postponed for a future network segmentation project. This is because all of the workstations are supported by closet switches with more than 700 network ports. Determining the proper departmental VLAN ID to assign to each port on these closet switches would require more resources than were available in the time period allowed for the network segmentation activities. Also, it was considered that securing each workstation port to a departmental VLAN would be prone to error. Furthermore, this level of segmentation was not required in order to meet the objectives for bridging the network traffic to support the failover capabilities of the new core host system, and not including it helped reduce the complexity of an already disruptive project.

It would have been simpler still to combine VLAN 10, VLAN 20, and VLAN 128 into one network segment, but because of the mathematics of IP subnet mask calculations, this was not possible: a subnet must span a power-of-two block of addresses aligned on its own size, and the smallest aligned block containing the 143.16.10.0, 143.16.20.0, and 143.16.128.0–143.16.143.255 ranges is 143.16.0.0/16, which would simply recreate the original flat network. Another alternative would be to assign new IP addresses to the printers so that they were in the VLAN 128 range. This would make the segmentation project easier to implement, since the printers are connected to the same switches as the workstations, and multiple VLANs would not have to be supported on the closet switches. This would go a long way toward improving the chances of success for the cutover, but the time needed to implement new IP addresses on the printers would have delayed the project's progression and put the target completion date at risk.

Another consideration was the option of using secondary IP addresses on the VLAN 128 interface on the switch. This would avoid the need to configure the individual ports on the closet switches, since this configuration could support multiple subnet ranges on a single VLAN. However, since the 143.16.6.0, 143.16.10.0, and 143.16.128.0 IP ranges all reside in the workstation space of the network, the switches would need to support two secondary IP addresses on a VLAN interface and be able to support the layer three functions with both secondary IP addresses. While the Credit Union network engineering staff was able to confirm that this is supported with a single additional secondary IP address on a VLAN interface, they were unable to confirm that it is supported with two secondary addresses.
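A minimal sketch of the form that was confirmed to work, with the unconfirmed second secondary address shown commented out, might have looked like the following on the core switch; the gateway addresses are drawn from the illustrative ranges in Table 1:

    interface Vlan128
     description Workstation space (considered approach, later rejected)
     ip address 143.16.128.254 255.255.240.0
     ip address 143.16.10.254 255.255.255.0 secondary
    ! Layer three support with a second secondary address could not
    ! be confirmed:
    ! ip address 143.16.6.254 255.255.255.0 secondary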
This configuration was considered somewhat experimental, and while it would potentially decrease the complexity of the implementation, the inherent risks associated with the unconfirmed support for the solution seemed to offset the simplification the approach might offer.

In any case, most workstations would initially be configured on the same network segment. While this would not do as much as it could to reduce the broadcast traffic on the workstation segment, it did meet the requirement of reducing network broadcasts on the bridged segment. Although this approach did not solve all of the problems that segmenting has the potential to solve, it did not preclude applying more comprehensive segmentation approaches in future segmentation projects. This approach did stabilize the network configuration for the AIX host systems and provide the Credit Union networking and system administration teams with experience to make future segmentation projects less disruptive.

One thing that stands out in the VLAN scheme is that there is a discontinuity in the relationship between the third octet of the IP address and the VLAN identifier for VLAN 204. With every other VLAN identifier, there is a direct relationship between the third octet of the IP address and the VLAN identifier; however, this relationship is broken for VLAN 204. This is because all of the bridged VLANs are being installed in the 2XX range while the non-bridged VLANs fall below that range. The logical thing to do would be to give VLAN 204 the IP range 143.16.204.0/24; however, the AIX system requires that it be installed on a bridged network segment, and changing the IP address of the AIX system from 143.16.4.1 to something in this range would not be feasible in the time allowed for the project. So, the pattern would be disjointed for this segment. This was not preferable, and while it might cause confusion for future staff, it was considered out of scope to address this issue. This helped reduce the risks associated with this phase of the project.

Network Segmentation Design

The activities involved in creating the network segmentation needed to support the multisite bridged network include both OSI layer two designs and layer three designs. The layer two design issues were considered before the layer three design issues.

Layer Two Design Issues

One of the first considerations for the headquarters network segmentation activities was the creation of documentation of the physical network switch infrastructure so that new switch configurations could be designed to support the segmentation. This required that the IP address of each node attached to each particular network port be documented so that the appropriate VLAN identifier could be assigned to each switch port. This needed to be documented for well over 1,000 switch ports deployed in the headquarters network.

Since the switches operate at layer two, they only report hardware MAC addresses and cannot directly report IP addresses. In order to produce documentation of hardware addresses with their associated IP addresses, the information available from the layer two devices had to be correlated with the information available from the layer three devices. This can be a tedious and error-prone activity to perform manually with the number of ports installed in the headquarters location, but there are automated tools to help produce the required documentation.
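Done by hand, the correlation amounts to pairing two lookups: the MAC address a switch port has learned, and the ARP entry for that MAC address on the layer three device. On the Credit Union's Cisco gear, the two halves of the lookup would look roughly like the following; the interface and MAC address shown are hypothetical:

    ! On the layer two switch: which hardware address is on this port?
    show mac address-table interface GigabitEthernet1/0/24
    !
    ! On the layer three device: which IP address owns that address?
    show ip arp | include 001a.2b3c.4d5e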
The Credit Union 26 used the Switch Port Mapper tool from the Solarwinds Engineer’s Toolset to help generate the required documentation. In some cases, because existing ACLs implemented on the network gear will not allow the automatic collection of layer three addresses to the core router, some devices still needed to be traced manually in order to generate the needed physical documentation. This documentation was then used to prepare a plan to reconfigure the network switch so that VLAN tagging is done at the switch port level. In this design, the node does not provide the VLAN tagging, rather, the switch added the VLAN tag to each packet as its traverses through the switch’s fabric. The alternative would be to have each node provide the VLAN tagging, which would be a decentralized approach to the issue. The Credit Union preferred the centralized approach to VLAN tagging because it did not require additional configuration for each node. The Credit Union has a mix of Cisco and HP switches deployed in the headquarters location. The Credit Union has deployed Cisco 3750G switches in support of its data center and HP 4000M switches in support of the closet switches. The Cisco switches do this using the switchport mode access and the switchport access VLAN x commands on a per port basis. The HP switches do this by using the untagged command to tag untagged traffic to a particular VLAN ID. The key piece of documentation that was needed to support this configuration was the IP address of each system connected to each port on the switch. This physical switch documentation was then used to design the switch configurations. 27 While preparing this documentation of the switch ports, it was found that a good number of unmanaged desktop switches had been deployed by the desktop support staff. This was done where additional nodes needed to be installed and there was not enough structured cabling available to support additional nodes. Instead of installing additional structured cabling, inexpensive unmanaged desktop switches where previously deployed into these areas so that the existing structured cabling could support additional nodes. This was an inexpensive and expedient way to support node growth, but presented a problem for the segmentation activities. It was assumed that there was a single node per switch port, but the existence of these unmanaged desktop switches invalidated this assumption. This is problematic because the VLAN assigned on a per port basis only works if all of the devices connected on the ports will be on the same network segment defined by the VLAN identifier assigned to each port on the managed switches. There were several available solutions to this issue. One solution was to configure each of the nodes attached to the unmanned switches to provide the VLAN tagging, but this alternative would require some nodes to be configured differently from most other nodes. It was felt that creating an exception to the new standard node configuration would cause a support issue in the future. Another option would have been to remove the unmanaged switch devices and replace them with managed switch devices so that the VLAN tagging could be set at the leaf level. This had a couple of drawbacks in that it would be costly and that it would add the additional administrative overhead of managing network devices that are not located in a secured data closet. 
A third option that the Credit Union considered was running additional network cabling to the locations of the unmanaged desktop switches. It was determined that there were enough available ports in the HP 4000M closet switches to support the nodes. This option had a lower cost than the switch replacement option and maintained centralized administration of the desktop switching infrastructure. After Network Engineering discussed these issues with the desktop support staff, it was agreed that unmanaged desktop switches would be removed wherever a switch was supporting nodes that needed to be configured in different network segments; however, any unmanaged desktop switch whose nodes would all share the same network segment could remain. Network Engineering helped identify the switches that would need to be removed and worked with the desktop support staff to install additional structured cabling to support the removal of the incompatible unmanaged desktop switches.

Once this was completed, Network Engineering went through several phases of updating the switch map documentation and evaluating the documentation for compatibility with switch port VLAN tagging. Once this documentation was accepted, it was used to prepare configuration scripts to reconfigure the individual ports on the Cisco switches. Since the HP switches do not support scripted configuration, their configurations were developed in a lab environment and saved to files that could later be transferred to the HP switches using TFTP.

Layer Three Design Issues

The next consideration was the design of the layer three functions. The Cisco 3750G switch stack at the network core was capable of performing the campus routing functions. New gateway IP addresses would be assigned to the management interface for each VLAN interface added to the core switch stack. For example, the management interface for VLAN 204 on the core router would be assigned an IP address of 143.16.4.254 with a subnet mask of 255.255.255.0. This management interface would then become the gateway address for all of the hosts on VLAN 204. IP routing would be enabled in the configuration of the core switch.

A remaining switch configuration design issue was the design of the route propagation, for which there were two main choices. Static routes could be added to the core headquarters router; as long as static route redistribution were turned on for EIGRP routing on this layer three switch, this would take care of propagating the routes to the rest of the network routing infrastructure, such as the WAN routers. The other option was to use the EIGRP stub routing support in the IP Base software on the Cisco 3750G core switch stack. Using this feature may be important for future projects when multiple paths become available to the LAN segments, but as of the time of these design activities there were no requirements that would leverage this configuration approach. In order to maintain simplicity for the cutover, it made the most sense to use the first option of redistributed static routes instead of stub EIGRP route advertisement. To simplify the issue further, the Credit Union decided to use the Advanced IP Services feature set on the 3750G switch and eliminate the existing 143.16.1.1 core router as part of the cutover. The 3750G switch stack would assume the 143.16.1.1 IP address, and all of the ACLs and static routes would be migrated to the layer three functions of the switch stack, as sketched below.
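The following is a minimal sketch of this layer three arrangement on the core stack, using the illustrative addressing from Table 1; the EIGRP autonomous system number and the sample static route are assumptions for illustration, not the Credit Union's actual values:

    ip routing
    !
    interface Vlan204
     description Gateway for the bridged AIX systems segment
     ip address 143.16.4.254 255.255.255.0
    !
    ! Redistribute the stack's static routes into EIGRP so that the
    ! WAN routers learn them without stub route advertisement
    router eigrp 100
     network 143.16.0.0
     redistribute static
    !
    ! Example of a static route migrated from the retired core router
    ip route 198.51.100.0 255.255.255.0 143.16.1.10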
The consolidated 3750G stack would also be responsible for propagating EIGRP routes to the WAN routers. The core layer three switch would also be responsible for routing to the web café DMZ, which is the DMZ used to provide the branch office kiosks with access to the Internet. This kiosk access is used by members visiting branches who need access to the Internet services offered by the Credit Union, even if the member does not own a computer or have Internet connectivity at home. This service helps the Credit Union meet its strategic goal of increasing member usage of online services. Network Engineering planned to move this function to the switch stack as well by implementing a port with the no switchport option, so that a VLAN would not need to be defined for this DMZ in the core switching environment.

There are some servers that are configured with local static routes for communication with systems that are generally not accessible to most systems. For example, several hosts have static routes to send traffic to and from particular DMZs. These static routes would not function once the access points were no longer on the same physical segment. Because of this, these static routes would be documented, and new routes would be implemented in the routers and firewalls. This would simplify documentation and future examinations of the routing environment, making it more predictable and supportable.

Once the configurations for the routers were completed, the Credit Union needed to consider the configurations for the various firewall devices. Static routes for the new LAN segments would need to be added. The firewall devices provide unrouted access to all of the network devices on the headquarters LAN segment, but since the firewall devices were not configured to use an internal default gateway, static routes would need to be added to these devices in order for communication to be maintained. There may be some network segments whose hosts will not need access to the firewalls, but for the sake of simplicity and reliability, static routes would be added to each firewall's configuration. These design considerations helped manage the project risk associated with this phase.

There was some overlap in the existing VLAN identifiers used for the fenced-off staging and development networks. For example, the intuitive VLAN ID for the 143.16.10.0 network is VLAN 10, because the third octet of the IP address is the most significant in identifying this segment. However, this VLAN was already in use by the fenced-off staging network for one of the Credit Union's staging and development DMZ segments. The design choice was either to use a less intuitive VLAN identifier for the new internal production segment or to change the VLAN identifiers used for these fenced-off networks. It was felt that, in the long run, the benefit of using intuitive VLAN IDs would likely outweigh the effort of assigning new VLAN IDs to the fenced-off networks. Because the effort to change the fenced-off networks was fairly minimal, this change was made during the design phase of the segmentation activities.

Once all of these design issues were analyzed, the implementation scripts for the switches, routers, and infrastructure servers were developed. Developing these change scripts ahead of time reduced the chances of errors during implementation and also would shorten an already prolonged change window. A representative fragment of these scripted changes is sketched below.
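As an illustration of what those scripted fragments contained, the sketch below assigns one closet-facing Cisco port to the workstation VLAN and implements the routed web café DMZ port described above. The interface numbers and DMZ addressing are hypothetical, and the HP 4000M equivalents (built in the lab and loaded via TFTP, as described earlier) are not shown:

    ! Per-port VLAN assignment, repeated for each documented port
    interface GigabitEthernet1/0/24
     switchport mode access
     switchport access vlan 128
    !
    ! Web cafe DMZ as a routed port, so no VLAN is defined for it
    interface GigabitEthernet2/0/48
     description Web cafe DMZ
     no switchport
     ip address 203.0.113.1 255.255.255.252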
Network Segmentation Implementation

The headquarters site network segmentation activities were divided into two phases. The first phase included all of the pre-cutover activities. These tasks were completed in advance of the actual cutover and were essentially all of the tasks that could be done outside of the actual cutover date. Table 2 details the activities that were performed during this phase of the implementation. These tasks were completed on time to allow the next phase of the implementation, the actual cutover, to move forward. Using these time-boxing techniques borrowed from agile development methodologies helped reduce risk in the implementation cutover.

Memorial Day weekend was chosen as the cutover weekend for a number of reasons. Completing the cutover on the long weekend allows two days for the entire process: Saturday night and Sunday to perform the cutover. It also allows the implementers to get some rest before business unit testing is performed on Monday, when the branches will be closed for the holiday. Allowing the implementers to rest before business unit testing was considered important because it was anticipated that the engineers might need rested minds in order to respond to issues that the business units might discover during their unit testing. If the business units began their testing on Sunday evening immediately after the cutover was completed, the engineers would have been awake for well over a day and would not have access to their full capabilities.

Table 2
Network Segmentation Pre-Cutover Tasks

1. Move staging virtual machines to internal cluster hosts
2. Retire the switching infrastructure of the staging fenced-off network segments
3. Establish new VLANs for the staging fenced-off network segments
4. Document current persistent routes on all servers
5. Schedule cutover activities
6. Document current router and switch configurations
7. Modify routing scheme for Exchange Client Access/Hub Transport servers
8. Establish new virtual machine port groups on internal ESX servers
9. Document static routes of the episys and epibu hosts
10. Contact JHA to arrange for on-call support on the night of the network cutover
11. Update public web pages to advertise the system outage to members
12. Create checklists for workstation and printer reconfiguration and testing
13. Remove VLAN conflicts from IDF cabling
14. Verify IDF configuration is free of VLAN conflicts
15. Establish new DHCP scopes, including workstation reservations
16. Verify IP addresses for all servers on active server documentation
17. Design and code router, firewall (PIX), and core switch configurations
18. Verify that only VLAN 6, VLAN 10, VLAN 20, and VLAN 128 are present on IDF switches
19. Change three workstation IPs that conflict with new gateway IPs
20. Document original physical networking infrastructure
21. Consolidate HP 5300XL core switch into the Cisco 3750 switch stack
22. Remove all persistent routes from all internal servers
23. Make final router changes reflecting the last-minute decommissions done this week
24. Resolve static routes on epibu and episys
25. Design and code new IDF switch configurations
26. Remove static routes from ISA servers
27. Document new physical networking infrastructure
28. Convert all printers' IP configurations from static to DHCP reservations
29. Export passwords from password server
30. Review and document server shutdown and boot-up order
31. Get power-down and power-up steps from JHA
32. Retire current HP 5300XL core switches
33. Server decommissions
34. Create Network Engineering critical application test plan
35. Create checklists for server reconfiguration and testing
36. Stage new network device configurations on routers, firewalls, and switches

Memorial Day weekend was the only long weekend before the core host system would be installed in July. This weekend also left the month of June to complete the segmentation of the failover site, which was also an unsegmented class-B IP address space, and to complete the bridge segment implementation between the two sites. Credit Union management considered it extremely important that the pre-cutover activities be completed on time, because the segmentation activities had to be finished before the bridge network could be established for the July installation of the core host system's failover capabilities. Several Jack Henry & Associates resources could not be rescheduled, so the July install date for the failover host was fixed. This was complicated by the fact that several groups within the Credit Union's IT organization needed to coordinate to ensure that all of these activities were completed so that the cutover could be performed over the Memorial Day weekend.

There was considerable potential for the project to fail due to the risk associated with the amount of interdepartmental coordination required for the activities to be completed on time. Often, one department's priorities are not the same as another's. However, upper management did a great deal to emphasize the importance of this project to the various department managers. This helped ensure that the activities being driven out of the Network Engineering group were treated as top priorities by other departments. It is likely that the project's success would have been put in serious jeopardy had upper management not been so instrumental in communicating the strategic importance of these activities and the necessity of completing them in time for the Memorial Day cutover. It is widely recognized that senior executive support is a key factor in reducing project risk (Liu, Zhang, Keil, and Chen, 2010).

The Credit Union succeeded in completing all of the pre-cutover activities before the long weekend. The cutover itself needed to be completed as quickly as possible because it was highly disruptive: online member services would not be available while the Credit Union reconfigured the network. Table 3 shows the activities that were planned for the cutover. A detailed checklist for each of these activities was also created.
Table 3 Network Segmentation Cutover Schedule

Start Time: 11 pm Saturday, May 29th, 2010. Each task is listed as: task (start time, duration in minutes, group). Group codes: NE = Network Engineering, SA = Systems Administration, SD = Service Desk, MKT = Marketing.

1. Preparation and Setup (23:00, 60 min, NE)
2. Update the configuration of 10.13.20.0 devices (NE)
3. Disable all alerts (NE)
4. Milestone: Start (0:00)
5. Reconfigure internal Checkpoint firewalls using SV-RR-SMC-01 (0:00, 15 min, NE)
6. Power off all DMZ virtual machines (0:15, 15 min, NE)
7. Power down MCW servers (0:30, SA)
8. DMZ cluster ESX service console and kernel reconfiguration (0:30, 30 min, NE)
9. Reboot DMZ cluster hosts (1:00, 15 min, NE)
10. Power on and test public web servers using console commands (1:15, 15 min, NE)
11. Milestone: public web server WWW back up [down 1 hr 15 min] (1:30)
12. Power off all internal virtual machines (1:30, 30 min, NE)
13. Internal cluster ESX service console and kernel reconfiguration (2:00, 60 min, NE)
14. Power off all internal ESX hosts (3:00, 15 min, NE)
15. Reconfigure the IP configuration of EPISYS and EPIBU (3:15, 30 min, SA)
16. Power off and disconnect the 10.13.1.1 core router (3:45, 15 min, NE)
17. Update the configurations on the IDF switches (4:00, 30 min, NE)
18. Update the configurations of closet UPS devices (4:30, 15 min, NE)
19. Update IOS image on 3750G switch stack (4:45, 30 min, NE)
20. Activate new startup configuration on 3750G switch stack (5:15, 15 min, NE)
21. Activate changes on remaining routers and firewalls (5:30, 60 min, NE)
22. Milestone: Episys systems back up [down 7 hr] (6:30)
23. Re-cable DMZ5 port on the core router to the 3750G stack (6:30, 15 min, NE)
24. Re-cable the 3750E SAN switch stack directly to the core stack (6:45, 15 min, NE)
25. Test third-party vendor apps (Coop, FSCC, Visa DPS, OFX) (7:00, SA)
26. Activate changes on 3750E SAN switch stack (7:00, 15 min, NE)
27. Reconfigure VC, reboot, and test management of DMZ ESX hosts (7:15, 30 min, NE)
28. Reconfigure SV-RR-DS-01, reboot, and test connectivity (7:45, 15 min, NE)
29. Power up remaining DMZ virtual machines (8:00, 15 min, NE)
30. e-services begins testing (8:15, MKT)
31. Power on internal ESX hosts (8:15, 15 min, NE)
32. Reconfigure networking for Active Directory (DS) virtual servers (8:30, 15 min, NE)
33. Reconfigure and test physical ISA proxy servers (8:45, 30 min, NE)
34. Activate new DHCP scopes (finish building some scopes) (9:15, 15 min, NE)
35. Milestone: Service Desk begins reconfiguring desktops and printers (10:00)
36. Reconfigure and test all printers (10:00, 120 min, SD)
37. Reconfigure and test desktop connectivity (10:00, 240 min, SD)
38. Reconfigure and test remaining physical servers (10:00, 60 min, NE)
39. Set new virtual machine port groups on internal virtual machines (11:00, 90 min, NE)
40. Power up remaining virtual machines (12:30, 30 min, NE)
41. Reconfigure all remaining virtual machines and power them off (13:00, 120 min, NE)
42. Power up all domain controllers / database servers (15:00, 30 min, NE)
43. Power up all remaining virtual servers (15:30, 30 min, NE)
44. Milestone: All Systems Up (16:00)
45. Test/troubleshoot all critical server functions and NetEng tools (16:00, 60 min, NE)
46. Test/troubleshoot user desktop software and Service Desk tools (16:00, 60 min, SD)
47. Verify Symconnects ports (16:00, 15 min, SA)
48. Milestone: Mission Accomplished (16:00)

The cutover tasks were completed on schedule, and within seventeen hours of beginning the cutover activities the cutover was completed successfully.

Post Segmentation Cutover Business Unit Testing

The individual business units were notified of the comprehensive changes taking place over the Memorial Day weekend and were asked to run unit tests of all critical functions starting at noon on Monday. Network Engineering and Systems Administration staff were made available to address any issues discovered by the business units.
These two groups conducted their own critical application tests starting at 9 AM the day after the cutover activities. Each business unit was made responsible for developing its own business unit testing plan. This way, each unit could be sure that all of the systems it needed to function were available and working properly after the network segmentation cutover. This approach made each business unit responsible for knowing how to test its own systems and placed accountability with the unit that required those systems to perform its duties and deliver quality service to the members of the Credit Union. Remarkably, all unit tests performed by the business units were positive, and the information technology support staff did not need to fix anything after the cutover activities. This was due in part to the very detailed planning of the cutover and to the interdepartmental coordination made possible by the support and emphasis that upper management gave to the disaster recovery project. Careful risk management paid off.

Secondary Site Network Segmentation

With the segmentation of the main headquarters site completed, it was time for the Credit Union's Network Engineering staff to turn its attention to the segmentation of the secondary site. The failover site is located in a facility that shares a building with a branch operations location. All branches' local networks are segmented using a class-B IP address space, and the failover facility was no exception. The scale of the segmentation activities for the disaster recovery site was much smaller, and so was its potential impact, because far fewer nodes were involved and the unavailability of most of those nodes would not have had much effect on the core business systems used by the Credit Union. Because of this, the segmentation activities were planned for a Saturday afternoon after the branch closed for the day. The experience gained during the large headquarters segmentation also made preparation for the failover site cutover much easier. The documentation, design, and scripts for this cutover came together very quickly, and the segmentation of the disaster recovery site was completed without issue within two weeks of the main site segmentation.

Chapter 6

NETWORK BRIDGE PHASE

With the segmentation phase completed, the Network Engineering department moved on to the network bridge segment phase. The project planning for this phase had been completed during the overall project planning phase, so Network Engineering moved straight into the analysis and design work.

Network Bridging Scope Analysis and Design

One of the requirements of the network bridge was that sensitive data be encrypted before being transmitted across any network that others may have access to, including the networks of the telecommunications companies that provide the data communication services for the Credit Union's private WAN infrastructure. The Credit Union had previously encrypted only certain high-profile data transmitted across the private WAN. The selection of which data to encrypt was based on transport-layer packet header information and ACLs: if the traffic used a transport the Credit Union considered sensitive, it was sent through an encrypted GRE tunnel; otherwise it was sent through an unencrypted GRE tunnel.
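In rough outline, a selective-encryption policy of this kind could be expressed with an ACL and policy routing along the following lines. The ACL entries, route-map name, and tunnel numbers are hypothetical; the report does not reproduce the Credit Union's actual classification rules.

ip access-list extended SENSITIVE-TRANSPORTS
 remark hypothetical examples of transports treated as sensitive
 permit tcp any any eq 1433
 permit tcp any any eq 443
!
! Tunnel10 is assumed to be the IPsec-protected GRE tunnel,
! Tunnel11 the unencrypted GRE tunnel
route-map SELECT-ENCRYPTION permit 10
 match ip address SENSITIVE-TRANSPORTS
 set interface Tunnel10
route-map SELECT-ENCRYPTION permit 20
 set interface Tunnel11
!
interface GigabitEthernet0/1
 ip policy route-map SELECT-ENCRYPTION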
By selectively encrypting data across the private WAN, the Credit Union was able to save money on the WAN routers. Bridging network traffic across the private WAN, however, raised issues that made the Credit Union reevaluate this policy. Since bridged traffic is handled at the data frame level and cannot be differentiated the way routed traffic can, the Credit Union required that all bridged traffic crossing vendor networks be encrypted. The existing WAN routers installed at the headquarters and disaster recovery sites were not rated to encrypt 1 Gb/s at wire speed, so the Credit Union needed to replace them with routers that were. The Credit Union decided to purchase Cisco ASR 1002 routers to replace the installed Cisco 3845 routers because the ASR 1002 is rated to encrypt 1 Gb/s at wire speed.

The network bridging project involved virtual wire, also known as pseudowire, functionality between the headquarters datacenter and the disaster recovery datacenter. This functionality allows the same IP subnet to be utilized at both locations at the same time. Using the L2TPv3 Ethernet pseudowire, enterprises can extend layer-two circuits over their IP networks (Lewis, 2008). The datacenters would be connected over the 1-Gb/s primary WAN connection and a 100-Mb/s secondary WAN connection, with the secondary link providing redundancy for the virtual wire between the two datacenters. Having the two datacenters virtually connected would allow the Credit Union to meet the following business requirements.

First, the new host system failover method would move the active host's primary IP address, currently located at the headquarters site, to the secondary host located at the disaster recovery datacenter. This failover approach imposes low administrative overhead for the workstations and servers that connect to the host system via the active host's primary IP address.

Second, the virtually connected networks would also allow virtual servers to be "live migrated," via VMware vMotion, to the secondary datacenter. This active migration requires a WAN connection of greater than 700 Mb/s with a round-trip latency of less than 5 ms between the two datacenters. Preliminary tests showed that the primary 1-Gb/s metro Ethernet link was within these limits. The 100-Mb/s secondary link would not suffice for live migration and would be used to carry production data traffic without vMotion. It should be noted that this second requirement was not needed for the immediate goal of providing the host failover method; rather, the Credit Union included it as a way to future-proof the solution.

The third requirement was the ability to effectively utilize the secondary datacenter's Internet resources with servers physically located at the headquarters site. Using Cisco's Virtual Routing and Forwarding (VRF) implementation would allow the Credit Union to have VLANs located at the headquarters site that route differently to external resources via the Internet connections at the secondary site. This technology allows multiple routing tables to be maintained without the use of multiple routing devices (Mattke, 2009). This functionality would allow the IT staff to completely build a system at the headquarters datacenter before it is physically moved to the secondary datacenter.
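The general shape of a VRF definition, excerpted from the core switch configuration in Appendix A, is shown below. The route distinguisher and addressing are the appendix's own; the fragment is illustrative rather than complete.

ip vrf INT-PROD-HQ1
 rd 143.16.1.3:1
 route-target export 143.16.1.3:1
 route-target import 143.16.1.3:1
!
interface Vlan3
 description VL0003-HQ-PROD-SRVR-143.16.3.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.3.3 255.255.255.0

Because each VRF maintains its own routing table, a second VRF on the same switch can route the same or overlapping address space out through the secondary site's Internet connections without requiring a separate physical routing device.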
This flexibility allows the staff to build secondary site systems while still being physically available for daily support tasks at the headquarters datacenter. This third requirement was considered optional.

Network Engineering planned for the layer-two and layer-three functions of the disaster recovery site and headquarters site networks to be placed into VRFs. Each VRF has its own routing and forwarding table. The VRFs were located on the two core switches and on the four WAN routers. Having multiple VRFs and multiple routing instances on each device provides routing redundancy: if one datacenter were offline, the subnets bridged to the remote datacenter would still route via the other location. Likewise, having a pseudowire pair on each pair of WAN routers provides redundancy for the bridged networks, so that if one WAN link pair were down, the other WAN router pair would carry the bridged traffic.

Network Bridging Implementation

The implementation of the network bridging was far less challenging than the implementation of the network segmentation, even though the technologies used for the bridging are more advanced and complex. This is because there were fewer devices to configure, fewer business units to coordinate, and the network downtime required to implement the change was much shorter. The plan was to implement one site at a time, with the disaster recovery site first. The first set of routers was installed, without network bridging enabled, on the same evening that the disaster recovery segmentation activities were performed. This install went without major incident, and the main site cutover to the new Cisco ASR 1002 routers was scheduled for a following Wednesday evening after all of the branches were closed. This implementation would not impact online member services; it would only impact communications between the primary and secondary sites. This allowed the installation of the new WAN routers to be scheduled without a member-impacting maintenance window, which provided a good amount of scheduling flexibility as long as the deadline of having bridging functioning before the July installation of the new core host system was met.

With the networks properly segmented and the required equipment installed, it was time to implement the network bridging between the two sites. This was scheduled for a Saturday evening and was to be performed by two engineers working together at the primary site. There was some discussion as to whether one engineer should work from the primary site while the other worked from the secondary site. The thought was that if all network connectivity to the secondary site were lost during the implementation, the engineer located there would be available to reestablish it. If both engineers were located at the primary site and site-to-site connectivity were lost, a 40-minute drive to the secondary site would be required to manually reestablish network connectivity. In the end, the Credit Union decided that the improved communication between engineers located at the same site outweighed the possibility of extending the maintenance window by having to drive to the secondary site. This worked out well because the engineers never completely lost connectivity to the secondary site.
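In rough outline, the change being implemented combined the pseudowire class from Appendix B with an xconnect statement binding a bridged VLAN subinterface to the remote peer. A minimal sketch follows, in which the subinterface, VC IDs, and backup peer address are hypothetical:

pseudowire-class l2tpv3
 encapsulation l2tpv3
 ip local interface Loopback113
!
interface GigabitEthernet0/0/1.203
 encapsulation dot1Q 203
 xconnect 143.0.0.9 203 pw-class l2tpv3
  backup peer 143.0.0.10 1203 pw-class l2tpv3

The backup peer statement provides the pseudowire redundancy described above: if the primary pseudowire fails, the router establishes the standby pseudowire toward the other WAN router at the remote site.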
During the implementation, connectivity was maintained by manually switching between the primary and secondary WAN connections; however, there were issues that prevented the engineers from establishing a secure bridged segment. It was discovered that there was a problem with the Cisco IOS code on the ASR 1002 routers that prevented the use of tunnel protection on GRE tunnels established in non-global VRFs. This combination had been tested in a lab using Cisco 3845 routers, but this particular problem did not appear on the Cisco 3845 equipment. The engineers could only get the GRE tunnels to stay stable on the ASR 1002 routers by leaving them unencrypted. Because of this, the Credit Union decided to revert the bridge network changes until a solution to the encryption problem could be found. Within a couple of days, the engineers identified a workaround for the tunnel protection problem on GRE tunnels in non-global VRFs: removing the keepalive statements from these tunnels. Another implementation of the bridged network was scheduled, and the bridge network was implemented. During this round the GRE tunnels were found to be stable with encryption; it was also found that the keepalive statements were not needed.

Bridge Network Stress Testing

With the bridge network in place, the next step was to stress test the network bridge configuration. This involved collecting network analytics while placing loads very close to the expected maximum on the bridge links. The test was performed using a pair of Windows 2008 R2 servers, one installed and configured on the primary site's bridge segment and the other on the secondary site's bridge segment. Each server was configured with an array of six disks in a RAID 0 configuration.

Each server was configured with an MTU of 1292 bytes, which is smaller than a normal Ethernet frame because of the overhead of the protocols used to provide the encrypted bridge link. Table 4 shows the initial calculations used to determine a starting-point MTU for routed traffic.

Table 4 Non-Bridged Traffic MTU Calculations

MTU                              1500
IP header                  -20 = 1480
IPsec                      -58 = 1422
GRE                         -4 = 1418
Cisco recommended buffer   -18 = 1400  (the IP MTU the tunnels should be configured with)
MSS difference             -40 = 1360  (the MSS value)

Table 5 shows the initial calculations for a starting-point MTU for bridged traffic. This is of particular importance because hosts negotiate an appropriate MTU when TCP traffic is transmitted in a routed environment. No such negotiation occurs for non-routed traffic in a bridged network environment, which is therefore more likely to suffer severe fragmentation.

Table 5 Bridged Segment MTU Calculations

MTU                        1360
IP header            -20 = 1340
GRE                   -4 = 1336  (the IP MTU of Tu0)
802.1Q VLAN tag       -4 = 1332
MSS difference       -40 = 1292  (the MSS value of Tu0, and the MTU for devices on the bridged networks)

Tests were designed to measure the total throughput across the bridge link to determine the bandwidth and latency that the bridge network solution could deliver. The primary objective of the test was to make sure that the bridge network could deliver 700 Mb/s of bandwidth at less than 15 ms of latency while performing bulk transfers. Another objective was to avoid fragmentation of the frames.
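Mapped onto a tunnel interface, the Table 4 starting-point values correspond roughly to the following fragment. This is a sketch of the calculation only; the tunnels actually deployed in Appendix B carry slightly different values (ip mtu 1418 and ip tcp adjust-mss 1378), reflecting later tuning.

interface Tunnel1
 ! IP MTU per Table 4: 1500 - 20 (IP) - 58 (IPsec) - 4 (GRE) - 18 (buffer) = 1400
 ip mtu 1400
 ! MSS per Table 4: 1400 - 40 (IP and TCP headers) = 1360
 ip tcp adjust-mss 1360
 tunnel path-mtu-discovery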
The Credit Union found that unless the MTU of the test hosts was set no higher than 1280 bytes, severe fragmentation occurred and performance was much lower than it could be. After setting the MTU of the test hosts to 1280 bytes, the bridge network did meet the bandwidth and latency requirements. It was determined that the Credit Union would need to configure this custom MTU on every host installed on the bridge segment. This is not a typical configuration, and the method of configuring it varied widely among the various appliance and host nodes. The Credit Union felt that this could lead to frequent configuration errors and subsequent performance issues, so another alternative was sought.

The alternative that seemed to make the most sense was to enable jumbo frame sizes on the WAN interfaces. This would allow the WAN routers to accept larger frames, so that even with the overhead of protocols such as the encrypted GRE tunnels and pseudowires, fragmentation would not be required, since the WAN equipment could support frames larger than 1500 bytes. Hosts on the bridge segment could then avoid special MTU configuration while still maintaining the required network performance. The drawback was that this would require coordination with the Credit Union's data telecommunications vendors. After contacting the primary WAN vendor, it was determined that the vendor equipment installed at the time could not support jumbo frame sizes, but the vendor was in the process of migrating to equipment that it believed would. The WAN equipment used by the Credit Union (the Cisco ASR 1002) did support jumbo frames. The WAN vendor tested jumbo frame support on its new infrastructure and confirmed that the new equipment could carry jumbo Ethernet frames as large as 9218 bytes. The vendor agreed to make the required changes to its infrastructure, but the changes could not be completed before the redundant host system installation.

The Credit Union decided to proceed with the installation of the redundant core host system with the fragmentation issue in place. It was acceptable to address the fragmentation issue after the installation of the primary and failover hosts because the future changes to the site-to-site bridge segment would not require the host system configuration to be re-engineered; the Credit Union felt that the network bridging infrastructure was complete enough to proceed with the installation of the new hosts on time.

Chapter 7

SYSTEMS INTEGRATION IMPLEMENTATION

The Credit Union's Network Engineering staff worked with the implementation staff from Jack Henry & Associates to install and configure the IBM PowerVM™ AIX virtualization hosts and their various logical partitions. The plan was for the implementers to prepare the host systems during the week leading up to the last weekend in July and then to migrate production to the new host systems over that weekend. There were a total of four host systems to configure: an internal host and a DMZ host installed at each location, each supporting six to eight logical partitions, including a hypervisor on each host system. There were also management stations and Fibre Channel SAN infrastructures to install at each site. The plan was to install and configure the equipment at the primary site before moving on to the secondary site.
Once the equipment was prepared, the migration strategy was to perform a backup of the current system during the nightly maintenance batch jobs and restore it onto the primary logical partition of the primary IBM PowerVM™ host system. During the migration batch processes, the staged system would swap IP configuration details with the in-place system. After the migration process, the Credit Union would begin using the system supported on a logical partition of the new PowerVM™ host. All of this would be done in the evening after the close of business on Saturday, leaving Sunday for the business units to test their processes before the start of business on Monday morning.

The Jack Henry implementation team ran into issues with the installation and configuration of the SAN infrastructure. After struggling through several reloads of the logical partitions, it was finally determined that the new SAN controller was defective. IBM replaced the controllers, but by then the implementers were significantly behind schedule. Rather than reschedule the migration for a later date, the implementers decided to work longer hours to catch up so that they could still complete the migration during the last weekend in July. The implementers worked several days of very long hours and were able to get the systems ready for the migration of the core host systems. The Credit Union observed that the work appeared to be fairly unstructured, with little use of checklists or written procedures. This was more apparent during the preparation of the secondary site systems, where it was considered acceptable because the logical partitions configured on the secondary site hosts would not be put into production during the last weekend of July; they were only being prepared for later use.

Through the efforts and long hours of the Jack Henry implementation staff, the systems were prepared in time to begin the migration Saturday evening. The migration was completed by late Sunday morning, and initial testing of the Episys host proceeded on time. Online member services were reinstated after testing by the marketing department, and then all of the various business units proceeded with their tests. The migration was determined to be successful, and the Credit Union proceeded to operate on the new logical partition.

Chapter 8

NETWORK BRIDGE TESTING REVISITED

The primary Episys host was in place and working, but for the first phase of the disaster recovery system to be complete, the goal of failover to the secondary site had to be met. As a prerequisite, the bridge network needed to meet the performance requirements, so the next step in developing the host failover capabilities was to enable jumbo Ethernet frames on the WAN segments between the primary and secondary sites. It was felt that jumbo Ethernet frame support on the WAN equipment would allow the bridge network to provide adequate performance by preventing severe fragmentation of layer-two traffic as it is transmitted over the pseudowire and encrypted GRE tunnels. The Credit Union began working with the WAN vendor to enable support for jumbo Ethernet frames at both sites. The WAN vendor would need to move the Credit Union's virtual private WAN switching environment from one infrastructure to another.
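On the Credit Union's side of the handoff, enabling jumbo frames amounts to raising the physical MTU on the WAN-facing interfaces, roughly as sketched below. The interface name and the 9216-byte value are illustrative; platform maximums vary, and the vendor's new infrastructure was confirmed to carry frames up to 9218 bytes.

interface GigabitEthernet0/0/0
 description Primary WAN handoff
 mtu 9216

With the physical MTU raised well above 1500 bytes plus the pseudowire, GRE, and IPsec overhead, hosts on the bridged segment can keep their default MTU without causing fragmentation.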
The WAN vendor did not require any physical changes at the Gigabit and Fast Ethernet connected locations, but changes were required for the equipment installed at the T-1 connected locations, which needed to migrate to channel MUX devices. Previously, the T-1 interfaces were installed in the Credit Union's routers. To support the new infrastructure, the T-1s would instead terminate in channel MUX devices managed by the WAN vendor, and the handoff to the Credit Union would be Fast Ethernet at all sites.

Over the next five weeks, the Credit Union worked with the vendor to install and migrate to the CPE Adtran® channel MUX devices. Once the T-1 connected sites were migrated, the WAN vendor moved the Credit Union's metropolitan network virtual switching environment to the vendor's new infrastructure. The Credit Union and the WAN vendor then coordinated enabling jumbo frame support on the Credit Union's primary and secondary WAN routers while the WAN vendor updated its own routers.

With jumbo Ethernet frames enabled, the fragmentation tests were performed again on the bridge test systems, this time with the MTU settings on those systems left at their default values. The tests showed that with jumbo frame support enabled on the WAN routers, there was no fragmentation and the performance requirements of the bridge segment were met. This allowed the Ethernet frame size on hosts to remain at the default, unadjusted value while still meeting the performance requirements, avoiding future problems with incorrectly configured systems. The performance requirements of the failover segment were thus met without adding unsupportable configuration complexity. With this issue addressed, the Credit Union could move forward with implementing the failover systems on the bridge segment.

Chapter 9

FAILOVER TESTING

With the failover bridge network ready for active use, the Systems Administration staff worked with the Jack Henry & Associates Business Continuity group to establish failover capabilities between the primary core host system in the headquarters data center and the secondary core host in the disaster recovery data center. First, the Credit Union worked with Jack Henry & Associates to install and configure Jack Henry's remote recovery software. This software's remote poster process creates journals of all transactions made on the primary host and ships those transactions to the secondary host. In a failover scenario, the software initiates a series of scripted commands in which the primary and secondary systems exchange IP configuration details and Ethernet hardware addresses, after which the secondary host takes the place of the primary host.

Over the next year, the Systems Administration staff conducted a series of tests to develop confidence in the ability to execute this procedure. In the first few tests, the failover procedure was invoked and a few test transactions were conducted on the disaster recovery host system. Several of these attempts failed, but the root cause of the failures was not the network infrastructure; rather, it was configuration details that had been missed during the initial configuration of the secondary host system and its installed software. Each time a test was attempted, a new host software configuration detail was found to be at fault.
Finally, over a period of one year, the failover systems were verified and the configuration details were corrected. The Credit Union was ready to move to its next phase of failover testing, which involved operating out of the secondary data center location on the secondary host for a period of one week. After that time, the Credit Union would test the ability to fail back to the primary host. The first attempt at this test was aborted because an unexpected result was encountered during the recovery process. It was later found that this unexpected result was due to a procedural issue in the nightly batch jobs run on the host systems and impacted only the reports used by the Collections department.

The Credit Union considered scheduling another test of the disaster recovery software, but executive management decided against it. Instead, the Credit Union plans to perform a recovery restore test, in which a backup of the primary system is restored to the secondary system and operations are conducted on the secondary system for a period of one week. This method of recovery does impact the Credit Union's Recovery Point Objective (RPO), because backups of the host system are done on a nightly basis as part of the night batch processes. This recovery method is therefore limited to restoring the system to its state as of the last nightly backup, so the RPO for this method is twenty-four hours. Because of the transactional nature of the remote poster module, the RPO of the Jack Henry disaster recovery software is only fifteen minutes, which provides the organization with much better business continuity in the event of a failover recovery. The Credit Union may test the functionality of the remote poster application at a later date, and if those tests are successful, confidence in the better RPO may be gained. So while it is likely that the Credit Union will eventually meet its goals for testing the disaster recovery software implementation, these successful tests will come much later than originally anticipated.

Chapter 10

MANAGERIAL CONSIDERATIONS

During the execution of the project, several critical success factors were observed. The first was detailed planning: detailed schedules and contingency planning were key to the project's success. Expert advice was another key success factor. Finally, organizational support played a large part in the outcome of this project.

Detailed Planning

Detailed planning, including contingency planning, was the primary critical success factor for this project. During the planning of the network segmentation cutover, careful plans and checklists were prepared, and realistic timelines were built into the cutover schedule. The timelines included allowances for known issues but also allowed for the unexpected. This allowance reduced the risk of needing to roll back the changes, which was very important since the changes being made to the data center resources were spread across hundreds of systems. Checklists have been used in surgical and aviation settings to reduce the risk of errors, and they have also been used to increase the success rate when dealing with contingencies. By applying these same principles to IT implementations, errors can be reduced and better reactions to contingencies can be developed, because the checklist provides preprogrammed choices for stressful situations.
Allowances were also made for the degraded capacities of tired staff. The cutover took place over a period of seventeen hours starting at night, which meant that key engineering staff would be working while significantly fatigued. All of this was considered while building the cutover schedule and helped the networking tasks to be accomplished successfully.

On the other hand, the schedule for the implementation of the host system was not very detailed and did not plan well for contingencies. The schedule for the install and migration listed only in general terms what would be done each day. No adequate time was allowed for contingencies, so when the SAN had problems, the schedule could not absorb the delay. The implementers worked long hours to make up for the lost time and worked while they were tired. No allowances were made for working tired, and as a result many minor mistakes were made. Because no detailed checklists had been prepared ahead of time, many of these mistakes were not detected until the Systems Administration staff attempted to install the disaster recovery software, and then in their later testing efforts. While these mistakes were detected and corrected over several tests of the failover capabilities, they resulted in several test failures. Even though each test iteration improved the failover process, each failure also increased the doubt and dissatisfaction that the Credit Union's executive management felt toward the disaster recovery capabilities that the software provided. Ultimately, this resulted in compromising on the weaker RPO offered by the backup-and-restore method of disaster recovery testing.

This might have been avoided if the implementers had used detailed schedules and detailed checklists. Better contingency planning could also have helped by being more realistic about the tendency of tired people to make mistakes. Either extending the installation across more days or scheduling the installation of the DR site for a later date might have resulted in the implementers making fewer mistakes on the secondary host. All of this comes down to a lack of detailed planning and inadequate allowances for contingencies.

Expert Advice

Another critical success factor was seeking the advice of others who had been successful in deploying similar solutions. The Credit Union sought the advice of an engineer who had previously deployed a similar solution at another financial institution. This gave the Credit Union the advantage of knowing what kinds of challenges to expect and what kinds of technologies could be leveraged to overcome them. The advisor also provided a sounding board for the plans and designs that the Credit Union developed. In the Credit Union's search for expert advice, many consultants presented themselves as experts on disaster recovery networking, but when questioned on whether they had successfully deployed and tested active/passive failover networks using geographically dispersed bridged network segments, the list of available experts shrank. Most had deployed active/active clusters, or active/passive clusters that relied on naming services to resolve dissimilar IP addressing on the active/passive cluster. The Credit Union found two sources of expert advice. One, a large $30-billion credit union, had deployed site-failover networks using dark fiber between sites.
That solution involved capital expenses that were not within reach of the $1.2-billion Credit Union. The other experts had implemented site-to-site bridged networking for a regional bank using pseudowire technology over metro Ethernet. This regional bank was closer in size and available financial resources to the Credit Union, so this advice was more applicable. Finding expert advice that matches the business and technological needs of the organization is important; it helped lead the Credit Union to select technologies and implement solutions appropriate to the organization's strategic goals and available resources.

Organizational Support

Another critical success factor was the organizational support that the executive management of the Credit Union provided to the Network Engineering department. Many of the Network Engineering department's activities required the efforts of all of the Information Systems groups as well as most of the business units within the entire organization. To be successful, these efforts needed to be completed within a very tight timeframe, even though each department and group had its own priorities and agendas. Without the backing of upper management, it would have been difficult to get all of these efforts completed on time. The importance that management placed on these efforts motivated all of the departments to cooperate, and all of the activities were completed according to plan.

Risk Management Techniques

The Credit Union maintained rigorous scope management during the planning stages of the project, which reduced project risk by limiting the scope to only those requirements necessary for the project to succeed initially. Limiting the scope to only what is needed is a basic tenet of agile development methodologies, since agile development is iterative by nature. Not only do agile methodologies deliver a usable product faster, they can also be used to reduce project risk, because the feedback loop for risk assessment is shortened. In agile projects, risk review happens before the project starts (Marcehnko, 2007).

Chapter 11

CONCLUSIONS AND RECOMMENDATIONS

As the Credit Union moves forward in developing its disaster recovery capabilities, there are several milestones that should be addressed. First, the Credit Union should continue to test its disaster recovery capabilities by operating out of the disaster recovery facility for an extended period of time. This should be accomplished through the Credit Union's plan to restore a backup from the primary host onto its secondary host and operate on the secondary host for a period of one week. This will provide confidence that the Credit Union can recover with a recovery point objective of up to 24 hours and a recovery time objective of six to twelve hours. Next, the Credit Union should continue testing and refining failover tests using the recovery failover software that is currently installed, and likely working, on the current systems. This will not only improve the recovery point and recovery time objectives, it will give the Credit Union confidence in executing a disaster recovery using the remote poster during a real disaster recovery event. Once this has been tested and the processes validated, the next step would be to develop redundant third-party VPN solutions at the secondary site. Developing redundant third-party VPN solutions is a project that would be led by the Network Engineering team.
Currently, the Credit Union maintains VPN connections at its headquarters site for peer connections with services such as the VISA Corporation VPN, the Credit Union's cooperative VPN, the Federal Reserve VPN, the Jack Henry & Associates VPN, and the VPN connections that support Bank Secrecy Act services. These are all third-party connections that currently require access to the core host system, and some of them are mission critical to the operation of the Credit Union's core business; an outage during a disaster recovery event would impact members' ability to access their financial resources. This is especially true for the ATM, VISA, and Federal Reserve VPN connections. To mitigate this exposure, the Credit Union should develop and test the ability to terminate these VPNs at the secondary site. Once this is done, the Credit Union's most essential services will be protected from a disaster that impacts the headquarters site. Other services that might be considered for protection include the call center, online services, and other member communication systems such as Internet mail. Further analysis would be needed to determine which additional services should be protected in the future.

APPENDICES

APPENDIX A

Primary Site Core Switch Configuration Fragments

hostname SW-HQ-CORE-01
!
aaa new-model
!
switch 1 provision ws-c3750g-48ts
switch 2 provision ws-c3750g-48ts
switch 3 provision ws-c3750g-48ts
switch 4 provision ws-c3750g-48ts
switch 5 provision ws-c3750g-48ts
switch 6 provision ws-c3750g-48ts
!
system mtu routing 1500
vtp mode transparent
authentication mac-move permit
ip subnet-zero
ip routing
no ip domain-lookup
ip domain-name sfcu.org
!
ip vrf INT-PROD-DR1
 rd 10.9.1.3:1
 route-target export 143.9.1.3:1
 route-target import 143.9.1.3:1
!
ip vrf INT-PROD-HQ1
 rd 143.16.1.3:1
 route-target export 143.16.1.3:1
 route-target import 143.16.1.3:1
!
ip vrf INT-TRAIN-RR1
 rd 143.16.1.3:3101
 route-target export 143.16.1.3:3101
 route-target import 143.16.1.3:3101
!
mls qos queue-set output 2 buffers 10 70 10 10
mls qos
!
port-channel load-balance src-dst-ip
!
spanning-tree mode rapid-pvst
spanning-tree etherchannel guard misconfig
spanning-tree extend system-id
spanning-tree vlan 1-3,5-6,8,10,20,128,203-205,3013,3101,3113 priority 4096
spanning-tree vlan 3501,3503,3520 priority 4096
!
vlan internal allocation policy ascending
vlan dot1q tag native
!
vlan 2
 name HQ-PROD-SWITCH-143.16.2.0
!
vlan 3
 name HQ-PROD-SRVR-143.16.3.0
!
vlan 5
 name HQ-DEV-SRVR-143.16.5.0
!
vlan 6
 name HQ-PROD-VISA-143.16.6.0
!
vlan 8
 name HQ-PROD-VMHOST-143.16.8.0
!
vlan 10
 name HQ-PROD-PRINT-143.16.10.0
!
vlan 20
 name HQ-PROD-TERMS-143.16.20.0
!
vlan 128
 name HQ-PROD-WKS-143.16.128.0
!
vlan 203
 name BRDG-PROD-SRVR-143.16.203.0
!
vlan 204
 name BRDG-PROD-HOST-143.16.4.0
!
vlan 205
 name BRDG-DEV-SRVR-143.16.205.0
!
vlan 2203
 name BRDG-OV-SRVR-13-10.9.203.0
!
vlan 3004
 name RR-DEV-3004-143.16.4.0
!
vlan 3013
 name HQ-FENCE-INT-13-143.16.0.0
!
vlan 3101
 name HQ-TRAIN-3101-143.16.1.0
!
vlan 3103
 name HQ-TRAIN-3103-143.16.3.0
!
vlan 3104
 name HQ-TRAIN-3104-143.16.4.0
!
vlan 3113
 name HQ-FENCE-INT-113-10.113.0.0
!
vlan 3116
 name HQ-TRAIN-3116-143.16.16.0
!
vlan 3501
 name HQ-STG-DMZ1-10.110.1.0
!
vlan 3503
 name HQ-STG-DMZ1-10.110.3.0
!
vlan 3520
 name RR-STG-DMZ1-20-10.110.20.0
!
vlan 3601
 name HQ-TRAIN-DMZ1-10.110.1.0
!
vlan 3603
 name HQ-TRAIN-DMZ3-10.110.3.0
!
ip ssh time-out 60
ip ssh version 2
!
!
interface Vlan1
 description VL0001-HQ-PROD-TRANS-143.16.1.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.1.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 1 ip 143.16.1.1
 standby 1 preempt delay minimum 60 reload 120 sync 120
 standby 1 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan2
 description VL0002-HQ-PROD-SWITCH-143.16.2.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.2.21 255.255.255.0 secondary
 ip address 143.16.2.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 2 ip 143.16.2.1
 standby 2 preempt delay minimum 60 reload 120 sync 120
 standby 2 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan3
 description VL0003-HQ-PROD-SRVR-143.16.3.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.3.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 3 ip 143.16.3.1
 standby 3 preempt delay minimum 60 reload 120 sync 120
 standby 3 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan5
 description VL0005-HQ-DEV-SRVR-143.16.5.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.5.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 5 ip 143.16.5.1
 standby 5 preempt delay minimum 60 reload 120 sync 120
 standby 5 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan6
 description VL0006-HQ-PROD-VISA-143.16.6.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.6.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 6 ip 143.16.6.1
 standby 6 preempt delay minimum 60 reload 120 sync 120
 standby 6 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan8
 description VL0008-HQ-PROD-VMHOST-143.16.8.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.8.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 8 ip 143.16.8.1
 standby 8 preempt delay minimum 60 reload 120 sync 120
 standby 8 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan10
 description VL0010-RR-PROD-PRINT-143.16.10.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.10.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 10 ip 143.16.10.1
 standby 10 preempt delay minimum 60 reload 120 sync 120
 standby 10 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan20
 description VL0020-RR-PROD-TERMS-143.16.20.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.20.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 20 ip 143.16.20.1
 standby 20 preempt delay minimum 60 reload 120 sync 120
 standby 20 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan128
 description VL0128-HQ-PROD-WKS-143.16.128.0/20
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.128.3 255.255.240.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 128 ip 143.16.128.1
 standby 128 preempt delay minimum 60 reload 120 sync 120
 standby 128 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan203
 description VL0203-BRDG-PROD-SRVR-143.16.203.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.203.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 103 ip 143.16.203.1
 standby 103 preempt delay minimum 60 reload 120 sync 120
 standby 103 authentication md5 key-chain hsrp-keys
 standby 203 ip 143.16.203.254
 standby 203 priority 95
 standby 203 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan204
 description VL0204-BRDG-PROD-HOST-143.16.4.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.4.251 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 104 ip 143.16.4.254
 standby 104 preempt delay minimum 60 reload 120 sync 120
 standby 104 authentication md5 key-chain hsrp-keys
 standby 204 ip 143.16.4.253
 standby 204 priority 95
 standby 204 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan205
 description VL0205-BRDG-DEV-SRVR-143.16.205.0/24
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.205.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 105 ip 143.16.205.1
 standby 105 preempt delay minimum 60 reload 120 sync 120
 standby 105 authentication md5 key-chain hsrp-keys
 standby 205 ip 143.16.205.254
 standby 205 priority 95
 standby 205 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan2203
 description VL2203-BRDG-PROD-OV-SRVR-10.9.203.0/24
 ip vrf forwarding INT-PROD-DR1
 ip address 10.9.203.3 255.255.255.0
 ip helper-address global 143.16.3.225
 ip helper-address global 143.16.3.226
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.226
 ip helper-address global 143.16.3.110
 ip helper-address 143.16.3.110
 ip directed-broadcast 176
 ntp broadcast
 standby 103 ip 10.9.203.1
 standby 103 priority 95
 standby 103 authentication md5 key-chain hsrp-keys
 standby 203 ip 10.9.203.254
 standby 203 preempt delay minimum 60 reload 120 sync 120
 standby 203 authentication md5 key-chain hsrp-keys
!
interface Vlan3004
 description VL3004-HQ-DEV-HOST-143.16.4.0/16
 no ip address
 shutdown
!
interface Vlan3101
 description HQ-TRAIN-3101-143.16.1.0
 ip vrf forwarding INT-TRAIN-RR1
 ip address 143.16.1.3 255.255.255.0
 ip access-group TRAINING_IN_acl in
 ip directed-broadcast 176
 ntp broadcast
 standby 1 ip 143.16.1.1
 standby 1 preempt delay minimum 60 reload 120 sync 120
 standby 1 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan3103
 description RR-TRAIN-3103-143.16.3.0
 ip vrf forwarding INT-TRAIN-HQ1
 ip address 143.16.3.3 255.255.255.0
 ip access-group TRAINING_IN_acl in
 ip directed-broadcast 176
 ntp broadcast
 standby 1 ip 143.16.3.1
 standby 1 preempt delay minimum 60 reload 120 sync 120
 standby 1 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan3104
 description RR-TRAIN-3104-143.16.4.0
 ip vrf forwarding INT-TRAIN-RR1
 ip address 143.16.4.251 255.255.255.0
 ip access-group TRAINING_IN_acl in
 ip directed-broadcast 176
 ntp broadcast
 standby 104 ip 143.16.4.254
 standby 104 preempt delay minimum 60 reload 120 sync 120
 standby 104 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan3116
 description RR-TRAIN-3116-143.16.16.0
 ip vrf forwarding INT-TRAIN-RR1
 ip address 143.16.16.3 255.255.255.0
 ip access-group TRAINING_IN_acl in
 ip directed-broadcast 176
 ntp broadcast
 standby 1 ip 143.16.16.1
 standby 1 preempt delay minimum 60 reload 120 sync 120
 standby 1 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan3601
 description RR-TRAIN-DMZ1-10.110.1.0
 ip vrf forwarding INT-TRAIN-RR1
 ip address 10.110.1.3 255.255.255.0
 ip access-group TRAINING_IN_acl in
 ntp broadcast
 standby 1 ip 10.110.1.1
 standby 1 preempt delay minimum 60 reload 120 sync 120
 standby 1 authentication md5 key-chain hsrp-keys
 standby 2 ip 10.110.1.254
 standby 2 preempt delay minimum 60 reload 120 sync 120
 standby 2 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan3603
 description RR-TRAIN-DMZ3-10.110.3.0
 ip vrf forwarding INT-TRAIN-RR1
 ip address 10.110.3.3 255.255.255.0
 ip access-group TRAINING_IN_acl in
 ntp broadcast
 standby 1 ip 10.110.3.1
 standby 1 preempt delay minimum 60 reload 120 sync 120
 standby 1 authentication md5 key-chain hsrp-keys
 standby 2 ip 10.110.3.254
 standby 2 preempt delay minimum 60 reload 120 sync 120
 standby 2 authentication md5 key-chain hsrp-keys
 arp timeout 300
!
interface Vlan3610
 no ip address
!
router eigrp 100
 !
 address-family ipv4 vrf INT-PROD-HQ1
  redistribute connected
  redistribute static
  network 143.16.1.0 0.0.0.255
  network 143.16.203.0 0.0.0.255
  passive-interface Vlan2
  passive-interface Vlan3
  passive-interface Vlan5
  passive-interface Vlan6
  passive-interface Vlan8
  passive-interface Vlan10
  passive-interface Vlan20
  passive-interface Vlan128
  passive-interface Vlan204
  passive-interface Vlan205
  autonomous-system 100
 exit-address-family
 !
 address-family ipv4 vrf INT-PROD-DR1
  redistribute connected
  network 10.9.1.0 0.0.0.255
  network 10.9.203.0 0.0.0.255
  autonomous-system 100
 exit-address-family
 !
 address-family ipv4 vrf INT-TRAIN-HQ1
  redistribute connected
  network 143.16.1.0 0.0.0.255
  autonomous-system 100
 exit-address-family
!
ip classless
ip route vrf INT-PROD-HQ1 0.0.0.0 0.0.0.0 Null0

APPENDIX B

Primary Site WAN Router Configuration Fragments

hostname RTR-HQ-01
!
!
vrf definition Mgmt-intf
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv6
 exit-address-family
!
aaa new-model
!
!
aaa session-id common
ip source-route
ip vrf INT-PROD-OV1
 rd 172.28.28.113:109
 route-target export 172.28.28.113:109
 route-target import 172.28.28.113:109
!
ip vrf INT-PROD-HQ1
 rd 172.28.28.113:113
 route-target export 172.28.28.113:113
 route-target import 172.28.28.113:113
!
ip flow-cache timeout inactive 30
ip flow-cache timeout active 5
no ip domain lookup
ip domain name schools.corp
!
multilink bundle-name authenticated
!
archive
 log config
  logging enable
  logging size 500 notify syslog
  contenttype plaintext
  hidekeys
!
redundancy
 mode none
!
crypto keyring INT-PROD-HQ1-key-tu0
 local-address Loopback0
 pre-shared-key address 143.0.0.16 key ********************
 pre-shared-key address 143.0.0.15 key ********************
crypto keyring vrf-INT-PROD-HQ1-key vrf INT-PROD-HQ1
 pre-shared-key address 143.100.1.1 key ********************
 pre-shared-key address 143.100.1.2 key ********************
 pre-shared-key address 143.100.1.3 key ********************
 pre-shared-key address 143.100.1.4 key ********************
 pre-shared-key address 143.100.1.6 key ********************
 pre-shared-key address 143.100.1.7 key ********************
 pre-shared-key address 143.100.1.8 key ********************
 pre-shared-key address 143.100.1.9 key ********************
 pre-shared-key address 143.100.1.10 key ********************
 pre-shared-key address 143.100.1.11 key ********************
 pre-shared-key address 143.100.1.12 key ********************
 pre-shared-key address 143.100.1.14 key ********************
crypto keyring INT-PROD-HQ1-key
 pre-shared-key address 143.100.1.1 key ********************
 pre-shared-key address 143.100.1.2 key ********************
 pre-shared-key address 143.100.1.3 key ********************
 pre-shared-key address 143.100.1.4 key ********************
 pre-shared-key address 143.100.1.6 key ********************
 pre-shared-key address 143.100.1.7 key ********************
 pre-shared-key address 143.100.1.8 key ********************
 pre-shared-key address 143.100.1.9 key ********************
 pre-shared-key address 143.100.1.10 key ********************
 pre-shared-key address 143.100.1.11 key ********************
 pre-shared-key address 143.100.1.12 key ********************
 pre-shared-key address 143.100.1.14 key ********************
!
crypto isakmp policy 20
 encr 3des
 authentication pre-share
crypto isakmp keepalive 10
crypto isakmp profile wan-vpn-vrf
 vrf INT-PROD-HQ1
 keyring vrf-INT-PROD-HQ1-key
 match identity address 143.100.1.0 255.255.255.240 INT-PROD-HQ1
 local-address GigabitEthernet0/0/0
crypto isakmp profile wan-vpn
 keyring INT-PROD-HQ1-key
 match identity address 143.100.1.0 255.255.255.240
 local-address GigabitEthernet0/0/0
crypto isakmp profile wan-vpn-tu0
 keyring INT-PROD-HQ1-key-tu0
 match identity address 143.0.0.9 255.255.255.255
 local-address Loopback0
!
crypto ipsec transform-set wan-vpn esp-3des esp-sha-hmac
crypto ipsec transform-set wan-vpn-transport esp-3des esp-sha-hmac
 mode transport
!
crypto ipsec profile wan-vpn-transport-profile
 set transform-set wan-vpn-transport
 set isakmp-profile wan-vpn
!
crypto ipsec profile wan-vpn-transport-profile-tu0
 set transform-set wan-vpn-transport
 set isakmp-profile wan-vpn-tu0
!
crypto ipsec profile wan-vpn-vrf-transport-profile
 set transform-set wan-vpn-transport
 set isakmp-profile wan-vpn-vrf
!
!
ip ssh time-out 60
ip ssh authentication-retries 2
!
class-map match-any RealTime
 match ip dscp ef
 match protocol sip
 match protocol rtp
 match access-group name Queue_RealTime
 match protocol rtsp
class-map match-any High
 match dscp af41
 match dscp af42
 match dscp af43
 match protocol dns
 match protocol ntp
 match protocol snmp
 match protocol ldap
 match access-group name Queue_High
 match protocol telnet
class-map match-any Medium
 match dscp af31
 match dscp af32
 match dscp af33
 match access-group name Queue_Medium
class-map match-any Low
 match dscp default
 match dscp af11
 match dscp af12
 match dscp af13
 match access-group name Queue_Low
class-map match-any C-01-100M
 match access-group name Queue_BR01
class-map match-any C-02-100M
 match access-group name Queue_BR02
class-map match-any C-03-100M
 match access-group name Queue_BR03
class-map match-any C-04-100M
 match access-group name Queue_BR04
class-map match-any C-06-T1s
 match access-group name Queue_BR06
class-map match-any C-07-T1s
 match access-group name Queue_BR07
class-map match-any C-08-100M
 match access-group name Queue_BR08
class-map match-any C-10-100M
 match access-group name Queue_BR10
class-map match-any C-11-100M
 match access-group name Queue_BR11
class-map match-any C-12-T1s
 match access-group name Queue_BR012
class-map match-any C-14-100M
 match access-group name Queue_BR14
class-map match-any C-HQ-1000M
 match access-group name Queue_HQ
class-map match-any C-DR-1000M
 match access-group name Queue_DR
!
policy-map C-PRIO
 class RealTime
  priority percent 10
  set dscp ef
 class High
  bandwidth remaining percent 30
  set dscp af41
 class Medium
  bandwidth remaining percent 40
  set dscp af31
 class Low
  bandwidth remaining percent 29
  set dscp af21
policy-map RTR-HQ-01-shape
 class C-DR-1000M
  shape average 800000000
  service-policy C-PRIO
 class C-HQ-1000M
  shape average 800000000
  service-policy C-PRIO
 class C-01-100M
  shape average 100000000
  service-policy C-PRIO
 class C-02-100M
  shape average 100000000
  service-policy C-PRIO
 class C-03-100M
  shape average 100000000
  service-policy C-PRIO
 class C-04-100M
  shape average 100000000
  service-policy C-PRIO
 class C-06-T1s
  shape average 3000000
  service-policy C-PRIO
 class C-07-T1s
  shape average 3000000
  service-policy C-PRIO
 class C-08-100M
  shape average 100000000
  service-policy C-PRIO
 class C-11-100M
  shape average 100000000
  service-policy C-PRIO
 class C-12-T1s
  shape average 3000000
  service-policy C-PRIO
 class C-14-100M
  shape average 100000000
  service-policy C-PRIO
 class class-default
  shape average 1000000000
!
pseudowire-class l2tpv3
 encapsulation l2tpv3
 sequencing both
 ip local interface Loopback113
!
interface Tunnel0
 description ** To DR xconnect **
 bandwidth 1000000
 ip address 143.0.16.1 255.255.255.252
 qos pre-classify
 tunnel source Loopback0
 tunnel destination 143.0.0.9
 tunnel key 1
 tunnel protection ipsec profile wan-vpn-transport-profile-tu0 shared
!
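The two policy maps above implement hierarchical queuing: RTR-HQ-01-shape shapes aggregate traffic toward each site to roughly that site's access rate, and the nested C-PRIO child policy then divides each shaped pipe among the four traffic classes. Under congestion toward the secondary site, for example, the nominal allocations inside the 800 Mb/s shaper work out approximately as follows (assuming standard IOS hierarchical queuing behavior, in which "bandwidth remaining percent" is computed after the priority queue is served):

RealTime priority queue:  10% of 800 Mb/s = 80 Mb/s
Remaining bandwidth:      800 Mb/s - 80 Mb/s = 720 Mb/s
  High:    30% of 720 Mb/s = 216 Mb/s
  Medium:  40% of 720 Mb/s = 288 Mb/s
  Low:     29% of 720 Mb/s = 208.8 Mb/s (about 1% is left for class-default)

Actual per-class throughput can be inspected on the router with the standard "show policy-map interface" command.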
interface Tunnel1
 description ** To Branch 01 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.120.1.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.1
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel2
 description ** To Branch 02 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.120.2.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.2
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel3
 description ** To Branch 03 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.120.3.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.3
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel4
 description ** To Branch 04 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.120.4.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.4
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel5
 no ip address
 ip flow ingress
!
interface Tunnel6
 description ** To Branch 06 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.120.6.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.6
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel7
 description ** To Branch 07 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.120.7.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.7
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel8
 description ** To Branch 08 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.120.8.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.8
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel9
 no ip address
 ip flow ingress
!
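Each branch tunnel above carries "ip mtu 1418" and "ip tcp adjust-mss 1378," and the two values are consistent with each other and with a 1,500-byte payload on the underlying WAN:

1500 - 1418 = 82 bytes reserved for GRE and ESP transport-mode overhead
1418 - 20 (IPv4 header) - 20 (TCP header) = 1378 bytes of TCP MSS

The 82-byte reservation is presumably the overhead budget derived in the MTU calculations earlier in the report; the MSS clamp keeps end hosts from sending segments that would have to be fragmented inside the tunnel, or silently dropped where path MTU discovery fails.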
interface Tunnel10
 description ** To Branch 10 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 10.120.10.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.10
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel11
 description ** To Branch 11 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.120.11.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.11
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel12
 description ** To Branch 12 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.120.12.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.12
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel13
 no ip address
 ip flow ingress
!
interface Tunnel14
 description ** To Branch 14 **
 bandwidth 100000
 ip vrf forwarding INT-PROD-HQ1
 ip address 10.120.14.1 255.255.255.252
 ip directed-broadcast 176
 ip mtu 1418
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 ip tcp adjust-mss 1378
 delay 5
 shutdown
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.14
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel999
 description ** To DR **
 bandwidth 1000000
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.120.13.1 255.255.255.252
 ip flow ingress
 ip summary-address eigrp 100 143.16.0.0 255.255.0.0
 load-interval 30
 delay 5
 qos pre-classify
 tunnel source GigabitEthernet0/0/0
 tunnel destination 143.100.1.9
 tunnel path-mtu-discovery
 tunnel protection ipsec profile wan-vpn-transport-profile shared
 crypto ipsec df-bit clear
!
interface Loopback0
 ip address 143.0.0.13 255.255.255.255
!
interface Loopback1
 description RTR-RR-WAN-02-Lo1 | Management Loopback
 ip vrf forwarding INT-PROD-HQ1
 ip address 172.28.28.1 255.255.255.252
!
interface Loopback113
 description RTR-RR-WAN-02-Lo113 | psudo-wire loopback vrf INT-PROD-HQ1
 ip address 172.28.28.113 255.255.255.255
!
interface GigabitEthernet0/0/0
 description ** To SureWestWAN **
 mtu 9216
 ip address 143.100.1.13 255.255.255.0
 no ip proxy-arp
 ip nbar protocol-discovery
 ip flow ingress
 load-interval 30
 delay 10
 negotiation auto
 service-policy output RTR-HQ-01-shape
!
interface GigabitEthernet0/0/1
 description Link to SW-RR-CORE-01-Gi4/0/9
 no ip address
 ip flow ingress
 negotiation auto
!
interface GigabitEthernet0/0/1.1
 encapsulation dot1Q 1 native
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.16.1.250 255.255.255.0
 ip nbar protocol-discovery
 ip flow ingress
!
interface GigabitEthernet0/0/2
 description SW-RR-CORE-01-Gi1/0/48 L2TPv3 psudo wire to OV
 no ip address
 ip flow ingress
 negotiation auto
 xconnect 172.28.28.109 113 encapsulation l2tpv3 pw-class l2tpv3
!
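GigabitEthernet0/0/2 above is the primary-site end of the L2TPv3 pseudowire that bridges the core server segment to the secondary site. The physical port carries no IP address; the xconnect statement forwards every frame it receives to the peer at 172.28.28.109 using virtual circuit identifier 113 and the pseudowire class defined earlier, which sources the session from Loopback113 (172.28.28.113). The secondary-site router is not reproduced in this appendix, but its xconnect would have to mirror these values, roughly as sketched below (the far-end loopback and port names are assumptions):

! Illustrative secondary-site counterpart; interface names are assumed
interface Loopback109
 ip address 172.28.28.109 255.255.255.255
!
pseudowire-class l2tpv3
 encapsulation l2tpv3
 sequencing both
 ip local interface Loopback109
!
interface GigabitEthernet0/0/2
 no ip address
 xconnect 172.28.28.113 113 encapsulation l2tpv3 pw-class l2tpv3

The virtual circuit identifier must match on both ends, while each router points its xconnect at the other end's loopback address.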
interface GigabitEthernet0/0/3
 description Link to SW-RR-SAN-01 Gi1/0/48 | 143.140.2.1
 ip vrf forwarding INT-PROD-HQ1
 ip address 143.140.1.1 255.255.0.0
 ip nbar protocol-discovery
 ip flow ingress
 negotiation auto
!
router eigrp 100
 !
 address-family ipv4 vrf INT-PROD-HQ1
  redistribute connected
  redistribute static route-map redstatic
  network 143.16.0.0 0.0.255.255
  network 143.120.0.0 0.0.255.255
  network 143.140.0.0 0.0.255.255
  autonomous-system 100
 exit-address-family
 !
 address-family ipv4 vrf INT-PROD-DR1
  redistribute connected
  network 143.0.16.0 0.0.0.3
  autonomous-system 100
 exit-address-family
!
!
ip flow-export source GigabitEthernet0/0/1.1
ip http server
ip http secure-server
ip route 143.0.0.9 255.255.255.255 GigabitEthernet0/0/0 143.100.1.9
!
ip access-list extended Queue_BR01
 remark Branch 01 - 100Mbps
 permit ip any 172.24.1.0 0.0.0.255
 permit ip any 143.1.0.0 0.0.255.255
ip access-list extended Queue_BR02
 remark Branch 02 - 100Mbps
 permit ip any 172.24.2.0 0.0.0.255
 permit ip any 143.2.0.0 0.0.255.255
ip access-list extended Queue_BR03
 remark Branch 03 - 100Mbps
 permit ip any 172.24.3.0 0.0.0.255
 permit ip any 143.3.0.0 0.0.255.255
ip access-list extended Queue_BR04
 remark Branch 04 - 100Mbps
 permit ip any 172.24.4.0 0.0.0.255
 permit ip any 143.4.0.0 0.0.255.255
ip access-list extended Queue_BR06
 remark Branch 06 - 3.0Mbps
 permit ip any 172.24.6.0 0.0.0.255
 permit ip any 143.6.0.0 0.0.255.255
ip access-list extended Queue_BR07
 remark Branch 07 - 3.0Mbps
 permit ip any 172.24.7.0 0.0.0.255
 permit ip any 143.7.0.0 0.0.255.255
ip access-list extended Queue_BR08
 remark Branch 08 - 100Mbps
 permit ip any 172.24.8.0 0.0.0.255
ip access-list extended Queue_BR10
 remark Branch 10 - 100Mbps
 permit ip any 172.24.10.0 0.0.0.255
 permit ip any 143.10.0.0 0.0.255.255
ip access-list extended Queue_BR11
 remark Branch 11 - 100Mbps
 permit ip any 172.24.11.0 0.0.0.255
 permit ip any 143.11.0.0 0.0.255.255
ip access-list extended Queue_BR012
 remark Branch 12 - 3.0Mbps
 permit ip any 172.24.12.0 0.0.0.255
 permit ip any 143.12.0.0 0.0.255.255
ip access-list extended Queue_BR14
 remark Branch 14 - 100Mbps
 permit ip any 172.24.14.0 0.0.0.255
 permit ip any 143.14.0.0 0.0.255.255
ip access-list extended Queue_DR
 remark Secondary DR Site - 1000Mbps
 permit ip any 172.24.9.0 0.0.0.255
 permit ip any 143.15.0.0 0.0.255.255
 permit ip any 10.159.0.0 0.0.255.255
ip access-list extended Queue_HQ
 remark Headquarter Site - 1000Mbps
 permit ip any 172.24.13.0 0.0.0.255
 permit ip any 143.16.0.0 0.0.255.255
 permit ip any 143.140.0.0 0.0.255.255
ip access-list extended Queue_High
 remark ** UDP Services **
 permit udp any any eq domain
 permit udp any eq domain any
 permit udp any any eq ntp
 permit udp any eq ntp any
 permit udp any any eq snmp
 permit udp any eq snmp any
 permit udp any any eq snmptrap
 permit udp any eq snmptrap any
 permit udp any any eq syslog
 permit udp any eq syslog any
 remark ** TCP Services **
 permit tcp any any eq ftp
 permit tcp any eq ftp any
 permit tcp any any eq 22
 permit tcp any eq 22 any
 permit tcp any any eq domain
 permit tcp any eq domain any
 permit tcp any any eq 389
 permit tcp any eq 389 any
 permit tcp any any eq 636
 permit tcp any eq 636 any
 permit tcp any any eq 3389
 permit tcp any eq 3389 any
ip access-list extended Queue_Low
 permit ip any any
ip access-list extended Queue_Medium
 remark ** Deny SAN traffic push down to low queue **
 deny ip 143.140.0.0 0.0.255.255 10.159.0.0 0.0.255.255
 deny ip 143.149.0.0 0.0.255.255 10.150.0.0 0.0.255.255
 remark ** Deny NAS traffic push down to low queue **
 deny ip any any
 permit ip any any
ip access-list extended Queue_RealTime
 remark ***TDM_Over_IP_Device*******************
 permit ip host 143.16.1.5 any
 remark ***TCP and UDP 5060 5061****************
 permit udp any any eq 5060
 permit udp any eq 5060 any
 permit udp any any eq 5061
 permit udp any eq 5061 any
 permit tcp any any eq 5060
 permit tcp any eq 5060 any
 permit tcp any any eq 5061
 permit tcp any eq 5061 any
!
route-map redstatic permit 10
 match ip address 10
!
control-plane
!
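The route-map redstatic above, which filters static-route redistribution into EIGRP, matches standard access list 10, and that list does not appear in these fragments. It would simply enumerate the static destinations eligible for redistribution; a hypothetical single-entry version is sketched below (the entry is illustrative only, chosen because the host route to 143.0.0.9 is the one static route visible in this fragment):

! Hypothetical contents of access list 10; illustrative only
access-list 10 permit 143.0.0.9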