DISASTER RECOVERY DATA NETWORKING FOR A FINANCIAL INSTITUTION
Michael A Soldwisch
B.S., California State University, Sacramento, 2005
PROJECT
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
BUSINESS ADMINISTRATION
(Management Information Systems)
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
FALL
2011
DISASTER RECOVERY DATA NETWORKING FOR A FINANCIAL INSTITUTION
A Project
by
Michael A Soldwisch
Approved by:
____________________________________, Committee Chair
Beom-Jin Choi, Ph.D.
________________________________
Date
Student: Michael A Soldwisch
I certify that this student has met the requirements for format contained in the
University format manual, and that this project is suitable for shelving in the Library
and credit is to be awarded for the Project.
___________________________________________ _________________________
Monica Lam, Ph.D.
Date
Associate Dean for Graduate and External Programs
College of Business Administration
Abstract
of
DISASTER RECOVERY DATA NETWORKING FOR A FINANCIAL INSTITUTION
by
Michael A Soldwisch
This project report, Disaster Recovery Data Networking for a Financial Institution, details
the advanced data networking challenges that a Credit Union faced during a project to
improve its disaster recovery capabilities. By examining this technically interesting
project, one can gain a better understanding of the practical application of advanced data
networking technologies and the reasoning behind their use in an organizational setting.
These technologies were used to build a geographically dispersed site-to-site bridged
network segment to support the disaster recovery capabilities of an active/passive core
host cluster.
This project report shows that a Credit Union employed these technologies to
meet its strategic organizational goals of providing business continuity capabilities in the
event of a disaster that destroys or otherwise makes its centralized data center core
resources unavailable. The successful application of these advanced data networking
technologies was instrumental in meeting the organization’s strategic objectives in this
area.
Despite the complexity and technical challenges of the advanced data networking
technologies employed to solve the Credit Union's business continuity problems, the
success was driven as much by solid project management practices, such as detailed
planning, organizational support, and expert advice, as by technical expertise.
________________________________________, Committee Chair
Beom-Jin Choi, Ph.D.
__________________________________
Date
TABLE OF CONTENTS
List of Tables ................................................................................................................... viii
List of Figures .................................................................................................................... ix
Chapter
1. EXECUTIVE SUMMARY ............................................................................................ 1
2. ORGANIZATIONAL BACKGROUND ....................................................................... 3
3. PROJECT METHODOLOGY ....................................................................................... 9
4. PROJECT PLANNING ................................................................................................ 10
5. NETWORK SEGMENTATION PHASE .................................................................... 15
Network Segmentation Analysis........................................................................... 15
Network Segmentation Design ............................................................................. 25
Layer Two Design Issues .......................................................................... 25
Layer Three Design Issues ........................................................................ 29
Network Segmentation Implementation ............................................................... 32
Post Segmentation Cutover Business Unit Testing .................................. 37
Secondary Site Network Segmentation ................................................................ 38
6. NETWORK BRIDGE PHASE ..................................................................................... 39
Network Bridging Scope Analysis and Design .................................................... 39
Network Bridging Implementation ....................................................................... 42
Bridge Network Stress Testing ............................................................................. 44
7. SYSTEMS INTEGRATION IMPLEMENTATION ....................................................... 49
8. NETWORK BRIDGE TESTING REVISITED ........................................................... 51
9. FAILOVER TESTING ................................................................................................. 53
10. MANAGERIAL CONSIDERATIONS ...................................................................... 56
Detailed Planning .................................................................................................. 56
Expert Advice ....................................................................................................... 58
Organizational Support ......................................................................................... 59
Risk Management Techniques .............................................................................. 59
11. CONCLUSIONS AND RECOMMENDATIONS ..................................................... 61
Appendices........................................................................................................................ 63
Appendix A Primary Site Core Switch Configuration Fragments ................................... 64
Appendix B Primary Site WAN Router Configuration Fragments .................................. 74
Bibliography ..................................................................................................................... 87
LIST OF TABLES
1. Headquarters Network Segments .................................................................................. 22
2. Network Segmentation Pre-Cutover Tasks ................................................................... 33
3. Network Segmentation Cutover Schedule .................................................................... 35
4. Non-Bridged Traffic MTU Calculations ...................................................................... 45
5. Bridged Segment MTU Calculations ............................................................................ 46
LIST OF FIGURES
1. Organizational Chart (IT) ............................................................................................... 4
Chapter 1
EXECUTIVE SUMMARY
The disaster recovery project involves several major phases to develop the Credit
Union's ability to respond to a temporary or permanent loss of the mission-critical
information technology systems that are normally supported by the primary headquarters
facility data center. The project will develop the capability of operating these critical
systems using resources at a secondary disaster recovery facility data center.
The first phases of the project are the activities associated with developing and
implementing a geographically dispersed multisite bridged network segment for the core
host systems. The ability to locate a system at either site will be developed during these
phases of the project. The Credit Union will use a bridging strategy instead of a naming
strategy to support disaster recovery for the core transaction system, which is an
active/passive solution. This solution depends on the Credit Union’s Network
Engineering department developing a multisite bridged network segment for use by the
primary and secondary host systems. In order to meet the Credit Union’s recovery
objectives, the solution also depends on the successful implementation and testing of the
Jack Henry & Associates’ Disaster Recovery software in the next phase of the project.
During the planning, analysis, design, and implementation activities of the
disaster recovery project, the relative success of the networking phase versus the
systems integration and testing phase was highly dependent on the level of detailed
planning performed for each phase. While there was a great deal of detailed planning
for the implementation activities of the network phase of the project, there was very
little formal planning for the implementation of the systems integration and testing
phase. As a result, the network phase was fairly successful and was completed on time,
even though it faced greater time pressure and elevated technical and organizational
complexity, while the systems integration and testing phases have been significantly
less successful and have faced extended delays.
Chapter 2
ORGANIZATIONAL BACKGROUND
The Credit Union serves more than 119,000 members with more than $1.2 billion
in assets. The Credit Union was chartered over 75 years ago in 1933. As a Credit Union,
the organization is non-profit and member owned. There are 12 branch locations in a
metropolitan area with a population of 2.15 million persons.
The Credit Union Information System management is headed by the Vice
President of Information Systems, who reports directly to the Chief Executive Officer.
Figure 1 – Organizational Chart (IT) shows the organizational structure of the
Information Systems departments. The Application Development Manager, the Security
Manager, and the Project Management Office Manager each report directly to the Vice
President. The Director of Information Systems also reports to the Vice President. The
Service Desk Manager, the Network Manager, and the Systems Administration Manager
all report to the Director. The overall headcount for the Information Systems
Departments is 26 full-time employees. The Credit Union employs about 300 full-time
equivalent employees.
Figure 1
Organizational Chart (IT)
The Credit Union operations are highly dependent on the transactional financial
information that is maintained in the organization’s information system resources.
Information systems in the banking industry do not merely fulfill support functions;
rather they are considered production systems (Arduini & Morabito, 2010). The Credit
Union must be able to access the information systems resources in order to provide services
to its members. The Credit Union has recognized the need for a disaster recovery plan
for some time, and some efforts have been made to provide capabilities to restore the
information systems’ operational capabilities in the event of a disaster. These efforts
have included maintaining offsite data backups, purchasing redundant equipment, and
even the establishment of remote disaster recovery data center facilities. So, at the surface
level, there appears to be a commitment to preparing for disaster recovery; however,
these preparations did not include any real consideration for testing and verifying the
Credit Union's ability to effectively use its disaster recovery resources. Tammineedi
(2010) emphasizes that testing and exercising the disaster recovery plan is a key element
of a standards-based approach to business continuity.
Furthermore, considerations such as realistic Recovery Time Objective (RTO)
and Recovery Point Objective (RPO) had not been analyzed. The RTO is the amount of
time between service disruption and restoration of services. While the business had made
efforts to ensure that recovery would be possible, there was no agreed-upon expectation as
to what the RTO would be. No one knew whether the RTO was minutes, hours, days, or
even weeks. This is indicative of the lack of clear recovery objectives and limitations
from senior management that are required for effective disaster recovery planning
(Lindstrom, Samuelsson, & Hagerfors, 2010).
This stemmed from two factors. Senior management did not communicate to
Information Systems staff what its RTO expectations were. Also, the IS staff had not
developed detailed plans on how to engage the disaster recovery capabilities in the event
of a disruption of services. Rather, disaster recovery planning was limited to ensuring that
redundant equipment and systems were purchased for every new solution.
For example, when the Credit Union implemented an SSL VPN solution with
two-factor authentication for remote access, disaster recovery was only marginally
considered. This remote access solution was deployed into the headquarters location.
The two solutions were sufficiently integrated, and the combined solution was accepted by
the users, who use a crypto-key device as well as a password to gain remote access to internal
systems resources from remote locations on the public Internet. When the equipment for
the solution was purchased, gear for both the headquarters location and the disaster
recovery was purchased, but only the gear at the headquarters location was integrated and
tested by the users. The disaster recovery gear was installed at the disaster recovery
location, but it was never integrated. So the Credit Union had enough commitment to the
idea of disaster recovery to purchase a second set of equipment for disaster recovery, but
not enough to thoroughly analyze exactly how to use this gear in the event of a disaster.
This is important because without knowing how the gear will be used in the event
of a disaster, it could take a considerable amount of engineering resources to put the gear
into production in the disaster recovery site. It is expected that engineering resources
would likely be very limited during a disaster situation. The purpose of this gear is to
provide remote access to the information system resources in the event that the remote
access solution deployed to the headquarters site is unavailable. This objective is much
more likely to be met if the analysis of how the gear is to be used had been done prior to
a disaster event. It is even more likely to be met if the solution is deployed ahead of
time with at least a significant portion of the integration work done before a disaster
event.
This is just one example of one solution where the Credit Union purchased
additional systems for disaster recovery purposes without understanding how they
would be used. There are many more examples, but this one example is indicative of the
disaster recovery preparation made by the Credit Union.
So, while the Credit Union could truthfully claim that it engaged in disaster
recovery efforts, its disaster recovery preparedness was not very mature, and the ability
to effectively restore services in the event of a disaster was questionable. This is despite
the fact that the Credit Union devoted substantial financial resources to the purchase and
maintenance of redundant systems in support of its disaster readiness activities.
Recently, the Credit Union executive management has recognized a need to
improve its preparedness and put systems into production with disaster recovery as a
primary feature consideration. One catalyst for this commitment to preparedness was an
increased awareness of the potential for the loss of access to the information systems
resources provided by the headquarters data center facility, which is located in a 100-year
floodplain with an aging and ineffective levee system (Cabanatuan, 1998). This raised
concerns about a plausible flood event and its impact on the ability of the
Credit Union to provide services to its members. Credit Unions recognize that they have
contractual obligations to provide services whether or not they have the physical
capability to do so (Totty, 2006).
It is with this commitment to disaster recovery capabilities that the management
team considered an overhaul of the Credit Union’s aging core system platform. The
Credit Union’s core applications are hosted by the Episys® solution from Symitar™, a
Jack Henry & Associates Company. This solution is considered an industry leading
solution with 30 percent market share of Credit Unions with more than $1 billion of
assets (Core Solutions, 2011). This core host platform was a seven-year-old IBM AIX
host system that was scheduled for a hardware refresh in 2010.
The Credit Union executive management decided that it would seek to continue to
rely on the Episys® processing solutions, but that it would look for a solution that would
meet the requirements of providing continuity of the system in the event of a disaster.
The Episys® solution provides disaster recovery capabilities with rapid recovery and
continuous protection through the implementation of a secondary host. Management
went to the Board of Directors and obtained approval to proceed with a failover solution
project for the Episys® core processing system.
Chapter 3
PROJECT METHODOLOGY
This project uses the case study methodology to examine the application of advanced data
networking technologies to the strategic disaster recovery goals of a financial institution.
A case study is useful when an in-depth investigation of a problem is needed (Yin, 2002).
Since this project focuses on the detailed application of technologies, the case study is
appropriate for analysis of the critical success factors for the application of these
technologies. It also explains in detail exactly how these technologies can be employed
and what the likely outcomes of these applications might be. Other methodologies can
hide these details that may be of particular interest (Stake, 1995).
While this case study allows a good look at the details of the application of these
data network technologies, this case study does have some limitations. A limitation of
concern is that this case study is limited to examining a single organization. This raises
some issues as to whether the findings of the case study can be generalized to
organizations at large. Nevertheless, this case will provide the practitioner better
insight into the expected results of applying these technologies to similar
problems in other organizations.
Chapter 4
PROJECT PLANNING
A primary requirement of the failover solution was that the secondary host needed
to be physically located in the Credit Union’s existing disaster recovery data center
facility so that it would be reasonably isolated from most anticipated disaster scenarios.
This presented the first network engineering discovery issue for the proposed solution.
There are two basic methods for providing high availability of the Episys host.
The first method relies on naming services. Using this method, client computers connect
to the Episys host using a host name and rely on naming services to resolve the host name
to an actual IP address. The secondary host is kept up-to-date and is free to have any IP
address following conventional IP addressing schemes. In the event that the primary host
is unavailable, the name record for the primary host can be modified to refer to the IP
address of the secondary host as part of the recovery process. This is generally handled
by employing private DNS servers with the TTL set to a very low value. This requires
client computers to update their naming cache from the DNS server on a frequent basis.
This helps accomplish the goal of high availability and helps the organization meet its
RTO. By doing this, the client computers can quickly access the secondary host in the
event of a disaster as though it were the primary host and operations can continue even
though the primary host is unavailable due to a disaster.
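As an illustration of this low-TTL approach (the report does not give the actual records, so the zone name, host name, and addresses below are hypothetical), a private DNS zone might carry an entry such as:

```
; Hedged sketch: a BIND-style zone fragment with a 60-second TTL so that
; clients re-resolve the core host name frequently (hypothetical values).
episys-core.example.internal.  60  IN  A  10.1.10.25   ; primary host
; During recovery, the record is repointed to the secondary host:
; episys-core.example.internal.  60  IN  A  10.2.10.25
```

The short TTL bounds how long clients can continue to cache the stale primary address after failover.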
This method is supported by Jack Henry & Associates, and it has the advantage
that it could be employed without much change to the existing data communications
network infrastructure and supporting technologies. It is also a widely used and mature
method for providing active/passive clusters; however, it does require that all systems
that access the Episys host system use host names, and currently client computers,
including third-party systems, are configured to access the Episys host using its actual IP
address instead of its host name.
An alternate method of providing high availability of the core host does not rely
on naming services but it requires that both the primary and secondary hosts are directly
connected to the same network segment. This method allows clients to refer to the core
host using its actual IP address instead of using a host name. In the event that the
primary host is not available, the secondary host assumes the IP address of the primary
host and the goal of high availability is met.
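The vendor's failover software performs this takeover; purely as an illustration of the underlying mechanism (the interface name, MAC address, and IP address below are hypothetical, not the Credit Union's), the equivalent steps on a generic Linux host would resemble:

```
# Hedged sketch of IP/MAC takeover on a secondary host (hypothetical values).
# 1. Clone the primary host's MAC address onto the local interface.
ip link set dev eth0 address 00:1a:64:aa:bb:cc
# 2. Assume the primary host's IP address.
ip addr add 172.16.10.25/16 dev eth0
# 3. Send gratuitous ARP so switches and neighbors update their tables.
arping -A -I eth0 -c 3 172.16.10.25
```

The gratuitous ARP step is what makes the takeover transparent to clients: it refreshes the ARP caches that still map the shared IP address to the failed primary.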
This method is also supported by Jack Henry & Associates, and it allows the goals
of high availability to be met without modification of the clients and, more importantly,
without modification of third-party applications that access the core host. This was
especially important to upper management, since modifying all of the third-party client
applications to use host names instead of IP addresses would require more resources than
were available to this project. A high availability system where the cluster spans multiple
sites is a cornerstone of disaster recovery (Lumpp, Schneider, and Mueller, 2008).
Even though the Systems Administration group was primarily responsible for the
host system installations, the Network Engineering group was consulted early on in the
discovery phase of the host system installation project. During this phase of the project,
various aspects of the network configuration for the host systems were discussed, and the
feasibility of the designed configuration of the networking for the various logical
partitions was determined. This was jointly discussed by the Jack Henry Presales
Engineering group and the Credit Union Network Engineering group.
It was the desire of the Credit Union to be able to support VLAN trunks on the
physical network interfaces used by the IBM PowerVM™ systems and map VLANs to
the virtual network interfaces created for the logical partition. This would allow the
flexibility to assign any VLAN to any logical partition on the systems and still maintain
the resilience that link aggregation provides. This would be useful to provide support of
Production, Staging, Development, and Training logical partitions using overlapping IP
addresses spaces, each separated in a single networking infrastructure using VRFs;
however, the Credit Union learned that even though this configuration is supported by
PowerVM™, it was not supported by the Jack Henry& Associates support group.
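The desired (but ultimately unsupported) configuration can be sketched on the switch side as follows; the interface names, channel-group number, and VLAN IDs are hypothetical, chosen only to illustrate a tagged trunk over an aggregated link:

```
! Hedged sketch, Cisco IOS style: an aggregated link carrying tagged VLANs
! down to the PowerVM host (hypothetical interfaces and VLANs).
interface Port-channel10
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30,40
!
interface range GigabitEthernet1/0/1 - 2
 channel-group 10 mode active
 switchport mode trunk
```

This arrangement would have let any VLAN reach any logical partition while the port channel preserved link resilience.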
Jack Henry & Associates would support link aggregation, but it would not
support VLAN tagging. Because of this, the Credit Union specified that the
PowerVM™ host system be configured with two multiport add-in network interface
adapters in addition to the four LAN-On-Motherboard (LOM) interfaces included on the
system board of the PowerVM™ host. The supported configuration was that each LOM
could be assigned to multiple logical partitions, but an entire add-in card had to be
assigned to one and only one logical partition.
Within this limitation Network Engineering was able to determine a network
configuration that supported all required logical partitions, but some active roles would
need to be distributed between the two hosts. The networking for each logical partition
was classified into six VLANs, including the production network, the management
network, the training network, the development network, the staging network, and the
release testing network. The production network and the management network must be
supported on both hosts. This left enough support for two more VLANs on each host, so
the remaining four non-production roles were split between the two hosts. This
configuration allowed an add-in network interface card to be dedicated to the active
production logical partition that would be used to support core system processing. The
other add-in interface was dedicated to the VIOS logical partition. Some other
production logical partitions included a vaulting partition and a legacy production testing
partition.
Other project planning and discovery considerations for the host system included
issues such as systems sizing, storage requirements, and memory allocations. Once these
designs were agreed upon and funding for the project was approved by the Credit Union
Board of Directors and Supervisory Committee, agreements were signed and the
installation was scheduled. In the meantime, Network Engineering was tasked with
designing and implementing a bridged network to support the primary and secondary
hosts installed in the primary and secondary data centers.
During the initial planning phase for the project, it was determined that a network
segmentation phase would be the first phase of the project. This would be followed by a
multisite network bridge segment phase. These first two phases would be directed by the
Network Engineering department. The last two phases would be directed by the Systems
Administration Department. The last phase would be the systems integration
implementation phase followed by failover testing. All phases of the project management
activities were coordinated by the Director of Information Systems. This provided a level
of organization support that is needed for multi-departmental projects (Gray and Larson,
2008).
Chapter 5
NETWORK SEGMENTATION PHASE
Before the bridge environment could be designed and implemented, the Network
Engineering department undertook a Segmentation phase of the Host Systems Disaster
Recovery Project. While the planning for this phase was included in the overall project
planning phase, this phase had its own analysis, design, implementation, and test phases.
This phase also had its own support requirements. This phase deserved special attention
because of the considerable organizational and operational risks associated with it.
This phase had increased organizational risk associated with it because it required
the coordination of multiple organizational departments, each with their own agendas and
operational priorities. This phase was also considered to be an operationally complex
project since it impacted every information systems solution used by the Credit Union.
Project success with organizational and operational complexity is improved with the
risk management techniques associated with formal project planning (Yeo & Ren, 2008).
Network Segmentation Analysis
Network Engineering was tasked with finding ways to implement data networking
infrastructures that would allow the primary and secondary core host systems to be
located in geographically dispersed locations while still being connected to the same
network segment. The Credit Union already had a secondary data center established with
a high speed routed WAN connection to the main site. This data connection between the
main site and the secondary site is a one-gigabit Metro-Ethernet connection provided by a
regional telecommunications provider.
The first challenge that the Credit Union faced was its existing IP address scheme.
Each site was segmented into a single class-B IP network, including the headquarters site.
The headquarters site’s class-B segment supports over 500 nodes on a single broadcast
domain. This is the same segment that supports the core host that needs to be
bridged to the recovery site. The concern was that if the entire segment were to be
bridged to the secondary site, a considerable portion of the WAN link would be
consumed by the broadcasts associated with having 500 nodes on a single broadcast
domain. The latency associated with a high number of nodes in a single broadcast
domain can be reduced by separating Ethernet segments into separate VLANs (Heaton,
2000). The engineering staff sought the advice of the networking staff of another
financial institution that was experienced with bridging core network traffic.
This expert advised that bridging such a large single broadcast
domain would generate far too much broadcast overhead on the WAN link connecting
the two sites. The expert's prior experience was that not correctly choosing which network
traffic to bridge was a particular pain point for a multisite bridged segment. It was recommended
that the Credit Union establish a new network segment just for the bridged traffic and that
the core host system be configured to reside in that new dedicated bridged segment. This
would keep the amount of broadcast traffic sent across the WAN link to a minimum and
have a much lower impact on the availability of WAN bandwidth between the two sites.
The Credit Union management considered this recommendation to be infeasible,
since moving the core host system to a new dedicated bridge segment would require a
change in the core host system’s primary IP address. The number of systems that had
hard coded references to this IP address made changing the IP infeasible for the same
reason that changing to named references to the host was considered infeasible. It was
also understood that excessive broadcasts going across the WAN link would not be
acceptable.
As an alternative compromise, it was proposed that the existing class-B
segment at the headquarters site be divided into smaller segments to support a bridged
segment, with each node keeping its existing IP address configuration.
The workstations, servers, and network devices in the headquarters
location were configured on the network switch infrastructure using VLAN 1 (the default
VLAN). This alternative would segment this address space into smaller segments.
Among these segments would be one or more segments dedicated to systems that require
the capability of bridging to the disaster recovery site.
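The arithmetic behind this re-segmentation can be sketched in a few lines of Python. The report does not disclose the actual class-B address space or the chosen subnet sizes, so 172.16.0.0/16 and /24 segments are assumed purely for illustration:

```python
import ipaddress

# Hypothetical class-B (/16) headquarters network; the real address
# space used by the Credit Union is not given in the report.
campus = ipaddress.ip_network("172.16.0.0/16")

# Splitting the /16 into /24 segments yields 256 smaller broadcast
# domains of at most 254 hosts each, instead of 500+ nodes on one segment.
subnets = list(campus.subnets(new_prefix=24))
print(len(subnets))    # 256 candidate segments
print(subnets[0])      # 172.16.0.0/24

# A node keeps its existing address but lands in a smaller broadcast
# domain; only the segment holding the core host needs to be bridged.
host = ipaddress.ip_address("172.16.10.25")   # hypothetical core host IP
bridged = next(s for s in subnets if host in s)
print(bridged)         # 172.16.10.0/24
```

This illustrates why the cutover could preserve every configured IP address: subnetting changes only the prefix boundaries, not the addresses themselves.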
It was expected that this would solve a number of problems. First, it would reduce
the network broadcasts. By increasing the number of segments on the network and
decreasing the size of each segment, broadcasts would interrupt the network
communications of fewer systems. This would leave more network bandwidth available
to do useful work. And while this will increase the amount of traffic that now needs to
be routed, it was anticipated that the benefit of reduced broadcasts would outweigh this
drawback. Furthermore, it would offer the Credit Union the ability to be selective as
to which traffic to bridge to the secondary site.

It would allow a subset of the current network to be bridged to the disaster
recovery site. The disaster recovery plan for the host system relies on bridging, which
would allow network systems installed at the disaster recovery location to use IP
addresses used by systems at the headquarters site. This was required because the
disaster recovery plan for the host system required that the backup system would assume
the IP address and hardware MAC address of the primary host system in the event of a
disaster.
Since bridging sends all broadcast traffic across the bridged WAN link, the Credit
Union needed to be very careful with what traffic gets bridged, as this bridged traffic can
use a significant amount of the WAN bandwidth. Bridge traffic happens for an entire
network segment, so by creating smaller segments, the Credit Union could be more
selective in which traffic got bridged.
Lastly, creating additional network segments would lend to better security
controls on the network. While the Credit Union had assigned IP addresses to devices in
logical ranges that can be used for network access lists, this control would be much more
flexible when done on subnet boundaries instead of simple IP ranges.
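As a brief illustration of this point (the addresses and list number are hypothetical; the Credit Union's actual configurations appear in the appendices), a control aligned to a subnet boundary collapses to a single wildcard-mask entry, whereas an arbitrary IP range generally cannot:

```
! Hedged sketch, Cisco IOS style: one wildcard-mask entry covers an
! entire /24 segment -- possible only when the control aligns with a
! subnet boundary rather than a simple IP range (hypothetical values).
access-list 110 permit ip 172.16.10.0 0.0.0.255 host 172.16.20.5
```

A range such as 172.16.10.37 through 172.16.11.180, by contrast, would require several entries, which is why subnet-aligned controls are the more flexible design.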
The most time-sensitive of these issues was the bridging issue, because the new
virtualized host system that had been purchased would be installed within just a few
months. To avoid costly redesigns of the new virtualized host system, the bridging and
subnet project needed to be completed before the new system was installed.
Management decided that the headquarters network segmentation activities needed to be
completed two months prior to the scheduled install date of the new virtualized host
system. This would allow two weeks in the middle of the month prior to the new host
installation for the disaster recovery bridging activities, and would allow the
virtualization of the AIX environment to proceed without a costly network redesign after
the new core host system implementation.
It is important to note that the Credit Union planned to use a cutover strategy
instead of a migration strategy for this segmentation. In a migration strategy, new
parallel network segments would be established and the Credit Union would then move
systems from the existing large segment to the newer, smaller segments. Normally this
can be supported when applications rely on host names instead of IP addresses. And
while having applications rely on names instead of IP addresses is definitely a best
practice, many applications relied directly on IP addresses, and it was not considered
feasible to remove this reliance in the timeframe required by the business.
The migration strategy was the strategy recommended by the consulted
networking staff from the financial institution that had previously completed a similar
project. But because of the Credit Union's special circumstances, the additional
complexities and risks associated with a cutover strategy would have to be overcome
with a carefully choreographed effort during the segmentation activities. Nevertheless,
the experience of the expert helped shape the segmentation analysis.
So, instead of a migration strategy, the Credit Union decided on a cutover
strategy. A cutover strategy would allow the systems to maintain their already
configured IP addresses. And while significant reconfiguration of the campus switching
equipment would be required, the updates to each individual node's network interface
would be trivial: only the subnet mask and gateway information would need to be
modified on each individual system to support the smaller subnets. However,
immediately after the headquarters campus switches were updated, all of the systems and
network devices would need to be reconfigured at once to be available on the network.
During the time it took to perform these activities, no data communications for the entire
campus would be functional. This made the cutover strategy highly disruptive, so the
operational risks for this phase were particularly high.
Simplifying the cutover as much as possible was considered the key to making it
as non-disruptive and as successful as possible. This meant reducing the scope to as few
changes as possible while still meeting the requirements for the bridging activities;
certain changes would be tabled for future iterations of the subnet segmentation project.
By reducing the number of changes, their impact could be reduced and the post-cutover
tests could be more effective. Reducing the number of changes would also lead to fewer
unanticipated problems that needed to be addressed after the cutover, which would
further help manage the disruption. Finally, the experience gained during this first
iteration would make future cutovers even less disruptive. While making these changes
in multiple phases would ultimately require multiple touches to the same systems and
would increase the overall time required to complete the project, it would ultimately
reduce the disruption of the planned cutovers.
Since the new host system being installed was the most time-critical driver for the
segmentation project, the first phase of the project would be aimed at satisfying the
requirements of the bridging project that depended on it. It was considered that the best
way to simplify the first phase was to reduce the number of segments needed to support
the bridging project. At minimum, to support the bridging project, the AIX host system
segment needed to be separated from the rest of the network segments. The network
segments detailed in Table 1 would be created for the first phase of the LAN
segmentation project. These are not the actual IP addresses used by the Credit Union and
are for illustration purposes only. The VLAN identifiers are what are implemented in the
layer two switching devices to separate network segments into separate single broadcast
domains. The IP address subnets are then aligned to these VLAN identifiers, with each
IP subnet defined by a network IP address and subnet mask.
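The relationship between the network address, subnet mask, and usable range columns in Table 1 can be derived mechanically. A minimal sketch (Python standard library, using the report's illustrative addresses):

```python
import ipaddress

def usable_range(network: str) -> str:
    """Return the usable host range for a subnet, as in Table 1."""
    net = ipaddress.ip_network(network)
    hosts = list(net.hosts())  # excludes network and broadcast addresses
    return f"{hosts[0]} - {hosts[-1]}"

# A /24 segment such as VLAN 3 (Servers):
print(usable_range("143.16.3.0/24"))    # 143.16.3.1 - 143.16.3.254
# The larger /20 workstation segment, VLAN 128:
print(usable_range("143.16.128.0/20"))  # 143.16.128.1 - 143.16.143.254
```

Note how the /20 mask (255.255.240.0) gives VLAN 128 a usable range spanning the third octets 128 through 143, exactly as listed in the table.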
Because the time constraints for the segmentation and multisite bridge segment
phases were non-negotiable, classic waterfall methodologies did not appear to be
appropriate for these phases. Instead, the team used time-boxing techniques associated
with agile methodologies such as Scrum. This meant doing only what was required: no
more and no less (Milunsky, 2009). This scope management minimized project timeline
risk, but also helped manage the operational risk associated with this phase of the
project.
Table 1
Headquarters Network Segments

VLAN ID    Description                     Network IP & Subnet Mask       Usable IP Address Range
VLAN 1     Routers and Firewalls           143.16.1.0 255.255.255.0       143.16.1.1 – 143.16.1.254
VLAN 2     Switches                        143.16.2.0 255.255.255.0       143.16.2.1 – 143.16.2.254
VLAN 3     Servers                         143.16.3.0 255.255.255.0       143.16.3.1 – 143.16.3.254
VLAN 204   Bridged AIX Systems             143.16.4.0 255.255.255.0       143.16.4.1 – 143.16.4.254
VLAN 5     Staging & Dev Servers           143.16.5.0 255.255.255.0       143.16.5.1 – 143.16.5.254
VLAN 6     **** Workstations               143.16.6.0 255.255.255.0       143.16.6.1 – 143.16.6.254
VLAN 8     Virtualization Servers          143.16.8.0 255.255.255.0       143.16.8.1 – 143.16.8.254
VLAN 10    Printers                        143.16.10.0 255.255.255.0      143.16.10.1 – 143.16.10.254
VLAN 20    Terminal Servers                143.16.20.0 255.255.255.0      143.16.20.1 – 143.16.20.254
VLAN 128   Workstations                    143.16.128.0 255.255.240.0     143.16.128.1 – 143.16.143.254
VLAN 203   Bridge Servers                  143.16.203.0 255.255.255.0     143.16.203.1 – 143.16.203.254
VLAN 205   Bridged Staging & Dev Servers   143.16.205.0 255.255.255.0     143.16.205.1 – 143.16.205.254
Because of this, segmenting the workstations into separate network segments
based on department and function would be postponed for a future network segmentation
project. All of the workstations are supported by closet switches with more than 700
network ports, and determining the proper departmental VLAN ID to assign to each of
these closet switch ports would require more resources than were available in the time
period allowed for the network segmentation activities. It was also considered that
securing each workstation port to a departmental VLAN would be prone to error.
Furthermore, this level of segmentation was not required in order to meet the objectives
for bridging the network traffic to support the failover capabilities of the new core host
system. Not including it helped reduce the complexity of an already disruptive project.
It would have been simpler still to combine VLAN 10, VLAN 20, and VLAN 128
into one network segment, but because of the mathematics of IP subnet mask
calculations, this was not possible. Another alternative would have been to assign new
IP addresses to the printers so that they were in the VLAN 128 range. This would have
made the segmentation project easier to implement, since the printers are connected to
the same switches as the workstations and the closet switches would not have to support
multiple VLANs. This would go a long way toward improving the chances of success for
the cutover, but the time to implement new IP addresses on the printers would have
delayed the project's progression and put the target completion date at risk.
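The subnet-mask constraint mentioned above can be demonstrated with a short calculation (Python standard library, illustrative addresses from Table 1): the smallest single subnet that contains the printer, terminal server, and workstation ranges turns out to be the entire 143.16.0.0/16 space, which would swallow every other segment in the design.

```python
import ipaddress

printers = ipaddress.ip_network("143.16.10.0/24")       # VLAN 10
terminals = ipaddress.ip_network("143.16.20.0/24")      # VLAN 20
workstations = ipaddress.ip_network("143.16.128.0/20")  # VLAN 128

# Grow the printer subnet until it also contains the other two ranges:
merged = printers
while not (terminals.subnet_of(merged) and workstations.subnet_of(merged)):
    merged = merged.supernet()

print(merged)  # 143.16.0.0/16 - the whole headquarters space
# Any merged segment would therefore also engulf unrelated VLANs:
print(ipaddress.ip_network("143.16.3.0/24").subnet_of(merged))  # True
```

Because the third octets 10, 20, and 128 share no common binary prefix, no mask shorter than /16 can cover all three ranges at once.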
Another consideration was the option to use secondary IP addresses on
VLAN 128 on the switch. This would avoid the need to configure the individual ports on
the closet switches, since this configuration could support multiple subnet ranges on a
single VLAN. However, since the 143.16.6.0, 143.16.10.0, and 143.16.128.0 IP ranges
all reside in the workstation space of the network, the switches would need to support
two secondary IP addresses for a VLAN and be able to support the layer three functions
with both secondary IP addresses. While the Credit Union network engineering staff was
able to confirm that this is supported with a single additional secondary IP address on a
VLAN interface, they were unable to confirm that it is supported with two secondary
interfaces. This configuration was considered somewhat experimental, and while it
would potentially decrease the complexity of the implementation, the inherent risks
associated with the unconfirmed support for the solution seemed to offset the
simplification the approach might offer.
In any case, most workstations would initially be configured on the same network
segment. While this would not do as much as it could to reduce the broadcast traffic on
the workstation segment, it did meet the requirement of reducing network broadcasts on
the bridge segment. And although this approach did not solve all of the problems that
segmentation has the potential to solve, it did not preclude applying more
comprehensive segmentation approaches in future segmentation projects. It did stabilize
the network configuration for the AIX host systems and provide the Credit Union
networking and system administration teams with experience to make future
segmentation projects less disruptive.
One thing that stands out in the VLAN scheme is a discontinuity in the
relationship between the third octet of the IP address and the VLAN identifier for
VLAN 204. For every other VLAN, there is a direct relationship between the third octet
of the IP address and the VLAN identifier; however, this relationship is broken for
VLAN 204. This is because all of the bridged VLANs were being installed in the 2XX
range while the non-bridged VLANs fall below that range. The logical thing to do would
have been to give VLAN 204 the IP range 143.16.204.0/24; however, the AIX system
required that it be installed on a bridged network segment, and changing the IP address
of the AIX system from 143.16.4.1 to something in this range would not be feasible in
the time allowed for the project. So, the pattern would be disjointed for this segment.
This was not preferable, and while it might cause confusion for future staff, addressing
the issue was considered out of scope. This helped reduce the risks associated with this
phase of the project.
Network Segmentation Design
The activities involved in creating the needed network segmentation to support
the multisite bridged network included both OSI layer two and layer three designs. The
layer two designs were considered before the layer three design issues.
Layer Two Design Issues
One of the first considerations for the headquarters network segmentation
activities was documenting the physical network switch infrastructure so that new
switch configurations could be designed to support the segmentation. This required
recording the IP address of each node attached to each network port so that the
appropriate VLAN identifier could be assigned to each switch port, for well over 1,000
switch ports deployed in the headquarters network.
Since the switches operate at layer two, they only report hardware MAC
addresses and cannot directly report IP addresses. To document hardware addresses with
their associated IP addresses, the information available from layer two devices could be
correlated with that available from layer three devices. Performing this manually would
be tedious and error prone given the number of ports installed at the headquarters
location, but there are automated tools to help produce the required documentation. The
Credit Union used the Switch Port Mapper tool from the SolarWinds Engineer's Toolset
to help generate the required documentation. In some cases, because existing ACLs
implemented on the network gear would not allow the automatic collection of layer
three addresses from the core router, some devices still needed to be traced manually in
order to generate the needed physical documentation.
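The correlation that the Switch Port Mapper tool automates can be sketched as a simple join between a switch's layer two MAC address table (port to MAC) and a router's ARP table (IP to MAC). The tables, MAC values, and port names below are hypothetical, for illustration only:

```python
# Layer two data, as a switch's MAC address table reports it (illustrative):
mac_table = {
    "Gi1/0/1": "0011.2233.4455",
    "Gi1/0/2": "0011.2233.6677",
}

# Layer three data, as the core router's ARP table reports it (illustrative):
arp_table = {
    "143.16.3.10": "0011.2233.4455",
    "143.16.3.11": "0011.2233.6677",
}

# Join the two on the MAC address to document which IP sits on which port:
mac_to_ip = {mac: ip for ip, mac in arp_table.items()}
port_map = {port: mac_to_ip.get(mac, "unknown") for port, mac in mac_table.items()}

print(port_map)  # {'Gi1/0/1': '143.16.3.10', 'Gi1/0/2': '143.16.3.11'}
```

When an ACL blocks collection of the ARP data, the lookup falls through to "unknown", which corresponds to the devices that had to be traced manually.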
This documentation was then used to prepare a plan to reconfigure the network
switches so that VLAN tagging would be done at the switch port level. In this design,
the node does not provide the VLAN tagging; rather, the switch adds the VLAN tag to
each packet as it traverses the switch's fabric. The alternative would be to have each
node provide the VLAN tagging, a decentralized approach to the issue. The Credit
Union preferred the centralized approach to VLAN tagging because it did not require
additional configuration for each node.
The Credit Union has a mix of Cisco and HP switches deployed at the
headquarters location: Cisco 3750G switches in support of its data center and HP 4000M
switches serving as the closet switches. The Cisco switches assign port-level VLANs
using the switchport mode access and switchport access vlan x commands on a per-port
basis. The HP switches do this by using the untagged command to tag untagged traffic
to a particular VLAN ID. The key piece of documentation needed to support this
configuration was the IP address of each system connected to each port on the switch.
This physical switch documentation was then used to design the switch configurations.
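The per-port assignments described above lend themselves to generation from the port documentation, as was later done for the Cisco configuration scripts. The sketch below is a hypothetical generator, not the Credit Union's actual scripts: the Cisco syntax follows the commands quoted above, while the HP untagged syntax is an assumption that would need checking against the 4000M documentation.

```python
def cisco_access_port(interface: str, vlan: int) -> list:
    # Port-level VLAN assignment on the Cisco 3750G data center switches
    return [
        f"interface {interface}",
        " switchport mode access",
        f" switchport access vlan {vlan}",
    ]

def hp_untagged(vlan: int, ports: str) -> list:
    # HP closet switches tag untagged traffic to a VLAN (syntax assumed)
    return [f"vlan {vlan}", f" untagged {ports}"]

# Generate configuration from a documented port -> VLAN mapping:
port_plan = {"GigabitEthernet1/0/1": 3, "GigabitEthernet1/0/2": 204}
for interface, vlan in port_plan.items():
    print("\n".join(cisco_access_port(interface, vlan)))
```

Generating the lines from a single port-to-VLAN map keeps the scripts consistent with the switch port documentation they were derived from.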
While preparing this documentation of the switch ports, it was found that a good
number of unmanaged desktop switches had been deployed by the desktop support staff.
This was done where additional nodes needed to be installed and there was not enough
structured cabling available to support them. Instead of installing additional structured
cabling, inexpensive unmanaged desktop switches were deployed in these areas so that
the existing structured cabling could support additional nodes. This was an inexpensive
and expedient way to support node growth, but it presented a problem for the
segmentation activities.
It had been assumed that there was a single node per switch port, but the
existence of these unmanaged desktop switches invalidated this assumption. This was
problematic because per-port VLAN assignment only works if all of the devices
connected to a port belong to the same network segment defined by the VLAN identifier
assigned to that port on the managed switch. There were several available solutions to
this issue. One solution was to configure each of the nodes attached to the unmanaged
switches to provide its own VLAN tagging, but this alternative would require some
nodes to be configured differently from most other nodes. It was felt that creating an
exception to the new standard node configuration would cause a support issue in the
future.
Another option would have been to remove the unmanaged switch devices and
replace them with managed switch devices so that the VLAN tagging could be set at the
leaf level. This had a couple of drawbacks in that it would be costly and that it would add
the additional administrative overhead of managing network devices that are not located
in a secured data closet.
A third option that the Credit Union considered was running additional network
cabling to the locations of the unmanaged desktop switches. It was determined that there
were enough available ports in the HP 4000M closet switches to support the nodes. This
option had a lower cost than the switch replacement option and maintained centralized
administration of the desktop switching infrastructure.
After Network Engineering discussed these issues with the desktop support staff,
it was agreed that unmanaged desktop switches would be removed for any switch that
was supporting nodes that needed to be configured in different network segments;
however, any unmanaged desktop switches that supported nodes where all of the nodes
would share the same network segment could remain. Network Engineering helped
identify the switches that would need to be removed and worked with the desktop support
staff to install additional structured cabling to support the removal of the incompatible
unmanaged desktop switches.
Once this was completed, Network Engineering went through several phases of
updating the switch map documentation and evaluating it for compatibility with switch
port VLAN tagging. Once this documentation was accepted, it was used to prepare
configuration scripts to reconfigure the individual ports on the Cisco switches. Since the
HP switches do not support scripted configuration, their configurations were developed
in a lab environment and saved to files that could later be transferred to the HP switches
using TFTP.
Layer Three Design Issues
The next consideration was the design of the layer three functions. The Cisco
3750G switches at the network core were capable of performing the campus routing
functions. A new gateway IP address would be assigned to the management interface for
each VLAN interface added to the core switch stack. For example, the management
interface for VLAN 204 on the core router would be assigned an IP address of
143.16.4.254 with a subnet mask of 255.255.255.0. This management interface would
then become the gateway address for all of the hosts on VLAN 204. IP routing would be
enabled in the configuration of the core switch.
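Following the VLAN 204 example, each segment's gateway takes the highest usable host address in its subnet. A minimal sketch of deriving those VLAN interface definitions, assuming the report's illustrative addresses and Cisco-style syntax:

```python
import ipaddress

def svi_config(vlan: int, network: str) -> list:
    # One VLAN interface per segment on the core 3750G stack; the gateway
    # takes the highest usable host address, e.g. 143.16.4.254 for VLAN 204.
    net = ipaddress.ip_network(network)
    gateway = net[-2]  # last address before the broadcast address
    return [
        f"interface Vlan{vlan}",
        f" ip address {gateway} {net.netmask}",
    ]

print("\n".join(svi_config(204, "143.16.4.0/24")))
# interface Vlan204
#  ip address 143.16.4.254 255.255.255.0
```

Deriving the gateway from the subnet definition keeps the interface configurations consistent with the segment plan in Table 1.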
A remaining switch configuration design issue was the design of the route
propagation, for which there were two main choices. Static routes could be added to the
core headquarters router; as long as static route redistribution was turned on for EIGRP
routing on this layer three switch, this would take care of propagating the routes to the
rest of the network routing infrastructure, such as the WAN routers. The other option
was to use the EIGRP stub routing support in the Base IP software on the Cisco 3750G
core switch stack. This feature may be important for future projects when multiple paths
become available to the LAN segments, but at the time of these design activities there
were no requirements that would leverage this configuration approach. In order to
maintain simplicity for the cutover, it made the most sense to use the first option of
redistributed static routes instead of stub EIGRP route advertisement.
To simplify the issue, the Credit Union decided to use advanced IP services on the
3750G switch and eliminate the existing 143.16.1.1 core router as part of the cutover.
The 3750G switch stack would assume the 143.16.1.1 IP address and all of the ACLs and
static routes would be migrated to the layer three functions of the switch stack. This
stack would also be responsible for propagating EIGRP routes to the WAN routers.
The core layer three switch would also be responsible for routing to the web café
DMZ, the DMZ used to provide branch office kiosks with access to the Internet. These
kiosks give members visiting branches access to the Internet services offered by the
Credit Union, even if a member does not own a computer or have Internet connectivity.
This service helps the Credit Union meet its strategic goal of increasing member usage
of online services. Network Engineering planned to move this function to the switch
stack as well by implementing a port with the no switchport option so that a VLAN
would not need to be defined for this DMZ in the core switching environment.
Some servers were configured with local static routes for communications with
systems that are generally not accessible to most systems. For example, several hosts
had static routes to send traffic to and from particular DMZs. These static routes would
not function once the endpoints were no longer on the same physical segment. Because
of this, these static routes would be documented, and new routes would be implemented
in the routers and firewalls instead. This would simplify documentation and future
examinations of the routing environment, making it more predictable and supportable.
Once the configurations for the routers were completed, the Credit Union needed
to consider the configurations for the various firewall devices, to which static routes for
the new LAN segments would need to be added. The firewall devices provide unrouted
access to all of the network devices on the headquarters LAN segment, but since the
firewall devices were not configured to use an internal default gateway, static routes
would need to be added to these devices in order for communication to be maintained.
Some network segments might not have hosts needing access to the firewalls, but for the
sake of simplicity and reliability, static routes would be added to each firewall's
configuration. These design considerations helped manage the project risk associated
with this phase.
There was some overlap in the existing VLAN identifiers used for the fenced-off
staging and development networks. For example, the intuitive VLAN ID for the
143.16.10.0 network is VLAN 10, because the third octet of the IP address is the most
significant in identifying this segment. However, this VLAN was already in use by the
fenced-off staging network for one of the Credit Union's staging and development DMZ
segments. The design choice was either to use less intuitive VLAN identifiers for the
new internal production segments or to change the VLAN identifiers used for these
fenced-off networks. It was felt that, in the long run, the benefit of using intuitive VLAN
IDs would likely outweigh the effort of assigning new VLAN IDs to the fenced-off
networks. Because the effort to change the fenced-off networks was fairly minimal, this
change was made during the design phase of the segmentation activities.
Once all of these design issues were analyzed, designs for the implementation
scripts for the switches, routers, and infrastructure servers were developed. Developing
these change scripts ahead of time reduced the chances of errors during implementation
and would also shorten an already prolonged change window.
Network Segmentation Implementation
The headquarters site network segmentation activities were divided into two
phases. The first phase included all of the pre-cutover activities: tasks that were
completed in advance of the actual cutover, essentially everything that could be done
outside of the actual cutover date. Table 2 details the activities performed during this
phase of the network segmentation implementation. These tasks were completed on time
to move forward with the next phase of the implementation, the actual cutover. Using
these time-boxing techniques borrowed from agile development methodologies helped
reduce risk in the implementation cutover.
Memorial Day weekend was chosen as the cutover weekend for a number of
reasons. Completing the cutover on the long weekend allowed two days for the entire
process: Saturday night and Sunday to perform the cutover. It also allowed the
implementers to get some rest before business unit testing was performed on Monday,
when the branches would be closed for the holiday. Allowing the implementers to rest
before business unit testing was considered important because it was anticipated that the
engineers might need rested minds in order to respond to issues that the business units
might discover during their unit testing. If the business units had begun their testing on
Sunday evening after the cutover was completed, the engineers would have gone
without rest for more than two days and would not have had access to their full
capabilities.
Table 2
Network Segmentation Pre-Cutover Tasks

1. Move staging virtual machines to internal cluster hosts
2. Retire the staging fenced-off network segments' switching infrastructure
3. Establish new VLANs for the staging fenced-off network segments
4. Document current persistent routes on all servers
5. Schedule cutover activities
6. Document current router and switch configurations
7. Modify routing scheme for Exchange Client Access/Hub Transport servers
8. Establish new virtual machine port groups on internal ESX servers
9. Document static routes of the episys and epibu hosts
10. Contact JHA to arrange for on-call support the night of the network cutover
11. Update public web pages to advertise the system outage to members
12. Create checklists for workstation and printer reconfiguration and testing
13. Remove VLAN conflicts from IDF cabling
14. Verify IDF configuration is free of VLAN conflicts
15. Establish new DHCP scopes, including workstation reservations
16. Verify IP addresses for all servers on active server documentation
17. Design and code router, firewall (PIX), and core switch configurations
18. Verify that only VLAN 6, VLAN 10, VLAN 20, and VLAN 128 are on IDF switches
19. Change three workstation IPs that conflict with new gateway IPs
20. Document original physical networking infrastructure
21. Consolidate HP 5300XL core switch into the Cisco 3750 switch stack
22. Remove all persistent routes from all internal servers
23. Make final router changes with all the last-minute decommissions done this week
24. Resolve static routes on Epibu and Episys
25. Design and code new IDF switch configurations
26. Remove static routes from ISA servers
27. Document new physical networking infrastructure
28. Convert all printer IP configurations from static to DHCP reservations
29. Export passwords from password server
30. Review and document server shutdown and boot-up order
31. Get power-down and power-up steps from JHA
32. Retire current HP 5300XL core switches
33. Server decommissions
34. Create Network Engineering Critical Application Test Plan
35. Create checklists for server reconfiguration and testing
36. Stage new network device configurations on routers, firewalls, and switches
Memorial Day weekend was the only long weekend before the core host system
would be installed in July. This weekend also left the month of June to complete the
segmentation of the failover site, which was also an unsegmented class B IP address
space, and to complete the bridge segment implementation between the two sites.
Credit Union management considered it extremely important that the pre-cutover
activities be completed on time, because the segmentation activities needed to be
completed before the bridge network could be established for the installation of the
failover capabilities of the core host system in July. Several Jack Henry & Associates
resources could not be rescheduled, so the July install date for the failover host could
not be moved. This was complicated by the fact that several groups within the Credit
Union's IT organization needed to coordinate to make sure that all of these activities
were completed so that the cutover could be performed on the Memorial Day weekend.
There was significant potential for the project to fail because of the amount of
interdepartmental coordination required for the activities to be completed on time.
Often, one department's priorities are not the same as another department's. However,
upper management did a lot to emphasize the importance of this project to the various
department managers. This helped ensure that the activities being driven out of the
Network Engineering group were treated as top priorities by other departments. It is
likely that the project's success would have been put in serious jeopardy if upper
management had not been so instrumental in communicating the strategic importance of
these activities and how important it was that they be completed on time so that the
cutover could be performed over the Memorial Day weekend. It is widely recognized
that senior executive support is a key factor in reducing project risk (Liu, Zhang, Keil,
and Chen, 2010).
The Credit Union was successful in getting all of the pre-cutover activities
completed before the long weekend cutover. The cutover needed to be completed as
quickly as possible because it was highly disruptive: online member services would not
be available while the Credit Union reconfigured the network. Table 3 shows the
activities that were planned for the cutover. A detailed checklist for each of these
activities was also created.
Table 3
Network Segmentation Cutover Schedule
Start Time: 11 pm Saturday, May 29th, 2010

#   Task                                                               Start  Mins  Group
1   Preparation and Setup                                              23:00  60    NE
2   Update the configuration of 143.16.20.0 devices                           15    NE
3   Disable all alerts                                                        15    NE
4   Milestone – Start                                                  0:00
5   Reconfigure internal Checkpoint firewalls using SV-RR-SMC-01       0:00         NE
6   Power off all DMZ virtual machines                                 0:15         NE
7   Power down MCW servers                                             0:30         SA
8   DMZ Cluster ESX service console and kernel reconfiguration         0:30   30    NE
9   Reboot DMZ cluster hosts                                           1:00   15    NE
10  Power on and test public web servers using console commands        1:15   15    NE
11  Milestone – public web server WWW back up [down 1 hr 15 min]       1:30
12  Power off all Internal virtual machines                            1:30   30    NE
13  Internal Cluster ESX service console and kernel reconfiguration    2:00   60    NE
14  Power off all Internal ESX hosts                                   3:00   15    NE
15  Reconfigure the IP configuration of EPISYS and EPIBU               3:15   30    SA
16  Power off and disconnect the 143.16.1.1 core router                3:45   15    NE
17  Update the configurations on the IDF switches                      4:00   30    NE
18  Update the configurations of closet UPS devices                    4:30   15    NE
19  Update IOS image on 3750G switch stack                             4:45   30    NE
20  Activate new startup configuration on 3750G switch stack           5:15   15    NE
21  Activate changes on remaining routers and firewalls                5:30   60    NE
22  Milestone – Episys systems back up [down 7 hr]                     6:30
23  Re-cable DMZ5 port on the core router to the 3750G stack           6:30   15    NE
24  Re-cable the 3750E SAN switch stack directly to the core stack     6:45   15    NE
25  Test 3rd party vendor apps (Coop, FSCC, Visa DPS, OFX)             7:00         SA
26  Activate changes on 3750E SAN switch stack                         7:00   15    NE
27  Reconfigure VC, reboot & test management of DMZ ESX hosts          7:15   30    NE
28  Reconfigure SV-RR-DS-01, reboot and test connectivity              7:45   15    NE
29  Power up remaining DMZ virtual machines                            8:00   15    NE
30  e-services to begin testing                                        8:15         MKT
31  Power on Internal ESX hosts                                        8:15   15    NE
32  Reconfigure networking for Active Directory (DS) virtual servers   8:30   15    NE
33  Reconfigure and test physical ISA proxy servers                    8:45   30    NE
34  Activate new DHCP scopes (finish building some scopes)             9:15   15    NE
35  Milestone – Service Desk begins reconfiguring desktops & printers  10:00
36  Reconfigure and test all printers                                  10:00  120   SD
37  Reconfigure and test desktop connectivity                          10:00  240   SD
38  Reconfigure and test remaining physical servers                    10:00  60    NE
39  Set new virtual machine port groups on internal virtual machines   11:00  90    NE
40  Power up remaining virtual machines                                12:30  30    NE
41  Reconfigure all remaining virtual machines and power them off      13:00  120   NE
42  Power up all Domain Controllers / database servers                 15:00  30    NE
43  Power up all remaining virtual servers                             15:30  30    NE
44  Milestone – All Systems Up                                         16:00
45  Test/troubleshoot all critical server functions and NetEng tools   16:00  60    NE
46  Test/troubleshoot user desktop software & Service Desk tools       16:00  60    SD
47  Verify Symconnect ports                                            16:00  15    SA
48  Milestone – Mission Accomplished                                   16:00
The cutover tasks were completed on time; within seventeen hours of beginning
the activities, the cutover was successfully completed.
Post Segmentation Cutover Business Unit Testing
The individual business units were notified of the comprehensive changes that
were taking place over the Memorial Day weekend and they were asked to implement
unit tests of all critical functions starting at 12 PM on Monday afternoon. Network
Engineering and Systems Administration staffs were made available to address any issues
discovered by the business units. These two groups conducted their own critical
application tests starting at 9 AM the day after the cutover activities.
Each business unit was made responsible for developing its own business unit
testing plan. This way each unit could be sure that all systems it needed to function were
available and working properly after the network segmentation cutover activities. This
approach made each business unit responsible for knowing how to test its own systems
and provided accountability to the unit that required those systems to perform its duties
and deliver quality service to the members of the Credit Union.
Amazingly, all unit tests performed by the business units were positive and the
information technology support staff did not need to fix anything after the cutover
activities. This was due in part to very detailed planning of the cutover and the
interdepartmental coordination made possible by the support and emphasis that upper
management gave to the disaster recovery project. Careful risk management paid off.
Secondary Site Network Segmentation
With the segmentation of the main headquarters site completed, it was time for
the Credit Union Network Engineering staff to turn its attention to the segmentation of
the secondary site. The failover site is located in a facility that shares a building with a
branch operations location. All branches' local networks are segmented using class-B IP
address space, and the failover facility was no exception.
The scale of the segmentation activities for the disaster recovery site was much
smaller and so was its potential impact. This is because there were not nearly as many
nodes involved and unavailability of most of the nodes would not have created much
impact on the core business systems used by the Credit Union. Because of this, the
segmentation activities were planned to be performed on a Saturday afternoon after the
branch was closed for the day.
The experience gained during the large headquarters segmentation activities also
made the preparation for the failover site cutover much easier. The documentation,
design, and scripts for this cutover went very quickly and the segmentation of the disaster
recovery site was completed without issue within two weeks of the main site
segmentation.
Chapter 6
NETWORK BRIDGE PHASE
With the Segmentation phase completed, the Network Engineering department
moved on to the Network Bridge Segment Phase. The project planning for this phase was
completed during the overall project planning phase, so Network Engineering moved
straight away into the analysis and design of this phase of the project.
Network Bridging Scope Analysis and Design
One of the requirements of the network bridge is that sensitive data be encrypted
before they are transmitted across any networks that others may have access to. This
includes the telecommunications companies that provide the data communication
services for the Credit Union’s private WAN infrastructure. The Credit Union had
previously encrypted only certain high profile data that was transmitted across the private
WAN. The selection of which data was encrypted was based on Transport layer packet
header information and ACLs. If the transport was one of the transports that the Credit
Union considered sensitive, the traffic would be transmitted through an encrypted GRE
tunnel; otherwise the traffic was transmitted through an unencrypted GRE tunnel. By
selectively encrypting data across the private WAN, the Credit Union was able to save
money on the WAN routers.
Bridging network traffic across the private WAN raised some issues that made the
Credit Union reevaluate the policy of selectively encrypting data on its private WAN
links. Since bridge traffic is handled at the data frame level and would not be
differentiated in the same way that it can be in a routed environment, the Credit Union
required that all bridged traffic going across vendor networks be encrypted; however, the
existing WAN routers installed in the headquarters and disaster recovery sites were not
rated to encrypt 1-Gb/s at wire speed, so the Credit Union needed to replace them with
routers that were rated to encrypt the bridged traffic. The Credit Union decided to
purchase Cisco ASR 1002 routers to replace the installed Cisco 3845 routers because they
are rated to encrypt 1-Gb/s at wire speed.
The network bridging project involves virtual wire, also known as pseudowire,
functionality between the headquarters datacenter and the Disaster Recovery datacenter.
This functionality allows for the same IP subnet to be utilized at both locations at the
same time. Using the L2TPv3 Ethernet Pseudowire, enterprises can extend layer-2
circuits over their IP networks (Lewis, 2008). The datacenters would be connected over
the 1-Gb/s primary WAN connection and 100-Mb/s secondary WAN connection. The
secondary link would provide the redundancy for the virtual wire between the two
datacenters. Having these two datacenters virtually connected will allow the Credit Union to
meet the following business requirements.
First, the new host system failover method would move the active host’s primary
IP address, currently located at headquarters site, to the secondary host located at the
Disaster Recovery datacenter. This failover provides low administration overhead for
workstations and servers that are connecting to the host system via the active host’s
primary IP address.
Second, the virtually connected networks would also allow for virtual servers to
be “live migrated,” via VMware vMotion, over to the Secondary datacenter. This active
migration requires that the WAN connection be greater than 700 Mb/s and have a latency
of less than 5 ms round-trip time between the two datacenters. The preliminary tests for
this connectivity are within the limits when utilizing the Primary 1-Gb/s metro Ethernet
link. The 100 Mb/s secondary links would not suffice for a “live migration” and would be
used to provide production data traffic without vMotion. It should be noted that this
second capability is not required for the immediate goal of providing the host
failover method; rather, the Credit Union included this requirement as a way to future-proof
the solution.
The third requirement is the ability to effectively utilize the secondary datacenter
Internet resources with servers that are physically located at headquarters site. Using
Cisco's implementation of Virtual Routing and Forwarding (VRF) instances would allow the Credit
Union to have VLANs located at the headquarters site that will route differently to
external resources via the Internet connections at the secondary site. This technology
allows multiple routing tables to be maintained without the use of multiple routing
devices (Mattke, 2009). This functionality allows the IT staff to completely build a
system at headquarters datacenter before it is physically moved to the secondary
datacenter. This flexibility allows the staff to build secondary site systems while still
being physically available for daily support tasks at the headquarters datacenter. This
third requirement was considered optional.
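The idea of one device holding several independent routing tables can be illustrated with a small sketch (hypothetical Python, since VRFs are a router feature rather than application code; the VRF names, prefixes, and next hops below are invented for the example):

```python
import ipaddress

# Hypothetical illustration of VRF-style routing: one device holds several
# independent routing tables, selected by VRF name.
vrf_tables = {
    "HQ": {
        "10.10.0.0/16": "core-switch-hq",
        "0.0.0.0/0": "hq-internet-gw",
    },
    "DR": {
        "10.20.0.0/16": "core-switch-dr",
        "0.0.0.0/0": "dr-internet-gw",  # default route via the secondary site
    },
}

def lookup(vrf: str, dest: str) -> str:
    """Longest-prefix match within the routing table of a single VRF."""
    addr = ipaddress.ip_address(dest)
    best_len, best_hop = -1, None
    for prefix, next_hop in vrf_tables[vrf].items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop

# The same destination routes differently depending on the VRF; for example,
# Internet-bound traffic in the "DR" VRF exits via the secondary site.
print(lookup("HQ", "8.8.8.8"))  # hq-internet-gw
print(lookup("DR", "8.8.8.8"))  # dr-internet-gw
```

This mirrors the design goal described above: a VLAN at headquarters assigned to the "DR" table reaches external resources through the secondary site's Internet connections.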
Network Engineering planned for the layer two and layer three functions for the
disaster recovery site and headquarters site networks to be placed into VRFs. Each VRF
has its own routing and forwarding table. The VRFs were located on the two core
switches and on the four WAN routers. Having multiple VRFs and multiple routing
instances on each device provides routing redundancy: if one datacenter were offline,
the subnets that are bridged to the remote datacenter would still route via the
other location.
Having multiple pseudowire pairs across the WAN routers provides
redundancy for the bridged networks, so in the event that one WAN link pair was down,
the other WAN router pair would bridge that network.
Network Bridging Implementation
The implementation of the network bridging was far less challenging than the
implementation of the network segmentation, even though the technologies used for the
network bridging are more advanced and complex. This is because there were fewer
devices to configure, fewer business units to coordinate, and the network downtime
required to implement the change was much shorter.
The plan was to implement one site at a time with the disaster recovery site being
implemented first. The first set of routers was installed without network bridging enabled
on the same evening that the disaster recovery segmentation activities were performed.
This install went without major incident, and the main site cutover to the new Cisco
ASR 1002 routers was scheduled for the following Wednesday evening after all of the
branches were closed.
This implementation would not impact online member services; rather, it would
only impact communications between the primary and secondary sites. This allowed the
implementation of the new WAN routers to be scheduled without a member-impacting
maintenance window. This provided a good amount of scheduling flexibility, as long as
the deadline of having bridging functioning in advance of the July installation of the new
core host system with site failover capabilities was met.
With the networks properly segmented and the required equipment installed, it
was time to implement the network bridging between the two sites. This was scheduled
for a Saturday evening and was to be performed by two engineers both working together
at the primary site. There was some discussion as to whether one of the engineers should
work from the primary site while another engineer would work from the secondary site.
The thought was that if during the implementation of the network bridging, all network
connectivity was lost to secondary site, the engineer located at the secondary site would
be available to reestablish connectivity.
If both engineers were located at the primary site and site-to-site connectivity
were lost, this would require a 40-minute drive to the secondary site to manually
reestablish network connectivity. In the end, the Credit Union decided that the improved
communications between the engineers located at the same site outweighed the
possibility of extending the maintenance window by having to drive to the secondary site.
This worked well because the engineers never completely lost connectivity to the
secondary site. This was achieved by manually switching between the primary and
secondary WAN connections; however, there were issues that prevented the engineers
from establishing a secure bridged segment.
During the implementation of the bridge network it was discovered that there was
a problem with the Cisco IOS code on the ASR 1002 routers that prevented the use of
tunnel protection mode on the GRE tunnels established in non-global VRFs. This
combination was tested in a lab using Cisco 3845 routers, but this particular problem did
not appear on the Cisco 3845 router equipment. The engineers could only get the GRE
tunnels to stay stable on the ASR1002 routers by using unencrypted GRE tunnels.
Because of this, the Credit Union decided that the implementation of the bridge network
changes would be reverted until a solution to the encryption problem could be found.
Within a couple of days, the engineers were able to identify a workaround to the
tunnel protection mode on the GRE tunnels in non-global VRFs by removing keepalive
statements from these GRE tunnels. Another implementation of the bridged network was
scheduled, and during this round the GRE tunnels were stable with encryption. It was
also found that the keepalive statements were not needed.
Bridge Network Stress Testing
With the bridge network in place, the next step was to stress test the network
bridge configuration. This involved collecting network analytics while placing very close
to maximum expected loads on the bridge links. This test was performed using a pair of
Windows 2008 R2 Servers. One of the servers was installed and configured on primary
site’s bridge segment and the other server was installed and configured on secondary
site’s bridge segment. Each server was configured with an array of six disks in RAID 0
configuration.
Each server was configured with an MTU of 1292 bytes, which is smaller than
normal Ethernet frames because of the overhead of the protocols used to provide the
encrypted bridge link. Table 4 shows the initial calculations used to determine a starting
point of the MTU for the routed traffic.
Table 4
Non-Bridged Traffic MTU Calculations
MTU 1500
IP header -20 = 1480
IPsec -58 = 1422
GRE -4 = 1418
Cisco recommended buffer -18 = 1400 (the IP MTU the tunnels should be configured with)
MSS difference -40 = 1360 (the MSS value)
Table 5 shows the initial configuration for a starting point for the MTU for
bridged traffic. This is of particular importance because hosts will negotiate an
appropriate MTU when TCP traffic is transmitted in a routed environment. This
negotiation does not occur for non-routed traffic in a bridged network environment,
which makes severe fragmentation more likely.
Table 5
Bridged Segment MTU Calculations
MTU 1360
IP header -20 = 1340
GRE -4 = 1336 (the IP MTU of Tu0)
802.1Q VLAN tag -4 = 1332
MSS difference -40 = 1292 (the MSS value of Tu0; the MTU size of devices on bridged networks)
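The arithmetic in Tables 4 and 5 can be reproduced with a short script (a sketch; the per-protocol overhead values are the ones from the tables above):

```python
def mtu_chain(start: int, overheads: list[tuple[str, int]]) -> int:
    """Subtract each protocol overhead from the starting MTU, printing each step."""
    mtu = start
    for name, size in overheads:
        mtu -= size
        print(f"{name:26s} -{size:<3d} = {mtu}")
    return mtu

# Table 4: routed (non-bridged) traffic across the encrypted WAN
routed_mss = mtu_chain(1500, [
    ("IP header", 20),
    ("IPsec", 58),
    ("GRE", 4),
    ("Cisco recommended buffer", 18),  # 1400 = the tunnel IP MTU
    ("MSS difference", 40),            # 1360 = the TCP MSS
])

# Table 5: bridged traffic, starting from the 1360-byte value above
bridged_mss = mtu_chain(1360, [
    ("IP header", 20),
    ("GRE", 4),                        # 1336 = the IP MTU of Tu0
    ("802.1Q VLAN tag", 4),
    ("MSS difference", 40),            # 1292 = the MSS on the bridged segment
])

print(routed_mss, bridged_mss)  # 1360 1292
```

The 1292-byte result is the custom MTU that was initially configured on the test hosts.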
Tests were designed to measure the total throughput across the bridge link to
determine the amount of bandwidth and latency that the bridge network solution could
deliver. The primary objective of the test was to make sure that the bridge network could
deliver 700 Mb/s of bandwidth at less than 15 ms of latency while performing bulk
transfers. Another objective was to avoid fragmentation of the frames.
The Credit Union found that unless the MTU of the test hosts was set to no higher
than 1280 bytes, severe fragmentation occurred in the test, and performance was much
less than it could be. After setting the MTU of the test hosts to 1280 bytes, it was found
that the bridge network did meet the performance requirements of bandwidth and latency.
It was determined that the Credit Union would need to configure this custom
MTU on all hosts that would be installed on the bridge segment. This is not a typical
configuration, and the method of configuring this varied widely between various
appliance and host nodes. The Credit Union felt that this might lead to frequent
configuration errors and subsequent performance issues, so another alternative was
sought. The alternative that seemed to make the most sense was to enable jumbo frame
sizes on the WAN interfaces. This would allow the WAN routers to accept larger frames,
and even with the overhead of the various protocols, such as the encrypted GRE tunnel
and pseudowires, fragmentation of the frames would not be required, since the WAN
equipment could support frame sizes larger than 1500 bytes. This would allow the hosts
on the bridge segment to avoid special configuration of the MTU and still maintain the
required network performance. The drawback is that this would require coordination with
the data telecommunications vendors used by the Credit Union.
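As a rough sanity check of the jumbo-frame alternative (a sketch; the overhead figures are the per-protocol values from Tables 4 and 5, and the exact protocol stacking is an approximation rather than a wire-level capture):

```python
# Approximate per-packet overhead of the encrypted bridged path, using the
# per-protocol figures from Tables 4 and 5.
HOST_MTU = 1500           # default Ethernet payload on the bridged hosts
OVERHEAD = {
    "outer IP header": 20,
    "IPsec": 58,
    "GRE": 4,
    "802.1Q VLAN tag": 4,
}

required_wan_mtu = HOST_MTU + sum(OVERHEAD.values())
print(required_wan_mtu)   # 1586

# With jumbo frames enabled end to end, no fragmentation is needed, since the
# vendor's new infrastructure supports frames as large as 9218 bytes.
assert required_wan_mtu <= 9218
```

Even a generous accounting of the tunnel overhead stays far below the 9218-byte jumbo frame limit, which is why enabling jumbo frames on the WAN removes the need for custom host MTUs.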
After contacting the primary WAN vendor, it was determined that the vendor
equipment installed at that time could not support jumbo frame sizes, but that the vendor
was in the process of migrating to equipment that they felt would support jumbo frames.
The WAN equipment used by the Credit Union (Cisco ASR 1002) did support jumbo
frames. The WAN vendor tested their ability to support jumbo frames on their new
infrastructure and they confirmed that the new equipment could support jumbo Ethernet
frames as large as 9218 bytes. The vendor agreed to make the required changes to their
infrastructure, but the changes could not be done before the redundant host system
installation.
The Credit Union decided that it would proceed with the install of the redundant
core host system with the fragmentation issue in place. It was acceptable to address this
fragmentation issue after the installation of the primary and failover host because these
future changes to the site-to-site bridge segment would not require the host system
configuration to be re-engineered; the Credit Union felt that the network bridging
infrastructure was complete enough to proceed with the installation of the new hosts on
time.
Chapter 7
SYSTEMS INTEGRATION IMPLEMENTATION
The Credit Union’s Network Engineering staff worked with the implementation
staff from Jack Henry & Associates to install and configure the IBM PowerVM™ AIX
virtualization hosts and their various logical partitions. The plan was for the
implementers to prepare the host systems and then to migrate production to the new host
systems over the last weekend in July. There were a total
of four host systems to configure. There was an internal host and a DMZ host installed at
each location, and each host supported six to eight logical partitions, including a
hypervisor on each host system. There were also management stations to install at each
site and fiber channel SAN infrastructures to install at each site. The plan was to install
and configure the equipment installed at the primary site before moving on to the
secondary site.
Once the equipment was prepared, the migration strategy was to perform a
backup of the current system during the nightly maintenance batch jobs and restore it
onto the primary logical partition of the primary IBM PowerVM™ host
system. During the migration batch processes, the staged system would swap IP
configuration details with the in-place system. After the migration process, the Credit
Union would begin use of the system supported on a logical partition of the new
PowerVM™ host. All of this would be done in the evening after the close of business
Saturday night. This allowed Sunday for the business units to test their processes before
the start of business on Monday morning.
The Jack Henry Implementation team ran into issues with the installation and
configuration of the SAN infrastructure. After struggling with several reloads of the
logical partitions, it was finally determined that the new SAN controller was defective.
IBM replaced these controllers, but then the implementers found themselves significantly
behind schedule. Rather than reschedule the migration for a later date, the implementers
decided to work longer hours to catch up so that they could still complete the migration
the last weekend in July.
The implementers worked several days of very long hours, and they were able to
get the systems ready to complete the migration of the core host systems. The Credit
Union observed that the work appeared fairly unstructured, with little use of
checklists or written procedures. This was more apparent during the preparation of the
secondary site systems. This was considered acceptable because the logical partitions
configured on the hosts installed at the secondary site would not be put into production
during the last weekend of July; rather, they would just be prepared for later use.
Through the efforts and long hours of Jack Henry Implementation staff, the
systems were prepared in time to begin the migration Saturday evening. The migration
was completed by late Sunday morning and initial testing of the Episys host proceeded on
time. Online member services were reinstalled after testing by the marketing department.
Then all of the various business units proceeded with their tests. It was determined that
the migration was successful, and the Credit Union proceeded to operate on the new
logical partition.
Chapter 8
NETWORK BRIDGE TESTING REVISITED
The primary Episys host was in place and working, but for the first phase of the
disaster recovery system to be complete, the goal of failover to the secondary site had to
be met. As a prerequisite, the bridge network needed to meet the
performance requirements, so the next step in developing the host failover capabilities was
to enable Jumbo Ethernet frames on the WAN segments between the primary and
secondary site. It was felt that the Jumbo Ethernet frame support on the WAN equipment
would allow the bridge network to provide adequate performance by preventing severe
fragmentation of layer-two traffic as it is transmitted over the pseudo-wire and encrypted
GRE tunnels.
The Credit Union began working with the WAN vendor to allow support for
Jumbo Ethernet frames for both sites. The WAN vendor would need to move the
Credit Union’s virtual private WAN switching environment from one infrastructure to
another. The WAN vendor did not require any physical changes to the Gigabit and Fast
Ethernet connected locations, but it did require changes for the equipment installed in the
T-1 connected locations.
The T-1 connected sites needed to migrate to channel MUX devices. Previously,
the T-1 interfaces were installed in the Credit Union’s routers. In order to support the
new infrastructure, the T-1s would be installed in channel MUX devices managed by the
WAN vendor, and the handoff to the Credit Union would be Fast Ethernet at all sites.
Over the next five weeks, the Credit Union worked with the vendor to install and migrate
to the CPE Adtran® Channel MUX devices.
Once the T-1 connected sites were migrated over to the CPE Adtran® Channel
MUX devices, the WAN vendor migrated the Credit Union’s metropolitan network
virtual switching environment to the vendor’s new infrastructure. The Credit Union and
the WAN vendor coordinated enabling jumbo frame support on the Credit Union’s
primary and secondary WAN routers while the WAN vendor updated their routers.
With Jumbo Ethernet Frames enabled, the fragmentation tests were performed
again in the bridge test systems, but this time the MTU settings on these systems were set
to their default values. The tests showed that with Jumbo Frame Support enabled on the
WAN routers, there was no fragmentation and the performance requirements of the
Bridge segments were met. This would allow the Ethernet frame size configuration to
remain at the default, unadjusted value and still meet the performance requirements,
which would avoid issues with incorrectly configured systems in the future. This met the
performance requirements of the failover segment while avoiding hard-to-support system
configuration complexities. With this issue addressed, the Credit Union could move
forward with implementing the failover systems on the bridge segment.
Chapter 9
FAILOVER TESTING
With the failover bridge network ready for active use, the Systems Administration
staff worked with the Jack Henry & Associates Business Continuity group to establish
failover capabilities between the primary core host system in the headquarters data center
and the secondary core hosts in the disaster recovery data center. First, the Credit Union
worked with Jack Henry & Associates to install and configure Jack Henry's Remote
Recovery Software. The software's remote poster process creates journals for all
transactions made on the primary host and ships these transactions to the secondary host.
In a failover scenario the software initiates a series of scripted commands where
the primary and secondary system exchange IP configuration details and Ethernet
hardware addresses and then the secondary host takes the place of the primary host. Over
the next year, the Systems Administration staff conducted a series of tests to develop
confidence in the ability to exercise this procedure. In the first few tests, the failover
procedure was invoked and then a few test transactions were conducted on the disaster
recovery host system.
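The failover swap can be pictured with a hypothetical sketch (the real procedure is Jack Henry's proprietary scripted process; the host names, addresses, and field names below are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class HostNic:
    """Illustrative network identity of one core host (names are hypothetical)."""
    name: str
    ip: str
    mac: str

def fail_over(primary: HostNic, secondary: HostNic) -> None:
    """Exchange IP and MAC so clients keep using the primary's address.

    This only works because the bridged segment puts both data centers on
    the same subnet, so no client, server, or DNS changes are required.
    """
    primary.ip, secondary.ip = secondary.ip, primary.ip
    primary.mac, secondary.mac = secondary.mac, primary.mac

hq = HostNic("episys-hq", "10.1.1.10", "00:11:22:33:44:55")
dr = HostNic("episys-dr", "10.1.1.11", "00:11:22:33:44:66")
fail_over(hq, dr)
print(dr.ip)  # 10.1.1.10 -- the DR host now answers on the production address
```

The sketch shows why the geographically dispersed bridged segment matters: moving the address between sites is only transparent when both hosts sit on the same layer-2 segment.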
Several of these attempts failed, but the root cause of these failures was not the
network infrastructure; rather, they were configuration details that were missed during the
initial configuration of the secondary host system and its installed software. Each time a
test was attempted a new host software configuration detail was found to be at fault.
Finally, over a period of one year, the failover systems were verified and the
configuration details were corrected. The Credit Union was ready to move to its next
phase of failover testing, which involved operating out of the secondary data center
location on the secondary host for a period of one week. After that time, the Credit
Union would test the ability to fail back to the primary host. The first attempt at this test
was aborted because an unexpected result in the recovery process was encountered during
the test. It was later found that this unexpected result was due to a procedural issue in the
nightly batch jobs that are run on the host systems and only impacted the reports used by
the Collections department.
The Credit Union considered scheduling another test of the Disaster Recovery
software, but the Executive Management decided against another test. Instead the Credit
Union plans to do a Recovery Restore test. In this test the Credit Union will restore a
backup of the primary system to the secondary system and conduct operations using the
secondary system for a period of one week.
This method of recovery does impact the Credit Union's Recovery Point
Objective (RPO). This is because backups of the host system are done on a nightly basis
as part of the night batch processes. This means that this recovery method is limited to
restoring the system to its state prior to its last nightly backup; therefore, the RPO for this
method is twenty-four hours. Because of the transactional nature of the Remote Poster
module, the RPO of the Jack Henry Disaster Recovery Software is only 15 minutes,
which provides the organization with much better business continuity in the event of a
failover recovery.
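The difference in worst-case data loss between the two recovery methods can be illustrated with a small sketch (the timestamps are hypothetical; the 24-hour and 15-minute figures are the ones stated above):

```python
from datetime import datetime, timedelta

def data_loss_window(last_capture: datetime, failure: datetime) -> timedelta:
    """Transactions made between the last capture and the failure are lost."""
    return failure - last_capture

# Hypothetical mid-afternoon disaster.
failure = datetime(2011, 6, 1, 14, 30)

# Nightly backup: the last capture was the previous night's batch run,
# so up to a full day of transactions can be lost (24-hour RPO).
backup_loss = data_loss_window(datetime(2011, 5, 31, 23, 0), failure)

# Remote poster journaling: a journal shipped at most 15 minutes earlier,
# so the loss is bounded by the shipping interval (15-minute RPO).
journal_loss = data_loss_window(datetime(2011, 6, 1, 14, 20), failure)

print(backup_loss)   # 15:30:00
print(journal_loss)  # 0:10:00
```

The bound on the journaling method comes from its shipping interval, not from the time of day the disaster occurs, which is why its RPO is so much tighter.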
The Credit Union may test the functionality of the remote poster application at a
later date, and if those tests are successful, confidence in a better RPO may be gained. So
while it is likely that the Credit Union will eventually meet its goals for testing the
implementation of the Disaster Recovery software, these successful tests will come at a
much later date than originally anticipated.
Chapter 10
MANAGERIAL CONSIDERATIONS
During the execution of the project, several critical success factors were observed.
The first of these was detailed planning: detailed schedules and
contingency planning were key. Expert advice was another key
success factor for the project. Finally, organizational support played a large part in the
outcome of this project.
Detailed Planning
Detailed planning including contingency planning was the primary critical
success factor for this project. During the planning of the network segmentation cutover,
careful plans and checklists were prepared. Also, realistic timelines were built into the
schedule for the cutover. The timelines included allowances for known issues, but they
also allowed for the unexpected. This allowance reduced the risk of needing to roll back the
changes. This was very important, since the changes being made to the data center
resources was widespread across hundreds of systems.
Checklists have been used in surgical and aviation settings to reduce the risk of
errors. Checklists have also been used to increase the success rate when dealing with
contingencies. By applying these same principles to IT implementations, errors can be
reduced, and better reactions to contingencies can be developed by reducing the number
of decisions that must be made in stressful situations.
Also, allowances for the degraded capacities of tired staff were made. The
cutover took place over a period of 17 hours starting at night. This meant that key
engineering staff would work while significantly tired. All of this was considered while
building the schedule for the cutover and helped the networking tasks be successfully
accomplished.
On the other hand, the schedule for the implementation of the host system was not
very detailed and did not do a good job of planning for contingencies. The schedule for
the install and migration only listed very generally what would
be done during each day. There was not adequate time allowed for contingencies, so
when the SAN had problems, the schedule did not allow for it. The implementers
worked long hours to make up for the lost time and worked while they were tired. No
allowances were made for working tired, and as a result a lot of minor mistakes were
made. Because there were no detailed checklists prepared ahead of time, many of these
mistakes were not detected until the Systems Administration staff attempted to install the
Disaster Recovery Software and then in their later testing efforts.
While these mistakes were detected and corrected over several tests of the failover
capabilities, they did result in several test failures. Even though each of the
test iterations did improve the failover process, each failure also increased the doubt and
dissatisfaction that the Executive Management of the Credit Union had in the disaster
recovery capabilities that the Disaster Recovery Software provided. Ultimately, this
resulted in the compromise of the target RPO offered by the backup and restore method
of disaster recovery testing.
This may have been avoided if the implementers had used detailed schedules
including detailed checklists. Also, better contingency planning could have avoided this
by being more realistic about the tendency of tired people to make mistakes. Either
extending the installation time across more days or scheduling the installation of the DR
site for a later date may have resulted in the implementers making fewer mistakes on the
secondary host. All of this comes down to lack of detailed planning and inadequate
allowances for contingencies.
Expert Advice
Another critical success factor was to seek the advice of others who had been
successful in deploying similar solutions. The Credit Union sought the advice of an
engineer who had previously deployed a similar solution at another financial institution.
This allowed the Credit Union the advantage of knowing what kinds of challenges to
expect and what kind of technologies can be leveraged to overcome these challenges.
The advisor also provided a sounding board for the plans and designs that the Credit
Union developed.
In the Credit Union’s search for expert advice, many consultants presented
themselves as experts on disaster recovery networking, but when questioned on whether
they had successfully deployed and tested active/passive failover networks using
geographically dispersed bridged network segments, the list of available experts shrank.
Most had deployed active/active clusters or active/passive clusters which relied on
naming services to resolve dissimilar IP addressing on the active/passive cluster.
The Credit Union found two sources of expert advice. One large $30-Billion
Credit Union had deployed site-failover networks using dark fiber between sites. This
solution involved capital expenses that were not within reach of the $1.2-Billion Credit
Union. The other experts had implemented site-to-site bridged networking for a
regional bank using pseudowire technology on a Metro Ethernet. This regional bank
was closer in size and available financial resources to the Credit Union, so this advice
was more applicable. Finding the right expert advice that matches the business and
technological needs of the organization is important. This helped lead the Credit Union
to selecting technologies and implementing solutions that were appropriate to the
organization's strategic goals and available resources.
Organizational Support
Another critical success factor was the organizational support that the executive
management of the Credit Union provided to the Network Engineering department.
Many of the Network Engineering department’s activities required the efforts of all of the
Information Systems groups as well as most of the business units within the
organization. To be successful, these efforts needed to be completed within a very tight
timeframe, even though each department and group had its own priorities and
agenda. Without the backing of upper management, it would have been difficult to
complete all of these efforts on time. The importance that management
placed on these efforts motivated all of the departments to cooperate, and all of the
activities were completed according to plan.
Risk Management Techniques
The Credit Union maintained rigorous scope management during the planning
stages of the project, which reduced project risk by limiting the project scope to
only those requirements that were necessary for the project to initially succeed. Limiting
the scope to only what is needed is a basic tenet of agile development methodologies,
since agile development is iterative by nature.
Not only do agile methodologies deliver a usable project faster, they can also
be used to reduce project risk, because the feedback loop for risk
assessment is shortened. In agile projects, risk review happens before the project starts (Marcehnko,
2007).
Chapter 11
CONCLUSIONS AND RECOMMENDATIONS
As the Credit Union moves forward in developing its disaster recovery capabilities, there
are several milestones that should be addressed. First, the Credit Union should continue
to test its disaster recovery capabilities by operating out of the disaster recovery facility
for an extended period of time. This should be accomplished through the Credit Union’s
plan to restore a backup from the primary host onto its secondary host and operate from
the secondary host for a period of one week. This will provide confidence that the Credit Union
can recover with a recovery point objective of up to 24 hours and a recovery time
objective of six to twelve hours.
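For illustration only, these recovery objectives can be expressed as simple time-window checks. The following Python sketch is not part of the Credit Union’s tooling; the function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Objectives cited above: RPO of up to 24 hours, RTO of six to twelve hours.
RPO = timedelta(hours=24)
RTO_MAX = timedelta(hours=12)  # a faster recovery than six hours also satisfies the RTO

def meets_rpo(last_backup: datetime, disaster_time: datetime) -> bool:
    """Data loss is bounded by the age of the last usable backup."""
    return disaster_time - last_backup <= RPO

def meets_rto(disaster_time: datetime, service_restored: datetime) -> bool:
    """Recovery completes within the planned twelve-hour upper bound."""
    return service_restored - disaster_time <= RTO_MAX

# Hypothetical timeline: nightly backup, outage the next morning.
disaster = datetime(2011, 10, 1, 9, 0)
backup = datetime(2011, 9, 30, 22, 0)      # backup is 11 hours old
restored = datetime(2011, 10, 1, 18, 30)   # service restored 9.5 hours later

print(meets_rpo(backup, disaster))    # True: 11h of potential data loss <= 24h
print(meets_rto(disaster, restored))  # True: 9.5h recovery <= 12h
```

A week-long exercise on the secondary host, as planned, would supply real measurements for checks like these rather than assumed timestamps.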
Next, the Credit Union should continue testing and refining failover tests using
the Recovery Failover software that is currently installed, and likely working, on the
current systems. This will not only improve the Recovery Point Objective and Recovery
Time Objective, it will give the Credit Union confidence in executing the disaster
recovery using Remote poster during a real disaster recovery event. Once this has been
tested and the processes validated, the next step would be to develop redundant third-party
VPN solutions at the secondary site.
Developing redundant third-party VPN solutions is a project that would be led by
the Network Engineering team. Currently, the Credit Union maintains VPN connections
at its headquarters site for peer connections with services such as the VISA Corporation
VPN, the Credit Union’s Cooperative VPN, the Federal Reserve VPN, the Jack Henry &
Associates VPN, as well as the VPN connections that support the Bank Secrecy Act
services. These are all third-party connections that currently require access to the core
host system, and some of them are mission critical to the operation of the Credit Union’s
core business; an outage during a disaster recovery event would impact the Credit
Union members’ ability to access their financial resources. This is especially true for the
ATM, VISA, and Federal Reserve VPN connections. To mitigate this
exposure, the Credit Union should develop and test the ability to terminate these VPNs at
the secondary site.
Once this is done, the Credit Union’s most essential services will be protected
from a disaster that impacts the headquarters site, but other services should be considered
for protection, including the call center, online services, and other member
communications systems such as Internet mail. Which additional services should be
protected would require further analysis in the future.
APPENDICES
APPENDIX A
Primary Site Core Switch Configuration Fragments
hostname SW-HQ-CORE-01
!
aaa new-model
!
switch 1 provision ws-c3750g-48ts
switch 2 provision ws-c3750g-48ts
switch 3 provision ws-c3750g-48ts
switch 4 provision ws-c3750g-48ts
switch 5 provision ws-c3750g-48ts
switch 6 provision ws-c3750g-48ts
!
system mtu routing 1500
vtp mode transparent
authentication mac-move permit
ip subnet-zero
ip routing
no ip domain-lookup
ip domain-name sfcu.org
!
ip vrf INT-PROD-DR1
rd 10.9.1.3:1
route-target export 143.9.1.3:1
route-target import 143.9.1.3:1
!
ip vrf INT-PROD-HQ1
rd 143.16.1.3:1
route-target export 143.16.1.3:1
route-target import 143.16.1.3:1
!
ip vrf INT-TRAIN-RR1
rd 143.16.1.3:3101
route-target export 143.16.1.3:3101
route-target import 143.16.1.3:3101
!
mls qos queue-set output 2 buffers 10 70 10 10
mls qos
!
port-channel load-balance src-dst-ip
!
spanning-tree mode rapid-pvst
spanning-tree etherchannel guard misconfig
spanning-tree extend system-id
spanning-tree vlan 1-3,5-6,8,10,20,128,203-205,3013,3101,3113 priority 4096
spanning-tree vlan 3501,3503,3520 priority 4096
!
vlan internal allocation policy ascending
vlan dot1q tag native
!
vlan 2
name HQ-PROD-SWITCH-143.16.2.0
!
vlan 3
name HQ-PROD-SRVR-143.16.3.0
!
vlan 5
name HQ-DEV-SRVR-143.16.5.0
!
vlan 6
name HQ-PROD-VISA-143.16.6.0
!
vlan 8
name HQ-PROD-VMHOST-143.16.8.0
!
vlan 10
name HQ-PROD-PRINT-143.16.10.0
!
vlan 20
name HQ-PROD-TERMS-143.16.20.0
!
vlan 128
name HQ-PROD-WKS-143.16.128.0
!
vlan 203
name BRDG-PROD-SRVR-143.16.203.0
!
vlan 204
name BRDG-PROD-HOST-143.16.4.0
!
vlan 205
name BRDG-DEV-SRVR-143.16.205.0
!
vlan 2203
name BRDG-OV-SRVR-13-10.9.203.0
!
vlan 3004
name RR-DEV-3004-143.16.4.0
!
vlan 3013
name HQ-FENCE-INT-13-143.16.0.0
!
vlan 3101
name HQ-TRAIN-3101-143.16.1.0
!
vlan 3103
name HQ-TRAIN-3103-143.16.3.0
!
vlan 3104
name HQ-TRAIN-3104-143.16.4.0
!
vlan 3113
name HQ-FENCE-INT-113-10.113.0.0
!
vlan 3116
name HQ-TRAIN-3116-143.16.16.0
!
vlan 3501
name HQ-STG-DMZ1-10.110.1.0
!
vlan 3503
name HQ-STG-DMZ1-10.110.3.0
!
vlan 3520
name RR-STG-DMZ1-20-10.110.20.0
!
vlan 3601
name HQ-TRAIN-DMZ1-10.110.1.0
!
vlan 3603
name HQ-TRAIN-DMZ3-10.110.3.0
!
ip ssh time-out 60
ip ssh version 2
!
!
interface Vlan1
description VL0001-HQ-PROD-TRANS-143.16.1.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.1.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 1 ip 143.16.1.1
standby 1 preempt delay minimum 60 reload 120 sync 120
standby 1 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan2
description VL0002-HQ-PROD-SWITCH-143.16.2.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.2.21 255.255.255.0 secondary
ip address 143.16.2.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 2 ip 143.16.2.1
standby 2 preempt delay minimum 60 reload 120 sync 120
standby 2 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan3
description VL0003-HQ-PROD-SRVR-143.16.3.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.3.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 3 ip 143.16.3.1
standby 3 preempt delay minimum 60 reload 120 sync 120
standby 3 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan5
description VL0005-HQ-DEV-SRVR-143.16.5.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.5.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 5 ip 143.16.5.1
standby 5 preempt delay minimum 60 reload 120 sync 120
standby 5 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan6
description VL0006-HQ-PROD-VISA-143.16.6.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.6.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 6 ip 143.16.6.1
standby 6 preempt delay minimum 60 reload 120 sync 120
standby 6 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan8
description VL0008-HQ-PROD-VMHOST-143.16.8.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.8.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 8 ip 143.16.8.1
standby 8 preempt delay minimum 60 reload 120 sync 120
standby 8 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan10
description VL0010-RR-PROD-PRINT-143.16.10.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.10.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 10 ip 143.16.10.1
standby 10 preempt delay minimum 60 reload 120 sync 120
standby 10 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan20
description VL0020-RR-PROD-TERMS-143.16.20.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.20.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 20 ip 143.16.20.1
standby 20 preempt delay minimum 60 reload 120 sync 120
standby 20 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan128
description VL0128-HQ-PROD-WKS-143.16.128.0/20
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.128.3 255.255.240.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 128 ip 143.16.128.1
standby 128 preempt delay minimum 60 reload 120 sync 120
standby 128 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan203
description VL0203-BRDG-PROD-SRVR-143.16.203.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.203.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 103 ip 143.16.203.1
standby 103 preempt delay minimum 60 reload 120 sync 120
standby 103 authentication md5 key-chain hsrp-keys
standby 203 ip 143.16.203.254
standby 203 priority 95
standby 203 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan204
description VL0204-BRDG-PROD-HOST-143.16.4.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.4.251 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 104 ip 143.16.4.254
standby 104 preempt delay minimum 60 reload 120 sync 120
standby 104 authentication md5 key-chain hsrp-keys
standby 204 ip 143.16.4.253
standby 204 priority 95
standby 204 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan205
description VL0205-BRDG-DEV-SRVR-143.16.205.0/24
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.205.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 105 ip 143.16.205.1
standby 105 preempt delay minimum 60 reload 120 sync 120
standby 105 authentication md5 key-chain hsrp-keys
standby 205 ip 143.16.205.254
standby 205 priority 95
standby 205 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan2203
description VL2203-BRDG-PROD-OV-SRVR-10.9.203.0/24
ip vrf forwarding INT-PROD-DR1
ip address 10.9.203.3 255.255.255.0
ip helper-address global 143.16.3.225
ip helper-address global 143.16.3.226
ip helper-address 143.16.3.225
ip helper-address 143.16.3.226
ip helper-address global 143.16.3.110
ip helper-address 143.16.3.110
ip directed-broadcast 176
ntp broadcast
standby 103 ip 10.9.203.1
standby 103 priority 95
standby 103 authentication md5 key-chain hsrp-keys
standby 203 ip 10.9.203.254
standby 203 preempt delay minimum 60 reload 120 sync 120
standby 203 authentication md5 key-chain hsrp-keys
!
interface Vlan3004
description VL3004-HQ-DEV-HOST-143.16.4.0/16
no ip address
shutdown
!
interface Vlan3101
description HQ-TRAIN-3101-143.16.1.0
ip vrf forwarding INT-TRAIN-RR1
ip address 143.16.1.3 255.255.255.0
ip access-group TRAINING_IN_acl in
ip directed-broadcast 176
ntp broadcast
standby 1 ip 143.16.1.1
standby 1 preempt delay minimum 60 reload 120 sync 120
standby 1 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan3103
description RR-TRAIN-3103-143.16.3.0
ip vrf forwarding INT-TRAIN-HQ1
ip address 143.16.3.3 255.255.255.0
ip access-group TRAINING_IN_acl in
ip directed-broadcast 176
ntp broadcast
standby 1 ip 143.16.3.1
standby 1 preempt delay minimum 60 reload 120 sync 120
standby 1 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan3104
description RR-TRAIN-3104-143.16.4.0
ip vrf forwarding INT-TRAIN-RR1
ip address 143.16.4.251 255.255.255.0
ip access-group TRAINING_IN_acl in
ip directed-broadcast 176
ntp broadcast
standby 104 ip 143.16.4.254
standby 104 preempt delay minimum 60 reload 120 sync 120
standby 104 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan3116
description RR-TRAIN-3116-143.16.16.0
ip vrf forwarding INT-TRAIN-RR1
ip address 143.16.16.3 255.255.255.0
ip access-group TRAINING_IN_acl in
ip directed-broadcast 176
ntp broadcast
standby 1 ip 143.16.16.1
standby 1 preempt delay minimum 60 reload 120 sync 120
standby 1 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan3601
description RR-TRAIN-DMZ1-10.110.1.0
ip vrf forwarding INT-TRAIN-RR1
ip address 10.110.1.3 255.255.255.0
ip access-group TRAINING_IN_acl in
ntp broadcast
standby 1 ip 10.110.1.1
standby 1 preempt delay minimum 60 reload 120 sync 120
standby 1 authentication md5 key-chain hsrp-keys
standby 2 ip 10.110.1.254
standby 2 preempt delay minimum 60 reload 120 sync 120
standby 2 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan3603
description RR-TRAIN-DMZ3-10.110.3.0
ip vrf forwarding INT-TRAIN-RR1
ip address 10.110.3.3 255.255.255.0
ip access-group TRAINING_IN_acl in
ntp broadcast
standby 1 ip 10.110.3.1
standby 1 preempt delay minimum 60 reload 120 sync 120
standby 1 authentication md5 key-chain hsrp-keys
standby 2 ip 10.110.3.254
standby 2 preempt delay minimum 60 reload 120 sync 120
standby 2 authentication md5 key-chain hsrp-keys
arp timeout 300
!
interface Vlan3610
no ip address
!
router eigrp 100
!
address-family ipv4 vrf INT-PROD-HQ1
redistribute connected
redistribute static
network 143.16.1.0 0.0.0.255
network 143.16.203.0 0.0.0.255
passive-interface Vlan2
passive-interface Vlan3
passive-interface Vlan5
passive-interface Vlan6
passive-interface Vlan8
passive-interface Vlan10
passive-interface Vlan20
passive-interface Vlan128
passive-interface Vlan204
passive-interface Vlan205
autonomous-system 100
exit-address-family
!
address-family ipv4 vrf INT-PROD-DR1
redistribute connected
network 10.9.1.0 0.0.0.255
network 10.9.203.0 0.0.0.255
autonomous-system 100
exit-address-family
!
address-family ipv4 vrf INT-TRAIN-HQ1
redistribute connected
network 143.16.1.0 0.0.0.255
autonomous-system 100
exit-address-family
!
ip classless
ip route vrf INT-PROD-HQ1 0.0.0.0 0.0.0.0 Null0
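The helper-address blocks in the fragment above are repeated across many VLAN interfaces, which makes transcription errors easy to miss. As an illustration only (not part of the Credit Union’s tooling), a short script can extract the non-global helper addresses configured under each VLAN interface so that outliers stand out; the parsing assumes the fragment’s flat one-command-per-line layout.

```python
import re
from collections import defaultdict

def helper_addresses_by_vlan(config_text: str) -> dict:
    """Map each 'interface VlanN' block to the set of plain
    (non-global) ip helper-address values it configures."""
    helpers = defaultdict(set)
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        m = re.match(r"interface (Vlan\d+)", line)
        if m:
            current = m.group(1)
            continue
        if line == "!":
            current = None  # '!' terminates a configuration block
            continue
        h = re.match(r"ip helper-address (?!global)(\S+)", line)
        if h and current:
            helpers[current].add(h.group(1))
    return dict(helpers)

# Hypothetical excerpt mimicking the fragment's layout.
sample = """
interface Vlan3
 ip helper-address global 143.16.3.225
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.110
!
interface Vlan5
 ip helper-address 143.16.3.225
 ip helper-address 143.16.3.11
!
"""
result = helper_addresses_by_vlan(sample)
# Addresses Vlan5 configures that Vlan3 does not: a likely typo.
print(result["Vlan5"] - result["Vlan3"])  # -> {'143.16.3.11'}
```

Comparing each interface’s set against the most common set quickly surfaces single-digit slips such as `.11` in place of `.110`.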
APPENDIX B
Primary Site WAN Router Configuration Fragments
hostname RTR-HQ-01
!
!
vrf definition Mgmt-intf
!
address-family ipv4
exit-address-family
!
address-family ipv6
exit-address-family
!
aaa new-model
!
!
aaa session-id common
ip source-route
ip vrf INT-PROD-OV1
rd 172.28.28.113:109
route-target export 172.28.28.113:109
route-target import 172.28.28.113:109
!
ip vrf INT-PROD-HQ1
rd 172.28.28.113:113
route-target export 172.28.28.113:113
route-target import 172.28.28.113:113
!
ip flow-cache timeout inactive 30
ip flow-cache timeout active 5
no ip domain lookup
ip domain name schools.corp
!
multilink bundle-name authenticated
!
archive
log config
logging enable
logging size 500
notify syslog contenttype plaintext
hidekeys
!
redundancy
mode none
!
crypto keyring INT-PROD-HQ1-key-tu0
local-address Loopback0
pre-shared-key address 143.0.0.16 key ********************
pre-shared-key address 143.0.0.15 key ********************
crypto keyring vrf-INT-PROD-HQ1-key vrf INT-PROD-HQ1
pre-shared-key address 143.100.1.1 key ********************
pre-shared-key address 143.100.1.2 key ********************
pre-shared-key address 143.100.1.3 key ********************
pre-shared-key address 143.100.1.4 key ********************
pre-shared-key address 143.100.1.6 key ********************
pre-shared-key address 143.100.1.7 key ********************
pre-shared-key address 143.100.1.8 key ********************
pre-shared-key address 143.100.1.9 key ********************
pre-shared-key address 143.100.1.10 key ********************
pre-shared-key address 143.100.1.11 key ********************
pre-shared-key address 143.100.1.12 key ********************
pre-shared-key address 143.100.1.14 key ********************
crypto keyring INT-PROD-HQ1-key
pre-shared-key address 143.100.1.1 key ********************
pre-shared-key address 143.100.1.2 key ********************
pre-shared-key address 143.100.1.3 key ********************
pre-shared-key address 143.100.1.4 key ********************
pre-shared-key address 143.100.1.6 key ********************
pre-shared-key address 143.100.1.7 key ********************
pre-shared-key address 143.100.1.8 key ********************
pre-shared-key address 143.100.1.9 key ********************
pre-shared-key address 143.100.1.10 key ********************
pre-shared-key address 143.100.1.11 key ********************
pre-shared-key address 143.100.1.12 key ********************
pre-shared-key address 143.100.1.14 key ********************
!
crypto isakmp policy 20
encr 3des
authentication pre-share
crypto isakmp keepalive 10
crypto isakmp profile wan-vpn-vrf
vrf INT-PROD-HQ1
keyring vrf-INT-PROD-HQ1-key
match identity address 143.100.1.0 255.255.255.240 INT-PROD-HQ1
local-address GigabitEthernet0/0/0
crypto isakmp profile wan-vpn
keyring INT-PROD-HQ1-key
match identity address 143.100.1.0 255.255.255.240
local-address GigabitEthernet0/0/0
crypto isakmp profile wan-vpn-tu0
keyring INT-PROD-HQ1-key-tu0
match identity address 143.0.0.9 255.255.255.255
local-address Loopback0
!
crypto ipsec transform-set wan-vpn esp-3des esp-sha-hmac
crypto ipsec transform-set wan-vpn-transport esp-3des esp-sha-hmac
mode transport
!
crypto ipsec profile wan-vpn-transport-profile
set transform-set wan-vpn-transport
set isakmp-profile wan-vpn
!
crypto ipsec profile wan-vpn-transport-profile-tu0
set transform-set wan-vpn-transport
set isakmp-profile wan-vpn-tu0
!
crypto ipsec profile wan-vpn-vrf-transport-profile
set transform-set wan-vpn-transport
set isakmp-profile wan-vpn-vrf
!
!
ip ssh time-out 60
ip ssh authentication-retries 2
!
class-map match-any RealTime
match ip dscp ef
match protocol sip
match protocol rtp
match access-group name Queue_RealTime
match protocol rtsp
class-map match-any High
match dscp af41
match dscp af42
match dscp af43
match protocol dns
match protocol ntp
match protocol snmp
match protocol ldap
match access-group name Queue_High
match protocol telnet
class-map match-any Medium
match dscp af31
match dscp af32
match dscp af33
match access-group name Queue_Medium
class-map match-any Low
match dscp default
match dscp af11
match dscp af12
match dscp af13
match access-group name Queue_Low
class-map match-any C-01-100M
match access-group name Queue_BR01
class-map match-any C-02-100M
match access-group name Queue_BR02
class-map match-any C-03-100M
match access-group name Queue_BR03
class-map match-any C-04-100M
match access-group name Queue_BR04
class-map match-any C-06-T1s
match access-group name Queue_BR06
class-map match-any C-07-T1s
match access-group name Queue_BR07
class-map match-any C-08-100M
match access-group name Queue_BR08
class-map match-any C-10-100M
match access-group name Queue_BR10
class-map match-any C-11-100M
match access-group name Queue_BR11
class-map match-any C-12-T1s
match access-group name Queue_BR012
class-map match-any C-14-100M
match access-group name Queue_BR14
class-map match-any C-HQ-1000M
match access-group name Queue_HQ
class-map match-any C-DR-1000M
match access-group name Queue_DR
!
policy-map C-PRIO
class RealTime
priority percent 10
set dscp ef
class High
bandwidth remaining percent 30
set dscp af41
class Medium
bandwidth remaining percent 40
set dscp af31
class Low
bandwidth remaining percent 29
set dscp af21
policy-map RTR-HQ-01-shape
class C-DR-1000M
shape average 800000000
service-policy C-PRIO
class C-HQ-1000M
shape average 800000000
service-policy C-PRIO
class C-01-100M
shape average 100000000
service-policy C-PRIO
class C-02-100M
shape average 100000000
service-policy C-PRIO
class C-03-100M
shape average 100000000
service-policy C-PRIO
class C-04-100M
shape average 100000000
service-policy C-PRIO
class C-06-T1s
shape average 3000000
service-policy C-PRIO
class C-07-T1s
shape average 3000000
service-policy C-PRIO
class C-08-100M
shape average 100000000
service-policy C-PRIO
class C-11-100M
shape average 100000000
service-policy C-PRIO
class C-12-T1s
shape average 3000000
service-policy C-PRIO
class C-14-100M
shape average 100000000
service-policy C-PRIO
class class-default
shape average 1000000000
!
pseudowire-class l2tpv3
encapsulation l2tpv3
sequencing both
ip local interface Loopback113
!
interface Tunnel0
description ** To DR xconnect **
bandwidth 1000000
ip address 143.0.16.1 255.255.255.252
qos pre-classify
tunnel source Loopback0
tunnel destination 143.0.0.9
tunnel key 1
tunnel protection ipsec profile wan-vpn-transport-profile-tu0 shared
!
interface Tunnel1
description ** To Branch 01 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 143.120.1.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.1
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel2
description ** To Branch 02 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 143.120.2.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.2
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel3
description ** To Branch 03 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 143.120.3.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.3
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel4
description ** To Branch 04 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 143.120.4.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.4
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel5
no ip address
ip flow ingress
!
interface Tunnel6
description ** To Branch 06 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 143.120.6.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.6
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel7
description ** To Branch 07 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 143.120.7.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.7
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel8
description ** To Branch 08 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 143.120.8.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.8
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel9
no ip address
ip flow ingress
!
interface Tunnel10
description ** To Branch 10 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 10.120.10.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.10
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel11
description ** To Branch 11 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 143.120.11.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.11
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel12
description ** To Branch 12 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 143.120.12.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.12
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel13
no ip address
ip flow ingress
!
interface Tunnel14
description ** To Branch 14 **
bandwidth 100000
ip vrf forwarding INT-PROD-HQ1
ip address 10.120.14.1 255.255.255.252
ip directed-broadcast 176
ip mtu 1418
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
ip tcp adjust-mss 1378
delay 5
shutdown
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.14
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
!
interface Tunnel999
description ** To DR **
bandwidth 1000000
ip vrf forwarding INT-PROD-HQ1
ip address 143.120.13.1 255.255.255.252
ip flow ingress
ip summary-address eigrp 100 143.16.0.0 255.255.0.0
load-interval 30
delay 5
qos pre-classify
tunnel source GigabitEthernet0/0/0
tunnel destination 143.100.1.9
tunnel path-mtu-discovery
tunnel protection ipsec profile wan-vpn-transport-profile shared
crypto ipsec df-bit clear
!
interface Loopback0
ip address 143.0.0.13 255.255.255.255
!
interface Loopback1
description RTR-RR-WAN-02-Lo1 | Management Loopback
ip vrf forwarding INT-PROD-HQ1
ip address 172.28.28.1 255.255.255.252
!
interface Loopback113
description RTR-RR-WAN-02-Lo113 | pseudo-wire loopback vrf
ip address 172.28.28.113 255.255.255.255
!
interface GigabitEthernet0/0/0
description ** To SureWestWAN **
mtu 9216
ip address 143.100.1.13 255.255.255.0
no ip proxy-arp
ip nbar protocol-discovery
ip flow ingress
load-interval 30
delay 10
negotiation auto
service-policy output RTR-HQ-01-shape
!
interface GigabitEthernet0/0/1
description Link to SW-RR-CORE-01-Gi4/0/9
no ip address
ip flow ingress
negotiation auto
!
interface GigabitEthernet0/0/1.1
encapsulation dot1Q 1 native
ip vrf forwarding INT-PROD-HQ1
ip address 143.16.1.250 255.255.255.0
ip nbar protocol-discovery
ip flow ingress
!
interface GigabitEthernet0/0/2
description SW-RR-CORE-01-Gi1/0/48 L2TPv3 pseudo-wire to OV
no ip address
ip flow ingress
negotiation auto
xconnect 172.28.28.109 113 encapsulation l2tpv3 pw-class l2tpv3
!
interface GigabitEthernet0/0/3
description Link to SW-RR-SAN-01 Gi1/0/48 | 143.140.2.1
ip vrf forwarding INT-PROD-HQ1
ip address 143.140.1.1 255.255.0.0
ip nbar protocol-discovery
ip flow ingress
negotiation auto
!
router eigrp 100
!
address-family ipv4 vrf INT-PROD-HQ1
redistribute connected
redistribute static route-map redstatic
network 143.16.0.0 0.0.255.255
network 143.120.0.0 0.0.255.255
network 143.140.0.0 0.0.255.255
autonomous-system 100
exit-address-family
!
address-family ipv4 vrf INT-PROD-DR1
redistribute connected
network 143.0.16.0 0.0.0.3
autonomous-system 100
exit-address-family
!
!
ip flow-export source GigabitEthernet0/0/1.1
ip http server
ip http secure-server
ip route 143.0.0.9 255.255.255.255 GigabitEthernet0/0/0 143.100.1.9
!
ip access-list extended Queue_BR01
remark Branch 01 - 100Mbps
permit ip any 172.24.1.0 0.0.0.255
permit ip any 143.1.0.0 0.0.255.255
ip access-list extended Queue_BR02
remark Branch 02 - 100Mbps
permit ip any 172.24.2.0 0.0.0.255
permit ip any 143.2.0.0 0.0.255.255
ip access-list extended Queue_BR03
remark Branch 03 - 100Mbps
permit ip any 172.24.3.0 0.0.0.255
permit ip any 143.3.0.0 0.0.255.255
ip access-list extended Queue_BR04
remark Branch 04 - 100Mbps
permit ip any 172.24.4.0 0.0.0.255
permit ip any 143.4.0.0 0.0.255.255
ip access-list extended Queue_BR06
remark Branch 06 - 3.0Mbps
permit ip any 172.24.6.0 0.0.0.255
permit ip any 143.6.0.0 0.0.255.255
ip access-list extended Queue_BR07
remark Branch 07 - 3.0Mbps
permit ip any 172.24.7.0 0.0.0.255
permit ip any 143.7.0.0 0.0.255.255
ip access-list extended Queue_BR08
remark Branch 08 - 100Mbps
permit ip any 172.24.8.0 0.0.0.255
ip access-list extended Queue_BR10
remark Branch 10 - 100Mbps
permit ip any 172.24.10.0 0.0.0.255
permit ip any 143.10.0.0 0.0.255.255
ip access-list extended Queue_BR11
remark Branch 11 - 100Mbps
permit ip any 172.24.11.0 0.0.0.255
permit ip any 143.11.0.0 0.0.255.255
ip access-list extended Queue_BR012
remark Branch 12 - 3.0Mbps
permit ip any 172.24.12.0 0.0.0.255
permit ip any 143.12.0.0 0.0.255.255
ip access-list extended Queue_BR14
remark Branch 14 - 100Mbps
permit ip any 172.24.14.0 0.0.0.255
permit ip any 143.14.0.0 0.0.255.255
ip access-list extended Queue_DR
remark Secondary DR Site - 1000Mbps
permit ip any 172.24.9.0 0.0.0.255
permit ip any 143.15.0.0 0.0.255.255
permit ip any 10.159.0.0 0.0.255.255
ip access-list extended Queue_HQ
remark Headquarter Site - 1000Mbps
permit ip any 172.24.13.0 0.0.0.255
permit ip any 143.16.0.0 0.0.255.255
permit ip any 143.140.0.0 0.0.255.255
ip access-list extended Queue_High
remark ** UDP Services **
permit udp any any eq domain
permit udp any eq domain any
permit udp any any eq ntp
permit udp any eq ntp any
permit udp any any eq snmp
permit udp any eq snmp any
permit udp any any eq snmptrap
permit udp any eq snmptrap any
permit udp any any eq syslog
permit udp any eq syslog any
remark ** TCP Services **
permit tcp any any eq ftp
permit tcp any eq ftp any
permit tcp any any eq 22
permit tcp any eq 22 any
permit tcp any any eq domain
permit tcp any eq domain any
permit tcp any any eq 389
permit tcp any eq 389 any
permit tcp any any eq 636
permit tcp any eq 636 any
permit tcp any any eq 3389
permit tcp any eq 3389 any
ip access-list extended Queue_Low
permit ip any any
ip access-list extended Queue_Medium
remark ** Deny SAN traffic push down to low queue **
deny ip 143.140.0.0 0.0.255.255 10.159.0.0 0.0.255.255
deny ip 143.149.0.0 0.0.255.255 10.150.0.0 0.0.255.255
remark ** Deny NAS traffic push down to low queue **
deny ip any any
permit ip any any
ip access-list extended Queue_RealTime
remark ***TDM_Over_IP_Device*******************
permit ip host 143.16.1.5 any
remark ***TCP and UDP 5060 5061****************
permit udp any any eq 5060
permit udp any eq 5060 any
permit udp any any eq 5061
permit udp any eq 5061 any
permit tcp any any eq 5060
permit tcp any eq 5060 any
permit tcp any any eq 5061
permit tcp any eq 5061 any
route-map redstatic permit 10
match ip address 10
!
control-plane
!
BIBLIOGRAPHY
Arduini, F., & Morabito, V. (2010, March). "Business continuity and the banking industry". Communications of the ACM, 53(3), 121-125.
Cabanatuan, M. (1998, March 23). "Risky Construction on Floodplains: Experts starting to question concept of '100-year flood'". San Francisco Chronicle. http://www.crcwater.org/issues4/calfloodplain.html. Accessed October 6, 2011.
"Core Solutions" (2011). Symitar. http://www.symitar.com/CoreSolutions. Accessed October 7, 2011.
Gray, C.F., & Larson, E.W. (2008). "What is the right project management structure?" In C.F. Gray & E.W. Larson, Project Management: The Managerial Process (4th ed., pp. 69-76). New York: McGraw-Hill/Irwin.
Heaton, W. (2000, July 6). "Reducing latency with VLANs". TechRepublic. http://www.techrepublic.com/article/reducing-latency-with-vlans/1033460. Accessed October 9, 2011.
Lewis, M. (2008, March 24). "Configuring an L2TPv3 Ethernet Pseudowire". Network World. http://www.networkworld.com/community/node/26272. Accessed October 9, 2011.
Lindstrom, J., Samuelsson, S., & Hagerfors, A. (2010). "Business continuity planning methodology". Disaster Prevention and Management, 19(2), 243-255.
Liu, S., Zhang, J., Keil, M., & Chen, T. (2010, July). "Comparing senior executive and project manager perceptions of IT project risk: a Chinese Delphi study". Information Systems Journal, 20(4), 319-355.
Lumpp, T., Schneider, J., & Mueller, M. (2008, October-December). "From high availability and disaster recovery to business continuity solutions". IBM Systems Journal, 47(4), 605-619.
Marchenko, A. (2007, December 21). "Risk analysis in agile methods". Agile Software Development. http://agilesoftwaredevelopment.com/blog/artem/risk-analysis-agile-methods. Accessed October 9, 2011.
Mattke, T. (2009, November 19). "Cisco MPLS VRF configuration and demo". Router Jockey. Accessed October 9, 2011.
Milunsky, J. (2009, May 5). "Significance of Time Boxing". Agile Zone. http://agile.dzone.com/articles/qa-agile-approach-0. Accessed October 9, 2011.
Stake, R. (1995). The Art of Case Research. Newbury Park, CA: Sage Publications.
Tammineedi, R.L. (2010, January). "Business Continuity Management: A Standards-Based Approach". Information Security Journal: A Global Perspective, 19(1), 36-50.
Totty, P. (2006, April). "Business Continuity Planning". Credit Union Magazine, 72(4), 80-83.
Yeo, K.T., & Ren, Y. (2008, September). "Risk Management Capability Maturity Model for Complex Product Systems (CoPS) Projects". Systems Engineering, 12(4), 275-294.
Yin, R.K. (2002). Case Study Research: Design and Methods (3rd ed.). Newbury Park, CA: Sage Publications.