Understanding High Availability

Implementing a Highly Available Network
© 2009 Cisco Systems, Inc. All rights reserved. SWITCH v1.0—5-1

Components of High Availability
• The objective of high availability is to prevent outages and minimize downtime.
• Achieving high availability requires integrating multiple components:
  – Redundancy
  – Technology
  – People
  – Processes
  – Tools
• The first two components are relatively easy to integrate.
• The last three components are usually where gaps lead to outages.

Redundancy
• Redundancy is used to reduce or eliminate the effects of a failure.
• Redundancy design attempts to eliminate single points of failure:
  – Avoid single causes of failure.
  – Use geographic diversity and path diversity.
  – Use dual devices and links.
  – Use dual WAN providers.
  – As appropriate, implement dual data centers.
  – As appropriate, use dual colocations, dual central office facilities, and dual power substations.
• Redundancy design needs to trade off cost versus benefit:
  – Weigh hours of downtime against the costs of redundancy, planning, and so on.

Technology
• Cisco routing continuity options:
  – Cisco Nonstop Forwarding (NSF)
  – Stateful Switchover (SSO)
  – Catalyst 3750 Series Switches with Cisco StackWise technology
  – Catalyst 6500 VSS 1440
• Techniques for detecting failure and triggering failover:
  – Monitoring
  – IP SLAs and object tracking
• Other technologies:
  – Fast routing convergence

People
• Staff work habits and skills can impact high availability:
  – Attention to detail
  – Reliability and consistency
• Good skills and ongoing technical training are needed:
  – Lab time working with the technology, building practical skills, troubleshooting challenging scenarios, and so on.
  – Communication and documentation are important.
• Define what other groups expect.
• Define why the network is designed the way it is and how it is supposed to work.
• If people are not given the time to do the job right, they cut corners:
  – If the design target is just “adequate,” falling short leads to poor design.
• The staff team should align with services:
  – The owner and experts for each key service application and other components should be identified and included.

Processes
• Build repeatable processes:
  – Document change procedures, failover planning and lab testing, and implementation procedures.
• Use labs appropriately:
  – Lab equipment reflects the production network, failover mechanisms are tested and understood, and new code is validated before deployment.
• Use meaningful change controls:
  – Test all changes before deployment, plan carefully with rollback plans, and conduct realistic and thorough risk analysis.
• Manage operational changes:
  – Perform regular capacity management audits, manage Cisco IOS versions, track design compliance as recommended practices change, and develop disaster recovery plans.

Tools
• Monitor availability and key statistics for devices and links:
  – Use performance thresholds, Top N reporting, and trending to spot potential problems.
  – Monitor packet loss, latency, jitter, and drops.
• Good documentation is a powerful tool:
  – Maintain updated network diagrams.
  – Have network design write-ups.
  – Document key addresses, VLANs, and servers.
  – Tie services to applications, applications to virtual servers, and virtual servers to real server tables.
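The Tools slide recommends monitoring packet loss, latency, and jitter against performance thresholds. The Python sketch below shows one way to reduce a series of probe round-trip times into those three figures and flag threshold violations. The function names, the threshold values, the sample data, and the use of None for a probe that received no reply are illustrative assumptions. Jitter is computed here as the mean absolute difference between consecutive round-trip times, a simplification rather than the smoothed estimator of RFC 3550 or the exact calculation an IP SLA operation performs.

```python
from statistics import mean
from typing import List, Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize probe results; None marks a probe that received no reply."""
    received = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0
    # Simplified jitter: mean absolute change between consecutive received RTTs
    # (not the RFC 3550 smoothed interarrival estimator).
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        "avg_latency_ms": mean(received) if received else None,
        "jitter_ms": mean(deltas) if deltas else 0.0,
    }


def over_threshold(summary: dict, max_loss_pct: float,
                   max_latency_ms: float, max_jitter_ms: float) -> List[str]:
    """Return the names of the metrics that exceed their (hypothetical) thresholds."""
    alerts = []
    if summary["loss_pct"] > max_loss_pct:
        alerts.append("loss")
    avg = summary["avg_latency_ms"]
    # No replies at all means latency cannot be measured, so treat it as a violation.
    if avg is None or avg > max_latency_ms:
        alerts.append("latency")
    if summary["jitter_ms"] > max_jitter_ms:
        alerts.append("jitter")
    return alerts


if __name__ == "__main__":
    samples = [20.0, 22.0, None, 21.0, 35.0]  # hypothetical RTTs in milliseconds
    result = summarize_probes(samples)
    print(result)
    print(over_threshold(result, max_loss_pct=1.0, max_latency_ms=50.0, max_jitter_ms=5.0))
```

Feeding the per-interval summaries into a time series is what enables the trending and Top N reporting the slide mentions; this sketch covers only a single interval.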

Resiliency for High Availability
High availability is implemented with the following components:
• Network-level resiliency
  – Redundant links
  – Redundant devices
• System-level resiliency
  – Integrated hardware resiliency
  – Redundant power supply
  – Stackable switches
• Management and monitoring
  – Detection of failure
Supported features depend on the switch family.

Network-Level Resiliency
• Link redundancy
  – Redundant links
  – EtherChannel
• Fast convergence
  – Optimized link implementation
  – Tuning of Layer 2 and routing protocols
• Power redundancy
  – External redundant power supply
  – Uninterruptible power supply
• Monitoring
  – SNMP
  – Syslog
  – IP SLA
  – Time synchronization via NTP

High Availability and Failover Times
The overall failover time is the combination of convergence at Layer 1, Layer 2, Layer 3, and higher-layer components:
• Layer 1: link
• Layer 2: STP
• Layer 3:
  – Routing protocol
  – First Hop Redundancy Protocol (FHRP)
• Higher layers:
  – Firewall failover
  – Server failover
Timers at each layer can be tuned to reduce failover time.

Optimal Redundancy
• Core and distribution layers have redundant switches and links.
• Access switches have redundant links.
• Network bandwidth and capacity can withstand a single switch or link failure.
• The network is designed to converge around most events in 200–500 ms.

Provide Alternate Paths
• With a single path to the core, one failure causes traffic to be dropped.
• A redundant link to the core resolves this issue.
• Recommended practice: Use a redundant link to the core with a Layer 3 link between distribution switches.
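The Provide Alternate Paths slide argues that a single path to the core turns one failure into dropped traffic. One way to audit a design for this is to remove each link in turn and check whether every access switch can still reach the core. The Python sketch below does that with a breadth-first search over an undirected link list. The switch names and both topologies are hypothetical, and the check covers reachability only: it does not model convergence time, STP blocking, or whether the surviving links have enough capacity.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[str, str]


def reachable(links: Iterable[Link], start: str) -> Set[str]:
    """Breadth-first search over undirected links; returns every node reachable from start."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def links_that_isolate(links: List[Link], access: List[str], core: Set[str]) -> List[Link]:
    """Return each link whose failure alone leaves some access switch with no path to the core."""
    critical = []
    for i, failed in enumerate(links):
        # Drop only this one entry, so parallel links listed twice are handled correctly.
        remaining = links[:i] + links[i + 1:]
        if any(not (reachable(remaining, sw) & core) for sw in access):
            critical.append(failed)
    return critical


if __name__ == "__main__":
    # Hypothetical single-path design: access -> distribution -> core.
    single_path = [("A1", "D1"), ("D1", "C1")]
    # Hypothetical redundant design: dual-homed access and distribution switches,
    # plus a link between the two distribution switches.
    redundant = [
        ("A1", "D1"), ("A1", "D2"), ("D1", "D2"),
        ("D1", "C1"), ("D1", "C2"), ("D2", "C1"), ("D2", "C2"),
    ]
    print("single path, critical links:", links_that_isolate(single_path, ["A1"], {"C1"}))
    print("redundant, critical links:", links_that_isolate(redundant, ["A1"], {"C1", "C2"}))
```

In the single-path topology both links are reported as critical; in the redundant topology no single link failure cuts the access switch off from the core.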

Avoid Too Much Redundancy
Too much redundancy can lead to design issues:
• Root placement
• Number of blocked links
• Convergence process
• Complex fault resolution
• Cost

Avoid Single Points of Failure
• The access layer is a candidate for supervisor redundancy:
  – Layer 2 access layer: SSO
  – Layer 3 access layer: SSO and Cisco NSF
• Supervisor redundancy reduces network outage to 1 to 3 seconds.
• Supported with Cisco Catalyst 4500 and 6500 Series Switches.

Cisco NSF with SSO
• The standby route processor (RP) takes control of the router after a hardware or software fault on the active RP.
• SSO allows the standby RP to take control immediately and maintain connectivity protocols.
• Cisco NSF continues to forward packets until route convergence is complete.

Routing Protocol Requirements for Cisco NSF
• Cisco NSF enhancements to routing protocols are designed to prevent routing flaps.
• Adjacencies must not be reset when switchover is complete; otherwise, protocol state is not maintained.
• The Forwarding Information Base (FIB) must remain unchanged during switchover:
  – Current routes are marked as stale during restart.
  – Routes are refreshed after Cisco NSF convergence is complete.
  – Transient routing loops or black holes may be introduced if the network topology changes before the FIB is updated.
• Switchover must be completed before the dead or hold timer expires; otherwise, peers will reset the adjacency and reroute the traffic.
• Cisco NSF-capable routers are configured to support Cisco NSF.
• Routers that are aware of Cisco NSF know that a Cisco NSF-capable router can still forward packets during a switchover.
• Supported with EIGRP, OSPF, BGP, and IS-IS.
• Supported with Cisco Catalyst 4500 and 6500 Series Switches.
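Two requirements on the Routing Protocol Requirements for Cisco NSF slide lend themselves to a small model: the switchover must finish before a peer's dead or hold timer expires, and existing FIB entries are kept but marked stale, then refreshed or removed once Cisco NSF convergence completes. The Python sketch below is a toy model of that behavior, not how Cisco IOS implements it. The timer values are commonly cited Cisco IOS defaults (OSPF dead interval of 40 seconds on broadcast and point-to-point networks, EIGRP hold time of 15 seconds on high-speed links, BGP hold time of 180 seconds) and change with configuration. The class and method names and the sample prefixes are hypothetical, and lookups use exact prefix matching rather than longest-prefix match.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Commonly cited Cisco IOS defaults, in seconds; actual values depend on
# network type and configuration.
DEFAULT_HOLD_S = {"OSPF": 40, "EIGRP": 15, "BGP": 180}


def peers_keep_adjacency(switchover_s: float, protocol: str) -> bool:
    """True if the switchover finishes before the peer's dead/hold timer expires."""
    return switchover_s < DEFAULT_HOLD_S[protocol]


@dataclass
class Fib:
    """Toy FIB mapping prefix -> (next hop, stale flag); hypothetical, not Cisco's structure."""
    entries: Dict[str, Tuple[str, bool]] = field(default_factory=dict)

    def mark_all_stale(self) -> None:
        # On switchover, keep every entry (so forwarding continues) but flag it as stale.
        self.entries = {p: (nh, True) for p, (nh, _) in self.entries.items()}

    def refresh(self, prefix: str, next_hop: str) -> None:
        # A route relearned from the reconverged routing protocol is current again.
        self.entries[prefix] = (next_hop, False)

    def purge_stale(self) -> None:
        # Once convergence is complete, remove routes that were never refreshed.
        self.entries = {p: v for p, v in self.entries.items() if not v[1]}

    def lookup(self, prefix: str) -> Optional[str]:
        # Exact-match lookup for simplicity; a real FIB uses longest-prefix match.
        entry = self.entries.get(prefix)
        return entry[0] if entry else None


if __name__ == "__main__":
    print("EIGRP peer survives a 3 s switchover:", peers_keep_adjacency(3.0, "EIGRP"))
    fib = Fib({"10.1.0.0/16": ("192.0.2.1", False), "10.2.0.0/16": ("192.0.2.2", False)})
    fib.mark_all_stale()
    print("forwarding on stale entry:", fib.lookup("10.2.0.0/16"))
    fib.refresh("10.1.0.0/16", "192.0.2.1")  # only this route is relearned
    fib.purge_stale()
    print("after convergence:", fib.entries)
```

The stale flag is what lets forwarding continue during the switchover while still allowing routes that disappeared from the topology to be removed afterward; if the topology changes before the purge, the stale entries are exactly where the transient loops or black holes mentioned on the slide could come from.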

Summary
• High availability involves several elements: redundancy, technology, people, processes, and tools.
• At the network level, high availability involves making sure that there is always a possible path between two endpoints.
• High availability limits the impact of link and node failures to minimize downtime by implementing link and node redundancy, providing alternate paths for traffic, and avoiding single points of failure.
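The Redundancy slide frames redundancy as a trade-off between hours of downtime and the cost of redundancy. The arithmetic behind that trade-off can be sketched in Python: components that must all work (a single path) multiply their availabilities, while redundant components that can each carry the load fail only when all of them fail. The availability figure below is hypothetical, and the parallel formula assumes failures are independent, an assumption that shared power, a shared conduit, or a common software bug can violate.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def downtime_hours_per_year(availability: float) -> float:
    """Expected yearly downtime for an availability between 0 and 1."""
    return (1.0 - availability) * HOURS_PER_YEAR


def series(*availabilities: float) -> float:
    """Every component must work (for example, a single path): availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Any one component is enough (redundant links or devices).

    Assumes independent failures: the group is down only if all members are down.
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    uplink = 0.999  # hypothetical availability of a single uplink
    print(f"single uplink:     {downtime_hours_per_year(uplink):.2f} h/year")
    redundant_min = downtime_hours_per_year(parallel(uplink, uplink)) * 60
    print(f"redundant uplinks: {redundant_min:.2f} min/year")
```

With a hypothetical uplink availability of 99.9 percent, one uplink works out to roughly 8.76 hours of downtime per year, while an independent redundant pair works out to roughly half a minute. Comparing that difference with the cost of the second link, its optics, and the extra operational complexity is the cost-versus-benefit decision the Redundancy slide describes.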
