Cisco CCNP TSHOOT Simplified
Your Complete Guide to Passing the CCNP TSHOOT Exam

Paul Browning (LLB Hons) CCNP, MCSE
Farai Tafa dual CCIE

This study guide and/or material is not sponsored by, endorsed by or affiliated with Cisco Systems, Inc. Cisco®, Cisco Systems®, CCDA™, CCNA™, CCDP™, CCNP™, CCIE™, CCSI™, the Cisco Systems logo and the CCIE logo are trademarks or registered trademarks of Cisco Systems, Inc in the United States and certain other countries. All other trademarks are trademarks of their respective owners.

Copyright Notice
Copyright © 2011 Paul Browning all rights reserved. No portion of this book may be reproduced mechanically, electronically or by any other means, including photocopying without written permission of the publisher.

ISBN: 978-0-9557815-8-2
Published by: Reality Press Ltd. Midsummer Court 314 Midsummer Blvd. Milton Keynes MK9 2UB
help@reality-press.com

LEGAL NOTICE
The advice in this book is designed to help you achieve the standard of Cisco Certified Network Engineer which is Cisco’s foundation internetworking examination. A CCNA is able to carry out basic router and switch installations and troubleshooting. Before you carry out more complex operations it is advisable to seek the advice of experts or Cisco Systems, Inc. The practical scenarios in this book are meant to illustrate a technical point only and should be used on your privately owned equipment only and never on a live network.

CISCO CCNP TSHOOT SIMPLIFIED

INTRODUCTION

Firstly, we want to say congratulations for investing in yourself and your future. Actions speak far louder than words and you have already taken a very important step towards your future as a Cisco network engineer. The new CCNP track has been developed based on continuing feedback from Cisco customers who inform Cisco about what skills and abilities they want to see in their engineers.
Over the past few years, Cisco exams have become increasingly difficult and, of course, your certification expires every three years, so many engineers who have not kept themselves up to date have struggled to maintain their certification.

The objective for us, as with all of our Cisco Simplified manuals, is to help you do two things. First and foremost, our goal is to equip you with the skills, knowledge, and ability to carry out the day-to-day role of a Cisco network engineer. We don’t want you to be a walking manual, but we do want you to know how to do the stuff that we consider the “bread and butter” jobs a CCNP engineer would need to carry out.

Secondly, of course, we want you to pass your Cisco exams. The mistake many Cisco students make is to do whatever it takes to pass the exam. Even if that approach did work, people taking this tack often sell themselves short. The reason is that most job interviews nowadays consist of both a hands-on and a theoretical test. If a student doesn’t have a grasp of how the technology works, he or she has no hope of success in the real world.

These are the current exams you need to pass in order to become a CCNP:
• 642-902 ROUTE: Implementing Cisco IP Routing
• 642-813 SWITCH: Implementing Cisco IP Switched Networks
• 642-832 TSHOOT: Troubleshooting and Maintaining Cisco IP Networks

Each exam features theoretical questions as well as multiple hands-on labs where you could be asked to configure or troubleshoot any of the technologies in the syllabus. In addition, you have only 120 minutes to complete all of the tasks and answer all of the questions.

Each chapter is broken down into an overview and then the main theory discussion before moving on to a review section covering the main learning points. Be patient with yourself because there is a lot to learn. If you put about two hours aside every day to study, you should be ready to attempt the exam in approximately 60 days from the day you start.
If you take days off or holidays, then of course it will take much longer.

Almost every topic is applied to how you would use the knowledge in real life, which is an area you will find missing from almost every other Cisco textbook. We design, install, and troubleshoot Cisco networks on a daily basis and have been doing so for many years. We don’t fill your head full of useless jargon and fluff just to boast about how much we know. Although we do spend a little time teaching, for the most part, we are Cisco consultants out in the field.

If you are a member of www.howtonetwork.net, then please use the tools, such as the flash cards, practice exams, and videos, found on the site to help you through the exam. You will find them to be vital study tools. Please use the CCNP TSHOOT discussion forum if you have any questions you need help with. Farai and I monitor the forum on a daily basis.

Lastly, make sure you register your book at the link below in order to receive free updates on it for life.

http://www.howtonetwork.net/public/2653.cfm

Best of luck with your studies. See you at the top.

Paul Browning
Farai Tafa

ABOUT THE AUTHORS

Paul Browning
Paul Browning is the author of CCNA Simplified, which is one of the industry’s leading CCNA study guides. Paul previously worked for Cisco TAC but left in 2002 to start his own Cisco training company in the UK. Paul has taught over 2,000 Cisco engineers with both his classroom-based courses and his online Cisco training site, www.howtonetwork.net. Paul lives in the UK with his wife and daughter.

Farai Tafa
Farai Tafa is a Dual CCIE in both Routing and Switching and Service Provider. Farai currently works for one of the world’s largest telecoms companies as a network engineer. He has also written workbooks for the CCNA, CCNP, and Cisco Security exams. Farai lives in Washington, D.C. with his wife and daughter.
TABLE OF CONTENTS

PART I: THEORY

Chapter 1: Network Monitoring and Maintenance
    Network Maintenance Fundamentals Overview
    Network Maintenance Tasks
    An Overview of Network Management Models
    IOS Maintenance and Monitoring Tools
    Additional Maintenance and Monitoring Tools
    Chapter Summary

Chapter 2: Troubleshooting Methodologies and Tools
    Troubleshooting and the Troubleshooting Flow
    Communication and Troubleshooting
    Integrating Maintenance and Troubleshooting
    Troubleshooting Methodologies
    The Cisco IOS Generic Troubleshooting Toolkit
    Additional Troubleshooting Tools
    Chapter Summary

Chapter 3: Troubleshooting Switches at Layers 1 and 2
    Troubleshooting at the Physical Layer
    VLAN, VTP, and Trunking Overview
    Troubleshooting VLANs
    Using the ‘show vlan’ Command
    Spanning Tree Protocol Overview
    Troubleshooting Spanning Tree Protocol
    Using the ‘show spanning-tree’ Command
    Chapter Summary

Chapter 4: Troubleshooting Catalyst Switch Layer 3 Protocols, Supervisor Redundancy, and Performance Issues
    Catalyst Switch VLAN Interfaces Overview
    Catalyst Switch MLS Overview
    Troubleshooting Multilayer Switching
    Understanding and Troubleshooting HSRP
    Understanding and Troubleshooting VRRP
    Understanding and Troubleshooting GLBP
    Troubleshooting Switch Supervisor Redundancy
    Troubleshooting Switch Performance Issues
    Chapter Summary

Chapter 5: Troubleshooting EIGRP
    Enhanced Interior Gateway Routing Protocol Overview
    Troubleshooting Neighbor Relationships
    Troubleshooting Route Installation
    Troubleshooting Route Advertisement
    Troubleshooting Stub Routing Issues
    Troubleshooting SIA Issues
    Troubleshooting Route Redistribution Issues
    Debugging EIGRP Routing Issues
    Chapter Summary

Chapter 6: Troubleshooting OSPF
    Open Shortest Path First Protocol Overview
    Troubleshooting Neighbor Relationships
    Troubleshooting Route Advertisement
    Troubleshooting Route Redistribution Issues
    Troubleshooting Route Summarization
    Debugging OSPF Routing Issues
    Chapter Summary

Chapter 7: Troubleshooting BGP
    Border Gateway Protocol Overview
    Troubleshooting Neighbor Relationships
    Troubleshooting Route Advertisement
    Troubleshooting Route Redistribution Issues
    Debugging BGP Routing Issues
    Chapter Summary

Chapter 8: Troubleshooting Cisco IOS Security Features
    Cisco IOS Security Fundamentals
    Management Plane Security and Troubleshooting
    Control Plane Security and Troubleshooting
    Forwarding Plane Security and Troubleshooting
    Cisco IOS Firewall Fundamentals
    Chapter Summary

Chapter 9: Troubleshooting Cisco IOS DHCP and NAT
    Understanding DHCP
    Troubleshooting DHCP
    Understanding NAT
    Troubleshooting NAT
    Chapter Summary

Chapter 10: Troubleshooting IPv6 Routing & Interoperability
    IP version 6 Protocol Overview and Fundamentals
    Understanding and Troubleshooting EIGRPv6
    Understanding and Troubleshooting RIPng
    Understanding and Troubleshooting OSPFv3
    Troubleshooting IPv6 Route Redistribution
    IPv4 and IPv6 Interoperability
    Troubleshooting IPv4 and IPv6 Interoperability
    Chapter Summary

Chapter 11: Troubleshooting Cisco Wireless LAN Solutions
    Wireless Local Area Network Overview
    The Cisco WLAN Solution
    Troubleshooting Cisco WLAN Solutions
    Chapter Summary

Chapter 12: Troubleshooting Cisco VoIP and Video Solutions
    Cisco IP Telephony Fundamentals
    The Need for LAN and WAN Quality of Service
    LAN and WAN IPT QoS Implementation
    Cisco IP Video Fundamentals
    LAN and WAN Video QoS Implementation
    Troubleshooting Converged Networks
    Chapter Summary

Chapter 13: Troubleshooting Branch Office Solutions
    Cable and DSL Broadband Access Technologies
    Site-to-Site VPN Technologies
    Remote Access VPN Technologies
    Troubleshooting Broadband Technologies
    Troubleshooting Site-to-Site VPNs
    Troubleshooting Remote Access VPNs
    Chapter Summary

PART II: LABS

Lab 1: TSHOOT–Multi-Technology Troubleshoot
Lab 2: TSHOOT–Multi-Technology Troubleshoot
Lab 3: TSHOOT–Multi-Technology Troubleshoot
Lab 4: TSHOOT–Multi-Technology Troubleshoot
Lab 5: TSHOOT–Multi-Technology Troubleshoot

PART I: THEORY

CHAPTER 1
Network Monitoring and Maintenance

Welcome to the CCNP TSHOOT certification exam study guide. The TSHOOT is the third exam required to complete the updated CCNP certification. The objective of this course is to provide you with a solid fundamental understanding of network maintenance and troubleshooting methodology. In addition, this course will also discuss core technology troubleshooting in Cisco IOS software for all CCNP Layer 2 and Layer 3 technologies and protocols.

The core TSHOOT exam objectives covered in this chapter are as follows:
• Maintain and monitor network performance
• Develop a plan to monitor and manage a network
• Perform network monitoring using IOS tools
• Perform routine Cisco IOS device maintenance

As a CCNP network engineer, you need to understand not only how the relevant protocols and technologies work, how they are configured in Cisco IOS software, and how to troubleshoot them, but also how to maintain internetworks. Well-maintained networks typically have fewer problems than neglected ones, which also makes them easier to support and troubleshoot.
This chapter will be divided into the following sections:
• Network Maintenance Fundamentals Overview
• Network Maintenance Tasks
• An Overview of Network Management Models
• IOS Maintenance and Monitoring Tools
• Additional Maintenance and Monitoring Tools

NETWORK MAINTENANCE FUNDAMENTALS OVERVIEW

Network maintenance is an integral component of a network management methodology. Although network maintenance is generally assumed to be concerned primarily with repairs and upgrades, a comprehensive network maintenance solution also includes corrective and preventive measures, which allow for network optimization, as well as the upkeep of network documentation and the other tasks listed later in this chapter.

Network maintenance activities can be performed in a structured (scheduled) manner or on an as-needed (ad-hoc) basis. A structured or scheduled approach is based on a predefined plan and is the recommended method for performing any network maintenance tasks. An example of structured maintenance might be a scheduled change window to upgrade the software versions on internetwork devices, such as routers and switches.

As-needed, or ad-hoc, maintenance activities are those that are performed when issues arise. These maintenance activities, also referred to as interrupt-driven tasks, are unplanned and typically cannot be predicted. An example of such a task would be replacing a failed Supervisor module or line card in a Cisco Catalyst 6500 series switch. Such maintenance activities are part of the day-to-day management of the network and cannot be eliminated; however, a structured maintenance approach can reduce how often they occur and limit their overall impact on the network, as well as on the organization in general.
A structured maintenance approach leverages proactive monitoring to detect and remedy any potential problems within the network itself or with internetwork devices. A proactive monitoring solution not only allows the organization to identify and remedy problems before they actually impact the production environment, but also supports capacity planning, which includes network upgrades, expansions, or enhancements. This allows network maintenance activities to be planned, scheduled, and implemented in a controlled manner, greatly increasing the probability that those activities will succeed.

Without a structured maintenance approach in place, the majority of network maintenance tasks are performed in a reactive manner. This increases the resources required to maintain and support the network on a day-to-day basis, and it increases the likelihood of significant and possibly costly business impact (e.g., a network outage) at any given time.

NETWORK MAINTENANCE TASKS

Network maintenance tasks are those that network administrators perform on a day-to-day basis, allowing for the upkeep of the network.
Some of the more common network maintenance tasks include, but are not limited to, the following general activities:
• Installing, replacing, or upgrading both hardware and software
• Monitoring, tuning, and optimizing the network
• Documenting the network and maintaining network documentation
• Securing the network from both internal and external threats
• Planning for network upgrades, expansions, or enhancements
• Scheduling backups and restoring services or the network from backups
• Ensuring compliance with legal regulations and corporate policies
• Troubleshooting problem reports
• Maintaining and updating device configurations

Installing, Replacing, or Upgrading Both Hardware and Software

Hardware and software installation, replacement, and upgrades are very common network maintenance tasks. In a Cisco internetwork, this may include replacing older or failed hardware, such as switch linecards and supervisor modules in Catalyst 4500 and 6500 series switches, as well as upgrading the Cisco IOS images to current revision or patch levels for routers and switches alike.

Monitoring, Tuning, and Optimizing the Network

One of the core facilitators of an effective network maintenance solution or strategy is proactive monitoring. Proactive monitoring allows potential problems to be detected and remedied before they cause an outage or affect operation.
Event logging and network monitoring make it possible to respond to network or system alerts before they become problems, and can be used to do the following:
• Verify the performance of the network and all internetwork devices in the network
• Baseline the performance of the network itself
• Understand the volume and direction of traffic flows in the network
• Identify and troubleshoot potential network issues

Documenting the Network and Maintaining Network Documentation

While most network engineers consider documentation a rather mundane and even lowly task, documentation is a critical component of network maintenance, as well as troubleshooting and support. Different organizations have different standards for acceptable levels of documentation. Following are several guidelines or recommendations that you should adhere to when documenting the network:
• Determine the scope of responsibility
• Understand the objective
• Maintain documentation consistency
• Ensure that the documentation is easily accessible
• Maintain the documentation

The first guideline is ensuring that you understand your scope of responsibility. That is, it is important to understand what you are responsible for. For example, you may be working in an organization that has a voice, security, storage, and network team all under the Information Technology (IT) department umbrella. Rather than attempting to create documentation for all the teams, you should ensure that you document only those networks and devices that are within your administrative responsibility.

It is important to understand what the documentation will be used for. A common mistake that is made by network engineers is including either too little or too much information within the documentation.
Take time to understand fully what the document you are creating will be used for, and take into consideration the audience the document is targeted to and what information would be useful or excessive for that particular group. Over-documentation makes documented information hard to understand. On the other hand, under-documentation makes network support and troubleshooting difficult to perform.

Consistency is a key requirement when creating network documentation and should be adhered to as much as possible. In most organizations, design and documentation templates are available for reference when creating new documentation. Maintaining consistency increases the usability of those documents and makes them easier to understand for everyone else.

No matter how great the documentation is, it helps no one if those who need it for troubleshooting or support functions cannot access it. Where possible, documentation should be stored in a location that is readily and easily accessible to all who may use it, such as on a secure network share. In some cases, depending on the organization, it may be necessary for documentation to be stored in a secured, offsite location for disaster recovery and business continuity purposes.

Finally, once the documentation has been created, it is important to ensure that it is always maintained and up to date. Network diagrams from years ago may contain misleading and incorrect information that hampers troubleshooting. Network documentation should be considered living documentation that changes at the same rate as the network. Following the completion of each network project, existing documentation should be updated to reflect the changes that were made to the network.
Although there are no industry-wide standards that determine what information should and should not be included in network documentation, most organizations and businesses have their own standards for what should be included. It is important to adhere to these standards and guidelines when creating documentation. From a best practices perspective, network documentation should include the following information, at a minimum:
• Information about the interconnects between devices for LAN and WAN connections
• IP addressing and VLAN information
• A physical topology diagram of the network
• A logical topology diagram of the network
• An inventory of all internetwork devices, components, and modules
• A revision control section detailing changes to the topology
• Configuration information
• Any original or additional design documentation and notes
• Data or traffic flow patterns

Securing the Network from Both Internal and External Threats

Network security is an integral component of network operation and maintenance. It is also very important that consideration be given to both internal and external threats. While most organizations have a dedicated security team, monitoring and a structured maintenance approach can also be used to discover vulnerabilities or potential security threats, so that the appropriate action can be taken before an incident occurs.

Planning for Network Upgrades, Expansions, or Enhancements

Using network monitoring, you can identify potential issues before they arise, as well as plan for possible network upgrades or expansions (i.e., capacity planning) based on the identified potential issues. Effective planning can be used to define the maintenance tasks required on the network, and then to prioritize those tasks and the order in which they will be implemented.
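As a rough illustration of how monitoring data can feed capacity planning, the following Python sketch fits a straight line through periodic link-utilization samples and estimates when the trend would reach a planning threshold. The sample values, the 80 percent threshold, and the function names are assumptions made for this example; real traffic is rarely this linear, so treat the result as a prompt for investigation rather than a forecast.

```python
from statistics import mean


def fit_trend(samples):
    """Least-squares line through (day, utilization_percent) samples.

    Returns (slope, intercept), where slope is percent per day.
    """
    days = [day for day, _ in samples]
    utils = [util for _, util in samples]
    day_mean, util_mean = mean(days), mean(utils)
    denom = sum((day - day_mean) ** 2 for day in days)
    if denom == 0:
        raise ValueError("need samples from at least two distinct days")
    slope = sum((day - day_mean) * (util - util_mean)
                for day, util in samples) / denom
    intercept = util_mean - slope * day_mean
    return slope, intercept


def day_threshold_reached(samples, threshold_percent):
    """Day number at which the trend line reaches the threshold.

    Returns None when utilization is flat or falling.
    """
    slope, intercept = fit_trend(samples)
    if slope <= 0:
        return None
    return (threshold_percent - intercept) / slope


# Hypothetical weekly peak utilization for one WAN link: (day, percent).
weekly_peaks = [(0, 40.0), (7, 42.0), (14, 44.0), (21, 46.0)]

if __name__ == "__main__":
    day = day_threshold_reached(weekly_peaks, 80.0)
    print(f"Trend reaches 80% utilization around day {day:.0f}")
```

An estimate like this can help decide which upgrade tasks to schedule first, since links projected to reach their threshold soonest are natural candidates for the top of the list.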
Scheduling Backups and Restoring Services or the Network from Backups

Backups are a routine maintenance task that is usually given a very low priority. Their value becomes clear, however, when attempting to recover from a serious or critical failure of the network. Backups should therefore be considered a core network maintenance task and should be allocated a high priority. It is important to ensure that backups of core network components and devices are scheduled, monitored, and verified at all times. Having up-to-date backups of core devices can assist in faster recovery of the network or individual network components following hardware or software failures, or even data (configuration) loss.

Ensuring Compliance with Legal Regulations and Corporate Policies

A structured network maintenance methodology also ensures that the network is compliant with both legal obligations and corporate policies. Regulatory policies, which are mandatory requirements imposed by industry regulations and laws, differ from one industry to another. Regardless of the industry and the requirements, it is important to ensure that the business is following the industry standards as regulated by law. Unlike legal regulations, corporate policies will vary on a business-by-business basis; however, it is still important to ensure that the network adheres to these policies and can provide the required functions.

Troubleshooting Problem Reports

Troubleshooting problem reports is a core network maintenance function. While troubleshooting methodologies are described in detail later in this guide, troubleshooting is simplified by a structured network maintenance approach, which includes documentation, backups, and some form of proactive monitoring system.
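Backups only support troubleshooting and recovery if they are current and intact, which is why this section asks for them to be monitored and verified. The Python sketch below shows one possible check, assuming that configuration backups are stored as files and that a SHA-256 checksum was recorded when each backup was taken. The file layout, the function names, and the maximum-age policy are all hypothetical.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def check_backup(path, expected_sha256, max_age_hours, now=None):
    """Return a list of problems found with one backup file.

    An empty list means the backup exists, is recent enough, and
    matches the checksum recorded when it was taken.
    """
    backup = Path(path)
    if not backup.is_file():
        return ["missing"]

    problems = []
    now = time.time() if now is None else now
    age_hours = (now - backup.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append("stale")
    if sha256_of(backup) != expected_sha256:
        problems.append("checksum mismatch")
    return problems
```

A scheduled job could run `check_backup` for every core device and raise an alert whenever the returned list is not empty, so that a failed or outdated backup is noticed before it is needed.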
Maintaining and Updating Device Configurations

Configuration changes are common because of the day-to-day moves, additions, or changes (MACs) within organizations. Device configurations may also change due to scheduled maintenance tasks and planned changes to the network. For this reason, maintaining and updating device configurations is considered a core network management function. Each time the configuration on a device changes, the change should be documented and the configuration should be saved, both on the device and to an alternate backup location (e.g., an FTP or TFTP server, if one is available).

AN OVERVIEW OF NETWORK MANAGEMENT MODELS

There are several network management methodologies or models that incorporate network maintenance activities. It is important to understand that these models are guidelines, not standards. Standards can be defined as industry-recognized best practices, frameworks, and agreed-upon principles or concepts and designs, which are designed to implement, achieve, and maintain the required levels of processes and procedures. Guidelines, on the other hand, are simply recommended actions and operational guides for users. Unlike standards, which are mandatory, guidelines are used simply as reference material. Common network management models, which are described in the following section, include the following:
• Telecommunications Management Network
• FCAPS
• Information Technology Infrastructure Library
• Cisco Lifecycle Services

Telecommunications Management Network (TMN)

The Telecommunications Management Network (TMN) framework is a model defined by ITU-T for managing open systems in a communications network. It is referenced in ITU-T Recommendation M.3010. The TMN was developed to provide a framework for service providers to manage their service delivery network; however, the same basic concepts can also be applied to standard enterprise networks.
This framework defined four management architectures at different levels of abstraction as follows:
• A functional architecture
• An informational architecture
• A physical architecture
• A logical layered architecture

NOTE: While delving into the specifics on these various architectures is beyond the scope of the TSHOOT certification exam, the following section provides a brief description of each.

The functional architecture describes various management functions. Next, the informational architecture describes concepts that have been adopted from OSI management. The physical architecture defines how these management functions may be implemented into physical equipment, and, finally, the logical layered architecture includes a model that shows how management can be structured according to different responsibilities.

Within the logical layer, the framework also provides a common methodology and logic that is applicable to the management of private enterprise networks by introducing an additional four abstract layers of management functionality, which are as follows:
• The Business Management Layer
• The Service Management Layer
• The Network Management Layer
• The Element Management Layer

NOTE: Again, while delving into detail on these layers is beyond the scope of the TSHOOT certification exam, the following section provides a brief description of each.

The Business Management Layer (BML) has a broad scope, which includes responsibility for the management of the whole enterprise. This layer is more aligned with strategic management, rather than day-to-day operational management.

The Service Management Layer (SML) is concerned with management of those aspects that may be directly observed by the users of the telecommunication network. These users may be end users (customers) or other service providers (administrations).
Examples of functions that are performed at the Service Management Layer include administration, accounting, the addition and removal of users, and QoS management. The BML and SML provide the link between IT and the business.

The Network Management Layer (NML) deals with fault and performance data for the network, as well as overall network management and configuration, which includes tasks such as network monitoring and fault detection, optimization, and configuration changes. It is important to understand that this layer pertains to the overall network. Individual device management is covered at the Element Management Layer.

The Element Management Layer (EML) deals with configuration, fault, and performance management at the device level. This layer deals with vendor-specific management functions and hides these functions from the layer above, the Network Management Layer. Examples of the functions that are performed at this layer include alarm management, handling of information, backup, logging, maintenance of hardware and software, and measuring resource utilization (e.g., CPU, memory, and power consumption).

FCAPS

FCAPS is the International Organization for Standardization (ISO) TMN model and framework for network management. The acronym FCAPS stands for Fault management, Configuration management, Accounting management, Performance management, and Security management. These are the network management categories used by the ISO. The following sections describe the five functional areas in the FCAPS model.

Fault management is a lifecycle-centered discipline that revolves around identifying problems through continuous monitoring of the entire network, correlating the fault data, and isolating the problem to the source.
The overall fault management lifecycle includes the following tasks:
• Fault and problem detection
• Handling and acknowledging alarms sent by devices
• Fault and problem isolation using a filtration and correlation process
• Fault correction and recovery
• Tracking problems through resolution via a trouble-ticketing system

Configuration management encompasses the management of actual device configurations, the configuration change control process (which may include the commissioning and decommissioning of network devices), backing up and restoring configurations, and overall workflow management for the administrators performing the configuration changes. Another important aspect is the ability to track and log changes to device configurations.

Accounting management covers methods to track usage statistics and costs associated with time and services provided with devices and other network resources. Accounting information, such as link utilization and device resource utilization, can also be used for Service Level Agreement (SLA) purposes, ensuring that an agreed-upon level of service is being provided.

Performance management covers the tracking of system and network statistics using a Network Management Station (NMS). The data that is collected may include link utilization, errors, response times, and availability information. This data can then be used to improve performance for critical traffic, such as Voice over IP (VoIP), by implementing or adjusting Quality of Service (QoS) solutions to make the most efficient use of limited bandwidth. Additionally, performance management monitoring can also be used to establish thresholds and identify network trends, all of which provide valuable data for capacity planning.

Security management addresses access rights that include authentication and authorization, data privacy, and auditing security violations.
From a network administration perspective, security management is concerned primarily with controlling access to network devices using, for example, the Authentication, Authorization, and Accounting (AAA) security architecture. However, security management may also include integrating firewalls and other security devices, such as an Intrusion Prevention System (IPS), into the network to protect against viruses, worms, and other malicious types of traffic.

Information Technology Infrastructure Library (ITIL)

Information Technology Infrastructure Library (ITIL) is a set of best practices for Information Technology Service Management (ITSM), IT development, and IT operations. The names ITIL and IT Infrastructure Library are registered trademarks of the United Kingdom’s Office of Government Commerce (OGC). ITIL provides businesses with a customizable framework of best practices that can be used to ensure and achieve quality service, as well as to overcome some of the difficulties that are associated with the growth of IT systems.

ITIL is organized into a set of texts that are defined by related functions. ITILv3, the current version at the time of writing, defines five processes that cover the entire lifecycle of an IT project, from initial architectural planning, through the design and implementation phases, to the operational phases and a continuous loop of service optimization. The five processes or sets defined in ITILv3 are as follows:
• Service Strategy
• Service Design
• Service Transition
• Service Operation
• Continual Service Improvement

Service Strategy is both the center and origin point of the ITIL Service Lifecycle. Service Strategy focuses on the strategic approach to IT Service Management.
The purpose of Service Strategy is to develop the organizational ability to think and act in a strategic manner and transform IT Service Management into a strategic asset. This includes consideration of the services that should be offered, to whom those services should be offered, how service performance will be measured, and how the customer and stakeholders will perceive and measure the value of the services, among other considerations.

Service Design is part of the overall business change process and is concerned primarily with the design of new or changed services for introduction into the live environment. This includes their architectures, processes, policies, and documentation. In essence, Service Design simply translates the strategic objectives into portfolios of service and assets. Service Design covers the following lifecycle management aspects:

• Service Level Management (SLM)
• Service Level Agreements (SLAs)
• Operational Level Agreements (OLAs)
• Service Improvement Plan
• Service Quality Plan
• Capacity Management
• Availability Management
• IT Service Continuity Management
• Supplier Management
• Compliance Management
• IT Architecture Management
• Risk Management

Service Transition relates to the delivery of services required by a business into operational use. Service Transition includes the following list of processes and activities:

• Service Asset and Configuration Management
• Service Validation and Testing
• Evaluation
• Release Management
• Change Management
• Knowledge Management

Service Operation is the stage in the ITIL core lifecycle whose primary purpose is to deliver and support IT services at agreed-upon levels, and to manage the applications, technology, and infrastructure that support the delivery of the services.
Service Operation entails the delivery of these services to both users and customers and includes the following processes:

• Event Management
• Incident Management
• Problem Management
• Request Fulfillment
• Access Management

Continual Service Improvement (CSI) is an activity that is part of everyday life in IT services. It is not an emergency project that is initiated when someone in authority yells that the network service or performance is sub-par or poor. Instead, CSI is an ongoing way of life that entails continually reviewing, analyzing, and improving service management processes. It allows you to address changes in business requirements, refresh technology cycles, and ensure that high quality is maintained.

Cisco Lifecycle Services (PPDIOO)

The Cisco PPDIOO model encompasses all steps from network vision to optimization, which enables Cisco to provide a broader portfolio of support and end-to-end solutions to its customers. This Cisco lifecycle model includes the stages of prepare, plan, design, implement, operate, and optimize; hence, the acronym PPDIOO.

Within the PPDIOO model, the prepare phase deals with network discovery to understand business needs, high-level requirements, and any potential challenges. At the end of this stage, a conceptual architecture of the proposed network solution is presented.

The plan phase of the Cisco PPDIOO model compares the existing network with the proposed network (i.e., the solution proposed in the prepare phase) to help identify tasks, responsibilities, milestones, and resources required to implement the design.

The design phase of the Cisco PPDIOO model articulates the detailed design requirements. In this phase, the low-level design that will eventually be implemented is produced. Considerations during this phase should also include how to meet requirements for the applications, support, backup, and recovery.
The PPDIOO implement phase addresses the integration of new equipment into the existing network environment based on the design requirements. Supporting implementation documentation (e.g., an implementation and back-out plan) is also presented during this phase.

The PPDIOO operate phase begins after the new device(s) have been implemented or integrated into the existing network. This phase of the PPDIOO model entails the day-to-day network operation, while responding to any issues that arise.

Finally, the optimize phase of the PPDIOO model continually gathers the feedback from the operate phase to make potential adjustments to the existing network, which typically results in another project beginning with the prepare phase. The entire process is then repeated. During the optimize phase, feedback received from the operate phase may also be used to address any ongoing network performance and support issues.

IOS MAINTENANCE AND MONITORING TOOLS

Cisco provides a plethora of tools that can be used for maintenance and monitoring through the Command Line Interface (CLI), as well as through a Graphical User Interface (GUI) for devices running Cisco IOS software. In addition to these IOS-based tools, Cisco also provides standalone tools that can be used for network maintenance, monitoring, and troubleshooting. The following sections describe some of the Cisco maintenance and monitoring tools that you should be familiar with for the purposes of the TSHOOT certification exam.
Enhanced Embedded Event Manager

The enhanced Embedded Event Manager (EEM) is part of the Cisco Embedded Automation Systems (EASy) toolkit, which combines the following additional embedded management technologies with EEM:

• Cisco IP Service Level Agreements (IP SLAs)
• Expression MIB
• Network-Based Application Recognition (NBAR)
• Flexible NetFlow
• Enhanced Object Tracking
• Cisco IOS Shell (IOS.sh)

NOTE: Cisco IP SLAs will be described in detail later in this chapter. The Cisco Expression MIB is beyond the scope of the TSHOOT certification exam, as is Cisco IOS Shell (IOS.sh). These will not be described in any further detail in this chapter or in the remainder of this guide. NBAR and NetFlow are both core components of the TSHOOT exam and these will be described in detail later in this chapter and throughout this guide. Finally, Enhanced Object Tracking (EOT) is described in detail in both the ROUTE and SWITCH guides. From a troubleshooting perspective, you do not have to know any additional information on EOT.

Cisco IOS Embedded Event Manager (EEM) is part of the maintenance and monitoring toolkit. EEM is a powerful and flexible subsystem that provides real-time network event detection and onboard automation. EEM also increases the intelligence of network devices, allowing them to act on and facilitate management actions for specific network events.

A series of event detector processes designed to monitor explicit operational aspects of routers or switches are built into Cisco IOS Software. These can be primed to look for a specific event, and when that event occurs, they can act as a trigger to start up a user-loaded script. These customizable scripts are programmed using either simple Command Line Interface (CLI) commands or Tool Command Language (Tcl).

A common use for EEM is printing a message on the console following a certain action that has been performed on the router.
For example, EEM can be configured so that when someone issues the clear counters, clear ip route *, or clear ip bgp * command on a device, a message is printed on the console requesting that the person update the relevant network documentation to explain why this action was taken. In addition, EEM could also be used to print out a message requesting that the network documentation be changed or configurations be saved following changes to a device. These are simple examples of the capabilities of EEM; EEM can also be configured to perform actions such as sending an e-mail advising of monitored events.

When using the CLI to configure EEM, you must first configure an EEM applet. The EEM applet is a simple form of policy that is defined within the CLI configuration using the event manager applet [name] global configuration command. After you have configured the EEM applet, the router transitions to EEM applet configuration mode. This configuration mode supports three commands, which are the event, action, and set commands.

The event command is used to specify the event criteria that will trigger the EEM applet to run. For example, the event could be a syslog message indicating that counters have been cleared on an interface or the issuing of certain CLI commands, such as clear ip route *.

The action command is used to specify an action to perform when the EEM applet has been triggered. Multiple sequential action commands can be configured within the applet. For example, you can specify that the first action taken after a user has exited configuration mode is to print a message on the console, and that the next action is to issue a CLI command that saves the configuration on the local device or to a TFTP server.

Finally, the set command is used to set the value of an EEM applet variable.
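The event and action logic described above can be sketched outside of IOS. The following Python fragment is only a toy model and not how IOS implements EEM: the Applet class, its field names, the DEMO-APPLET name, and the assumption that actions run in ascending label order are all hypothetical and used purely for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Applet:
    """Toy stand-in for an EEM applet: one event pattern plus labelled actions."""
    name: str
    event_pattern: str                            # regex matched against a log line
    actions: dict = field(default_factory=dict)   # action label -> message text

    def handle(self, log_line: str) -> list:
        """Return the messages produced if log_line triggers the applet."""
        if not re.search(self.event_pattern, log_line):
            return []                             # event criteria not met
        # Assumption of this sketch: actions run in ascending label order.
        return [f"{self.name}: {self.actions[label]}"
                for label in sorted(self.actions)]


applet = Applet(
    name="DEMO-APPLET",
    event_pattern=r"%SYS-5-CONFIG_I:",
    actions={"1.1": "Please Save The Configuration",
             "1.0": "Please Update Network Documentation"},
)

for message in applet.handle("%SYS-5-CONFIG_I: Configured from console by console"):
    print(message)
```

The point of the sketch is the separation of concerns: the event only decides whether the applet runs, and the actions only decide what happens once it does.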
Following the configuration, the show event manager policy registered command can then be used to display a list of registered applets. The following configuration example shows how to configure a basic EEM applet using the CLI.

This applet will be triggered when the syslog pattern ‘%SYS-5-CONFIG_I:’ is logged by the router. When triggered by this event, the applet will print a syslog message that reads “Please Update Network Documentation”, followed by another message that reads “Please Save The Configuration”. This simple configuration is implemented on the router as follows:

R1(config)#event manager applet CONFIGURATION-CHANGE-APPLET
R1(config-applet)#event syslog pattern %SYS-5-CONFIG_I:
R1(config-applet)#action 1.0 syslog msg "Please Update Network Documentation"
R1(config-applet)#action 1.1 syslog msg "Please Save The Configuration"
R1(config-applet)#exit

As previously stated in this section, after one or more applets have been configured, the show event manager policy registered command is used to display a list of registered applets. The output of this command following R1’s EEM configuration is illustrated below:

R1#show event manager policy registered
No.  Class   Type  Event Type  Trap  Time Registered           Name
1    applet  user  syslog      Off   Sun Mar 3 02:41:18 2002   CONFIGURATION-CHANGE-APPLET
 pattern {%SYS-5-CONFIG_I:}
 action 1.0 syslog msg "Please Update Network Documentation"
 action 1.1 syslog msg "Please Save The Configuration"

As a simple test, this configuration can be validated by entering and exiting configuration mode on the router as illustrated below:

R1#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#
R1(config)#
R1(config)#end
R1#
R1#
*Mar 3 02:46:50.493: %SYS-5-CONFIG_I: Configured from console by console
*Mar 3 02:46:50.501: %HA_EM-6-LOG: CONFIGURATION-CHANGE-APPLET: Please Update Network Documentation
*Mar 3 02:46:50.505: %HA_EM-6-LOG: CONFIGURATION-CHANGE-APPLET: "Please Save The Configuration"

The same EEM applet would also be executed for changes made remotely (e.g., via a Telnet session) as illustrated in the following output:

R2#telnet 10.0.0.1
Trying 10.0.0.1 ... Open

User Access Verification

Username: netadmin
Password:
R1>enable
Password:
R1#terminal monitor
R1#config t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#
R1(config)#
R1(config)#end
R1#
R1#
R1#
*Mar 3 02:52:23.643: %SYS-5-CONFIG_I: Configured from console by netadmin on vty0 (10.0.0.2)
*Mar 3 02:52:23.651: %HA_EM-6-LOG: CONFIGURATION-CHANGE-APPLET: "Please Update Network Documentation"
*Mar 3 02:52:23.651: %HA_EM-6-LOG: CONFIGURATION-CHANGE-APPLET: "Please Save The Configuration"

NOTE: The terminal monitor command must be issued if you want to see log messages on the screen when you remotely access a device. Otherwise, you must use the show logging command to view the messages in the router or switch logs.

As a final example, the following configuration output illustrates how to configure multiple EEM applets to log and print messages when the clear counters or clear ip bgp * commands are issued. When the clear counters command is issued, the message “Please advise Network Operations why the interfaces counters were cleared by sending an email to netops@howtonetwork.net. Thank you!” is printed. When the command clear ip bgp * is issued, the message “This operation is NOT allowed! Please contact netops@howtonetwork.net for permission to perform this operation.
Thank you!” is printed and the command is rejected:

R1(config)#event manager applet CLEAR-INTERFACE-COUNTERS-APPLET
R1(config-applet)#event cli pattern "clear counters.*" sync no skip no
R1(config-applet)#$ sending an email to netops@howtonetwork.net. Thank you!"
R1(config-applet)#exit
R1(config)#event manager applet CLEAR-IP-BGP-APPLET
R1(config-applet)#event cli pattern "clear ip bgp.*" sync no skip yes
R1(config-applet)#$ for permission to perform this operation. Thank you!"
R1(config-applet)#exit

This configuration can be validated using the show event manager policy registered command, as was also illustrated in the previous example. Following is the output of this command after the EEM configuration on R1:

R1#show event manager policy registered
No.  Class   Type  Event Type  Trap  Time Registered           Name
1    applet  user  cli         Off   Sun Mar 3 03:16:54 2002   CLEAR-INTERFACE-COUNTERS-APPLET
 pattern {clear counters.*} sync no skip no
 action A syslog msg "Please advise Network Operations why the interface counters were cleared by sending an email to netops@howtonetwork.net. Thank you!"
2    applet  user  cli         Off   Sun Mar 3 03:15:53 2002   CLEAR-IP-BGP-APPLET
 pattern {clear ip bgp.*} sync no skip yes
 action A syslog msg "This operation is NOT allowed! Please contact netops@howtonetwork.net for permission to perform this operation. Thank you!"

NOTE: The sync keyword determines whether the EEM policy runs synchronously with the CLI command (the command waits for the policy to complete) or asynchronously (the policy runs independently of the command). The skip keyword specifies whether the matched command will be skipped (skip yes) or executed (skip no). Going into these advanced options is beyond the scope of the TSHOOT certification exam; however, they have been included in this section to demonstrate further the capabilities of EEM.
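The effect of the CLI pattern and the skip keyword can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not Cisco's implementation: the cli_event function and its return format are hypothetical, and matching the pattern from the start of the command with re.match is a choice made for this sketch only.

```python
import re


def cli_event(command: str, pattern: str, skip: bool, message: str):
    """Model an EEM CLI event: return (logged_message_or_None, command_runs)."""
    if re.match(pattern, command) is None:
        return None, True        # pattern not matched: command runs untouched
    # Pattern matched: the message is logged, and skip decides the command's fate.
    return message, not skip


print(cli_event("clear counters", r"clear counters.*", False,
                "Please advise Network Operations"))
print(cli_event("clear ip bgp *", r"clear ip bgp.*", True,
                "This operation is NOT allowed!"))
```

In this model, both commands produce a message, but only the first is allowed to run, which mirrors the difference between skip no and skip yes in the two applets above.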
The EEM configuration on R1 can be tested by issuing the clear counters command on the router and then the clear ip bgp * command as follows:

R1#clear counters
Clear "show interface" counters on all interfaces [confirm]
R1#
*Mar 3 03:39:18.317: %HA_EM-6-LOG: CLEAR-INTERFACE-COUNTERS-APPLET: "Please advise Network Operations why the interfaces counters were cleared by sending an email to netops@howtonetwork.net. Thank you!"
*Mar 3 03:39:19.191: %CLEAR-5-COUNTERS: Clear counter on all interfaces by netadmin on vty0 (10.0.0.2)
R1#
R1#
R1#
R1#
R1#clear ip bgp *
R1#
*Mar 3 03:27:12.235: %HA_EM-6-LOG: CLEAR-IP-BGP-APPLET: "This operation is NOT allowed! Please contact netops@howtonetwork.net for permission to perform this operation. Thank you!"

With the execution of the clear counters command, the EEM applet prints the stated message and allows the command to be executed. When the clear ip bgp * command is issued, the EEM applet again prints the stated message but this time does not allow the command to be executed. Such configurations ensure that only people authorized to make changes, reset neighbor relationships, etc., are allowed to do so, and do so only when the proper controls and notifications are in order.

NOTE: You are not required to implement any Cisco IOS EEM configurations in the current TSHOOT exam; however, you should be familiar with the basic EEM configuration logic.

Cisco IOS IP Service Level Agreement

Cisco IOS IP SLA, which is described in detail in both the ROUTE and SWITCH certification guides, allows you to monitor, analyze, and verify IP service levels for IP applications and services, to increase productivity, to lower operational costs, and to reduce occurrences of network congestion or outages.
IP Service Level Agreement (IP SLA) uses active traffic monitoring to measure network performance. This allows IP SLAs to be used not only for maintenance and monitoring but also for troubleshooting and for baselining network performance. IP SLA can measure and monitor network performance metrics such as jitter, latency (delay), and packet loss. IP SLA has evolved with advanced measurement features, such as application performance, MPLS awareness, and enhanced voice measurements.

IP SLA uses active traffic monitoring, which is the generation of traffic in a continuous, reliable, and predictable manner, for measuring network performance edge-to-edge over a network. Given this, IP SLA operations are based on active probes because synthetic network traffic is generated strictly for the purpose of measuring a network performance characteristic of the defined operation.

NOTE: A passive probe is one that captures actual network traffic flows for analysis. Examples would be a packet capture (e.g., Ethereal or Wireshark) and NetFlow.

IP SLA consists of two components, which are the source (agent) and the target. The source, or agent, is where IP SLA operations are defined. In other words, this is where the bulk of the configuration is implemented. Based on the configuration parameters, the source generates packets specific to the defined IP SLA operations, analyzes the results, and records them so that they can be accessed through the CLI or via Simple Network Management Protocol (SNMP). SNMP is described in detail in the SWITCH guide. It will also be described briefly later in this chapter and throughout this guide.

A source router can be any Cisco router or switch that can support the IP SLA operation being configured. A particular source or agent can have multiple IP SLA tests running to many remote responders. In addition, a particular router or switch can be both an agent and a responder for different IP SLA configurations.
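To make the three metrics named above more concrete, the following Python sketch summarizes a series of probe results. It is a minimal illustration under stated assumptions and not the calculation IOS performs: the summarize_probes function is hypothetical, jitter is taken here to be the mean absolute difference between consecutive round-trip times, a lost probe is represented by None, and the input list is assumed to be non-empty.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe round-trip times; None marks a probe with no reply."""
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    # Jitter in this sketch: mean absolute difference between consecutive replies.
    deltas = [abs(b - a) for a, b in zip(replies, replies[1:])]
    return {
        "avg_rtt_ms": mean(replies) if replies else None,
        "jitter_ms": mean(deltas) if deltas else 0.0,
        "loss_pct": loss_pct,
    }


print(summarize_probes([3, 4, None, 3, 3]))
```

Because the probes are synthetic and sent at a known rate, every missing reply can be counted as loss, which is something a passive capture of user traffic cannot determine as easily.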
The IP SLA target depends upon the type of IP SLA operation defined and may be a computer or an internetwork device, such as a router or a switch. For example, for IP SLA FTP or HTTP operations, the target would be an FTP or HTTP server. For Real-time Transport Protocol (RTP) and UDP jitter (VoIP) operations, the target must be a Cisco device.

If the target is a Cisco device, the ip sla responder global configuration command must be configured on this device because both the source and target participate in the performance measurement. The IP SLA responder has an added benefit of accuracy because it inserts in and out time-stamps in the packet payload and therefore measures the CPU time spent.

The IP SLA responder (target) is a Cisco IOS software component that is configured to respond to IP SLA request packets. The IP SLA source establishes a connection with the target using control packets before the configured IP SLA operation begins. Following the acknowledgement of the control packets, the source then sends the responder test packets.

The responder inserts a time-stamp when it receives a packet and factors out the destination processing time and adds time-stamps to the sent packets. This allows for the calculation of unidirectional packet loss, latency, and jitter measurements with the kind of accuracy that is not possible using simple ping tests or other dedicated (passive) probe testing.

Cisco IOS IP SLA operations can be broadly categorized into the following five functional areas:

1. Availability monitoring
2. Network monitoring
3. Application monitoring
4. Voice monitoring
5. Video monitoring

Availability monitoring can be used to monitor network-level availability and is performed primarily using ICMP and UDP packets. IP SLA availability monitoring operations are described in detail in the following section.
Network monitoring is used to monitor Layer 2 operations, such as Asynchronous Transfer Mode, Frame Relay, and Multiprotocol Label Switching.

Application monitoring is used to monitor common network applications, which include HTTP, FTP, DHCP, and DNS.

Voice monitoring is used to determine voice quality scores, Post Dial Delay (PDD), Real-time Transport Protocol (RTP), and gatekeeper registration delay.

Video monitoring is used to monitor video traffic. There is no specific IP SLA test for video monitoring; however, the UDP jitter operation can be used to simulate some video traffic.

IP SLA operations are configured in global configuration mode. The configuration of the IP SLA feature depends on the software version running on the router. In Cisco IOS software versions 12.3(14)T, 12.4, 12.4(2)T, and 12.2(33)SXH, IP SLA is configured using the ip sla monitor [operation number] global configuration command. In Cisco IOS 12.4(4)T and later, IP SLA is configured using the ip sla [operation number] global configuration command.

The [operation number] used in both variations of IP SLA configuration is an integer between 1 and 2147483647. This allows for the configuration of multiple IP SLA operations on the same device.

Following IP SLA configuration in global configuration mode, the router transitions to IP SLA monitor configuration mode. In Cisco IOS software versions 12.3(14)T, 12.4, 12.4(2)T, and 12.2(33)SXH, the IP SLA operation is configured using the type IP SLA monitor configuration command. The type command is used to specify the packet type to send. This may be TCP connect packets or even UDP echo or ICMP echo packets, depending on the operation being configured.

Commonly used additional parameters that are specified when configuring Cisco IOS IP SLA operations are timeout and frequency.
The timeout keyword is used to specify the amount of time for which the Cisco IOS IP SLA operation waits for a response to its request packet. For example, when configuring an IP SLA operation that sends ICMP echo packets (pings) to a remote destination, you can use the timeout keyword to specify how long the operation will wait for a response before it is considered to be unsuccessful (i.e., fails). The timeout value is specified in milliseconds. The default timeout value varies depending on the type of IP SLA operation you are configuring.

The frequency is specified in seconds and is used to specify the rate at which a specified Cisco IOS IP SLA operation is sent into the network. For example, if you specify a frequency of 10 when using the ICMP echo operation, ping packets will be sent every 10 seconds. When configuring the frequency, it is important to understand and remember that the lower the value specified, the greater the overhead on the router or switch sending out the packets.

After configuring the IP SLA operation and specifying additional parameters, the operation can then be enabled using the ip sla monitor schedule [operation-number] global configuration command. While this command can be used with several parameters, parameters typically used when configuring IP SLA for use with the reliable static routing backup using object tracking include the life keyword and the start-time keyword.

The life keyword is used to specify the length of time to execute the operation. The life can be specified in seconds (up to 2147483647) or set to run indefinitely using the forever keyword.

The start-time keyword is used to specify when the operation should begin. The most common implementation is to use the now keyword to begin the operation immediately. However, the operation can also be configured to start at a specified time, after a specified amount of time, or on a specific date at a specific time.
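The interaction of the timeout, frequency, and life parameters described above can be illustrated with a small active probe written in Python. This is a hedged sketch that runs on an ordinary host and is not how IOS implements IP SLA: the tcp_connect_rtt_ms and run_probe functions are hypothetical names, and sleeping for the frequency interval after each attempt is a simplification of a fixed sending rate.

```python
import socket
import time


def tcp_connect_rtt_ms(host, port, timeout_ms):
    """Time one TCP connection attempt; return milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        # The timeout is given in milliseconds, as in the text; sockets want seconds.
        with socket.create_connection((host, port), timeout=timeout_ms / 1000.0):
            return (time.perf_counter() - start) * 1000.0
    except OSError:              # refused, unreachable, or timed out
        return None


def run_probe(host, port, timeout_ms, frequency_s, life_s):
    """Repeat the probe roughly every frequency_s seconds until life_s has elapsed."""
    results = []
    deadline = time.monotonic() + life_s
    while time.monotonic() < deadline:
        results.append(tcp_connect_rtt_ms(host, port, timeout_ms))
        time.sleep(frequency_s)
    return results
```

A call such as run_probe(host, 80, 5000, 10, 60) would attempt a connection about every 10 seconds for one minute, treating any attempt that takes longer than 5000 milliseconds as a failure. A lower frequency value produces more probes in the same lifetime, which is the overhead trade-off noted above.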
NOTE: After configuring and starting the Cisco IOS IP SLA operation(s), the results are stored on the source device in the Cisco RTTMON MIB. This same MIB can also be used to configure IP SLA operations using SNMP set commands. No explicit IP SLA operation configuration is required to begin storing data in the Cisco RTTMON MIB.

Once the IP SLA Monitor has been successfully created and scheduled, you can create IP SLA Performance Reports using tools such as Denika SNMP Performance Trender. Keep in mind that SNMP must be configured on the device. Basic SNMP configuration is described and illustrated later in this chapter.

The following configuration example illustrates how to configure an IP SLA operation that measures the response time of a TCP Connect operation between the router and the remote Web server with the IP address 10.0.0.2. The IP SLA operation timeout value will be set to 5 seconds (5000, because the timeout is specified in milliseconds), and the probe will be run every 10 seconds:

R1(config)#ip sla monitor 1
R1(config-sla-monitor)#type tcpConnect dest-ipaddr 10.0.0.2 dest-port 80
R1(config-sla-monitor-tcp)#timeout 5000
R1(config-sla-monitor-tcp)#frequency 10
R1(config-sla-monitor-tcp)#exit
R1(config)#ip sla monitor schedule 1 start-time now life forever

In Cisco IOS 12.4(4)T and later, the same configuration would be implemented on the router as follows:

R1(config)#ip sla 1
R1(config-ip-sla)#tcp-connect 10.0.0.2 80
R1(config-ip-sla-tcp)#timeout 5000
R1(config-ip-sla-tcp)#frequency 10
R1(config-ip-sla-tcp)#exit
R1(config)#ip sla schedule 1 start-time now life forever

As a final example, the following configuration illustrates how to configure a basic IP SLA jitter operation to destination IP address 10.0.0.2 with a destination port of 32768.
Finally, the IOS IP SLA operation is scheduled to run every 30 seconds:

R1(config)#ip sla monitor 1
R1(config-sla-monitor)#type jitter dest-ipaddr 10.0.0.2 dest-port 32768
R1(config-sla-monitor-jitter)#frequency 30
R1(config-sla-monitor-jitter)#exit
R1(config)#ip sla monitor schedule 1 start-time now life forever

Following the IP SLA operation configuration, the show ip sla monitor statistics [operation number] command can be used to view the operation’s statistics as follows:

R1#show ip sla monitor statistics 1
Round trip time (RTT) Index 1
 Latest RTT: 3 ms
Latest operation start time: *03:19:50.110 UTC Sun Mar 3 2002
Latest operation return code: OK
RTT Values
 Number Of RTT: 10
 RTT Min/Avg/Max: 3/3/4 ms
Latency one-way time milliseconds
 Number of one-way Samples: 0
 Source to Destination one way Min/Avg/Max: 0/0/0 ms
 Destination to Source one way Min/Avg/Max: 0/0/0 ms
Jitter time milliseconds
 Number of SD Jitter Samples: 9
 Number of DS Jitter Samples: 9
 Source to Destination Jitter Min/Avg/Max: 0/0/0 ms
 Destination to Source Jitter Min/Avg/Max: 0/1/1 ms
Packet Loss Values
 Loss Source to Destination: 0
 Loss Destination to Source: 0
 Out Of Sequence: 0
 Tail Drop: 0
 Packet Late Arrival: 0
Voice Score Values
 Calculated Planning Impairment Factor (ICPIF): 0
 Mean Opinion Score (MOS): 0
Number of successes: 6
Number of failures: 4
Operation time to live: Forever

If the target or destination device is a Cisco IOS router or switch that has been configured with the ip sla monitor responder global configuration command, you can use the show ip sla monitor responder command on that device to view the local IP SLA operation’s statistics:

R2#show ip sla monitor responder
IP SLA Monitor Responder is: Enabled
Number of control message received: 10
Number of errors: 0
Recent sources:
 10.0.0.1 [11:07:41.536 UTC Sat Mar 2 2002]
 10.0.0.1 [11:07:11.534 UTC Sat Mar 2 2002]
 10.0.0.1 [11:06:41.533 UTC Sat Mar 2 2002]
 10.0.0.1 [11:06:11.536 UTC Sat Mar 2 2002]
 10.0.0.1 [11:05:41.535 UTC Sat Mar 2 2002]

Finally, if SNMP is also configured on the local device, the data can then be collected using SNMP and can be used for the creation of reports on network performance. SNMP configuration is described later in this chapter. The primary emphasis in this section is to understand the logic behind, as well as the capabilities of, Cisco IOS IP SLA operations and how they are an integral part of the maintenance and monitoring toolkit.

Logging

Logging messages and events both locally and to a syslog server is a core maintenance task. Syslog is a protocol that allows a host to send event notification messages across IP networks to event message collectors, which are also known as syslog servers or syslog daemons. In other words, a host or a device can be configured in such a way that it generates a syslog message and forwards it to a specific syslog daemon (server).

A syslog daemon or server is an entity that listens to the syslog messages that are sent to it. You cannot configure a syslog daemon to ask a specific device to send it syslog messages. In other words, if a specific device has no ability to generate syslog messages, then a syslog daemon cannot do anything about it. In the real world, corporations typically use SolarWinds (or similar) software for syslog capturing. Additionally, freeware such as the Kiwi Syslog daemon is also available for syslog capturing.

Syslog uses User Datagram Protocol (UDP) as the underlying transport mechanism, so the data packets are unsequenced and unacknowledged. While UDP does not have the overhead included in TCP, this means that on a heavily used network, some packets may be dropped and therefore logging information will be lost. However, Cisco IOS software allows administrators to configure multiple syslog servers for redundancy.
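The unacknowledged nature of UDP and the value of redundant collectors can be illustrated with a short Python sketch. It is only an illustration of the transport behavior and not a complete syslog client: the send_to_collectors function is a hypothetical name, the message is sent as plain text without any syslog formatting, and the collector addresses are supplied by the caller.

```python
import socket


def send_to_collectors(message, collectors):
    """Send one datagram per collector; UDP gives no delivery confirmation."""
    payload = message.encode("ascii", errors="replace")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for address in collectors:
            # sendto() returns once the datagram is handed to the network stack,
            # not once the collector has received it.
            sock.sendto(payload, address)
    return len(collectors)
```

Each entry in collectors is a (host, port) tuple. Because the sender never learns whether a datagram arrived, sending the same message to more than one collector is the only protection this sketch has against loss, which is the reasoning behind configuring multiple syslog servers.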
A syslog solution consists of two main elements: a syslog server and a syslog client. The syslog client sends syslog messages to the syslog server using UDP as the Transport Layer protocol, specifying a destination port of 514. These messages cannot exceed 1,024 bytes in size; however, there is no minimum length. All syslog messages contain three distinct parts: the priority, the header, and the message.

The priority of a syslog message represents both the facility and the severity of the message. This number is an 8-bit number. The 3 least significant bits represent the severity of the message (with 3 bits, you can represent 8 different severities) and the remaining 5 bits represent the facility. You can use these values to apply filters on the events in the syslog daemon.

NOTE: Keep in mind that these values are generated by the applications on which the event is generated, not by the syslog server itself.

The values sent by Cisco IOS devices are listed and described below in Table 1-1:

Table 1-1. Cisco IOS Software Syslog Priority Levels and Definitions

Level | Level Name | Syslog Definition | Description
0 | Emergencies | LOG_EMERG | This level is used for the most severe error conditions, which render the system unusable.
1 | Alerts | LOG_ALERT | This level is used to indicate conditions that need immediate attention from administrators.
2 | Critical | LOG_CRIT | This level is used to indicate critical conditions, which are less than Alerts but still require administrator intervention.
3 | Errors | LOG_ERR | This level is used to indicate errors within the system; however, these errors do not render the system unusable.
4 | Warnings | LOG_WARNING | This level is used to indicate warning conditions about system operations that did not complete successfully.
5 | Notifications | LOG_NOTICE | This level is used to indicate state changes within the system (e.g., a routing protocol adjacency transitioning to a down state).
6 | Informational | LOG_INFO | This level is used to indicate informational messages about the normal operation of the system.
7 | Debugging | LOG_DEBUG | This level is used to indicate real-time (debugging) information that is typically used for troubleshooting purposes.

In syslog, the facility is used to represent the source that generated the message. This source can be a process on the local device, an application, or even an operating system. Facilities are represented by numbers (integers). In Cisco IOS software, there are eight local use facilities that can be used by processes and applications (as well as the device itself) for sending syslog messages. By default, Cisco IOS devices use facility local7 to send syslog messages. However, it should be noted that most Cisco devices provide options to change the default facility level.

In Cisco IOS software, the logging facility [facility] global configuration command can be used to specify the syslog facility. The options available with this command are as follows:

R1(config)#logging facility ?
  auth    Authorization system
  cron    Cron/at facility
  daemon  System daemons
  kern    Kernel
  local0  Local use
  local1  Local use
  local2  Local use
  local3  Local use
  local4  Local use
  local5  Local use
  local6  Local use
  local7  Local use
  lpr     Line printer system
  mail    Mail system
  news    USENET news
  sys10   System use
  sys11   System use
  sys12   System use
  sys13   System use
  sys14   System use
  sys9    System use
  syslog  Syslog itself
  user    User process
  uucp    Unix-to-Unix copy system

To send messages via syslog, you must perform the following sequence of steps on the device:

1. Globally enable logging on the router or switch using the logging on configuration command. By default, in Cisco IOS software, logging is enabled; however, it is only enabled to send messages to the console.
The logging on command is a mandatory requirement when sending messages to any destination other than the console.

2. Specify the severity of messages to send to the syslog server using the logging trap [severity] global configuration command. You can specify the severity numerically or using the equivalent severity name.

3. Specify one or more syslog server destinations using the logging [address] or logging host [address] global configuration commands.

4. Optionally, specify the source IP address used in syslog messages using the logging source-interface [name] global configuration command. This is a common practice on devices with multiple interfaces configured. If this command is not specified, then the syslog message will contain the IP address of the router or switch interface used to reach the server. If there are multiple interfaces for redundancy, this address may change when the primary path (interface) is down. Therefore, it is typically set to a Loopback interface.

The following configuration example illustrates how to send all messages of informational severity (level 6) and more severe (levels 0 through 6) to a syslog server with the IP address 192.168.1.254:

R2(config)#logging on
R2(config)#logging trap informational
R2(config)#logging 192.168.1.254

This configuration can be validated using the show logging command as illustrated below:

R2#show logging
Syslog logging: enabled (11 messages dropped, 1 messages rate-limited, 0 flushes, 0 overruns, xml disabled, filtering disabled)
    Console logging: disabled
    Monitor logging: level debugging, 0 messages logged, xml disabled, filtering disabled
    Buffer logging: disabled, xml disabled, filtering disabled
    Logging Exception size (4096 bytes)
    Count and timestamp logging messages: disabled

No active filter modules.

    Trap logging: level informational, 33 message lines logged
        Logging to 192.168.1.254(global) (udp port 514, audit disabled, link up), 2 message lines logged, xml disabled, filtering disabled

When configuring logging in general, it is important to ensure that the router or switch clocks reflect the actual current time, which allows you to correlate the fault data. Inaccurate or incorrect timestamps on log messages make fault and problem isolation using a filtration and correlation process very difficult and very time consuming. In Cisco IOS devices, the system clock can be configured manually or the device can be configured to synchronize its clock with a Network Time Protocol (NTP) server. These two options are discussed in the following sections.

Manual clock or time configuration is fine if you have only a few internetwork devices in your network. In Cisco IOS software, the system time is configured using the clock set hh:mm:ss [day & month | month & day] [year] privileged EXEC command. It is not configured or specified in global configuration mode. The following configuration example illustrates how to set the system clock to October 20 12:15 pm (the command takes the time in 24-hour format):

R2#clock set 12:15:00 20 october 2010

Alternatively, the same configuration could be implemented on the router as follows:

R2#clock set 12:15:00 october 20 2010

Following this configuration, the show clock command can be used to view the system time:

R2#show clock
12:15:19.419 UTC Wed Oct 20 2010

One interesting observation is that when the system time is set manually using the clock set command, it defaults to the GMT (UTC) time zone, as can be seen above. In order to ensure that the system clock reflects the correct time zone, for those who are not in the GMT time zone, you must use the clock timezone [time zone name] [GMT offset] global configuration command.
For example, the United States has six different time zones, each with a different GMT offset. These time zones are Eastern Time, Central Time, Mountain Time, Pacific Time, Hawaii Time, and Alaska Time. In addition, some of the time zones use Standard Time and Daylight Saving Time. Given this, it is important to ensure that the system time is set correctly (Standard or Daylight) on all devices when manually configuring the system clock. The following configuration example illustrates how to set the system clock to 12:40 pm on October 20 for the Central Standard Time (CST) time zone, which is six hours behind GMT:

R2#config t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#clock timezone CST -6
R2(config)#end
R2#clock set 12:40:00 october 20 2010

Following this configuration, the system clock on the local router now shows the following:

R2#show clock
12:40:17.921 CST Wed Oct 20 2010

NOTE: If you use the clock set command before the clock timezone command, then the time that you specified using the clock set command will be offset by the clock timezone command. For example, assume that the configuration commands that are used in the example above were entered on the router as follows:

R2#clock set 12:40:00 october 20 2010
R2#config t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#clock timezone CST -6
R2(config)#end

Because the clock set command is used first, the output of the show clock command on the router would show the system clock offset by 6 hours, as specified using the clock timezone command. This behavior is illustrated in the following output on the same router:

R2#show clock
06:40:52.181 CST Wed Oct 20 2010

NOTE: Cisco IOS routers and switches can be configured to switch automatically to summer time (Daylight Saving Time) using the clock summer-time zone recurring [week day month hh:mm week day month hh:mm [offset]] global configuration command. This removes the need to adjust the system clock manually on all manually configured devices during Standard Time and Daylight Saving Time periods.

The second method of setting or synchronizing the system clock is to use a Network Time Protocol (NTP) server as a reference time source. This is the preferred method in larger networks with more than just a few internetwork devices. NTP is a protocol that is designed to time-synchronize a network of machines. NTP is documented in RFC 1305 and runs over UDP.

An NTP network usually gets its time from an authoritative time source, such as a radio clock or an atomic clock attached to a time server. NTP then distributes this time across the network. NTP is extremely efficient; no more than one packet per minute is necessary to synchronize two machines to within a millisecond of one another.

NTP uses the concept of a stratum to describe how many NTP hops away a machine is from an authoritative time source. Keep in mind that this is not routing or switching hops, but NTP hops, which is a totally different concept. A stratum 1 time server typically has a radio or atomic clock directly attached, while a stratum 2 time server receives its time via NTP from a stratum 1 time server, and so on. When a device is configured with multiple NTP reference servers, it will automatically choose as its time source the machine with the lowest stratum number that it is configured to communicate with via NTP.
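The stratum rule described above can be sketched in a few lines of Python. This is only a simplified model of the lowest-stratum rule as stated in the text, not the full NTP selection algorithm, which also weighs factors such as reachability and delay. The NtpServer class, the two function names, and the addresses 10.0.0.2 and 10.0.0.3 are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NtpServer:
    address: str  # IP address of a configured NTP server (hypothetical values below)
    stratum: int  # NTP hops between that server and an authoritative time source


def choose_time_source(servers):
    """Pick the configured server with the lowest stratum number."""
    if not servers:
        raise ValueError("no NTP servers configured")
    return min(servers, key=lambda server: server.stratum)


def local_stratum(source):
    """A client is one NTP hop further from the reference clock than its source."""
    return source.stratum + 1


configured = [
    NtpServer("10.0.0.1", 5),
    NtpServer("10.0.0.2", 3),
    NtpServer("10.0.0.3", 4),
]
best = choose_time_source(configured)
print(best.address, local_stratum(best))  # prints: 10.0.0.2 4
```

In this sketch the stratum 3 server is chosen, so the local device would report a stratum of 4.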
In Cisco IOS software, a device is configured with the IP addresses of one or more NTP servers using the ntp server [address] global configuration command. As previously stated, multiple NTP reference addresses can be specified by repeatedly using the same command. In addition, this command can also be used to configure security and other features between the server and the client. These features will be described in detail later in this guide. The following configuration example illustrates how to configure a device to synchronize its time with an NTP server with the IP address 10.0.0.1:

R2(config)#ntp server 10.0.0.1

Following this configuration, the show ntp associations command can be used to verify the communications between the NTP devices as illustrated in the following output:

R2#show ntp associations
      address         ref clock     st  when  poll reach  delay  offset    disp
*~10.0.0.1         127.127.7.1       5    44    64  377     3.2    2.39     1.2
 * master (synced), # master (unsynced), + selected, - candidate, ~ configured

The output of this command provides some very telling information, of which only some will be relevant to the TSHOOT certification exam. First, the address field indicates the IP address of the NTP server as confirmed by the value 10.0.0.1 specified under this field. The ref clock field indicates the reference clock used by that NTP server. In this case, the IP address 127.127.7.1 indicates that the device is using an internal clock (127.0.0.0/8 subnet) as its reference time source. If this field contained another value, such as 192.168.1.254, for example, then that would be the IP address the server was using as its time reference.

Next, the st field indicates the stratum of the reference. From the output printed above, we can determine that the 10.0.0.1 NTP device has a stratum of 5. The stratum on the local device will be incremented by 1 to a value of 6, as shown below, because it receives its time source from a server with a stratum value of 5. If another device were synchronized to the local router, it would reflect a stratum of 7 and so forth. The second command that is used to validate the NTP configuration is the show ntp status command, the output of which is illustrated below:

R2#show ntp status
Clock is synchronized, stratum 6, reference is 10.0.0.1
nominal freq is 249.5901 Hz, actual freq is 249.5900 Hz, precision is 2**18
reference time is C02C38D2.950DA968 (05:53:22.582 UTC Sun Mar 3 2002)
clock offset is 4.6267 msec, root delay is 3.16 msec
root dispersion is 4.88 msec, peer dispersion is 0.23 msec

The output of the show ntp status command indicates that the clock is synchronized to the configured NTP server (10.0.0.1). This server has a stratum of 5, hence the local device reflects a stratum of 6. An interesting observation when NTP is configured is that the local time still defaults to GMT, as can be seen in the reference time above, which is displayed in UTC. To ensure that the device displays the correct time zone, you must issue the clock timezone command on the device.

After the system clock has been set, either manually or via NTP, it is important to ensure that the logs sent to the server contain the correct timestamps. This is performed using the service timestamps log [datetime | uptime] global configuration command. The datetime keyword supports the following self-explanatory additional sub-keywords:

R2(config)#service timestamps log datetime ?
  localtime      Use local time zone for timestamps
  msec           Include milliseconds in timestamp
  show-timezone  Add time zone information to timestamp
  year           Include year in timestamp
  <cr>

The uptime keyword has no additional sub-keywords and configures the local router to include only the system uptime as the timestamp for sent messages.
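To make the relationship between the UTC system clock, the configured time zone offset, and the timestamp that ends up on a log message more concrete, the following Python sketch converts a UTC reading to a fixed-offset zone and formats it. It is an illustration only: the function names to_local and log_timestamp are hypothetical, the zone is modeled as a plain fixed offset, and the output format merely approximates what the datetime, localtime, msec, and show-timezone sub-keywords produce together.

```python
from datetime import datetime, timedelta, timezone


def to_local(utc_time, offset_hours, zone_name):
    """Shift a UTC clock reading into a named fixed-offset zone for display."""
    zone = timezone(timedelta(hours=offset_hours), zone_name)
    return utc_time.astimezone(zone)


def log_timestamp(local_time):
    """Format month, day, time, milliseconds, and zone name (approximate layout)."""
    millis = local_time.microsecond // 1000
    return f"{local_time:%b %d %H:%M:%S}.{millis:03d} {local_time.tzname()}"


# A clock set to 12:40:00 while the device still assumes UTC ...
utc_reading = datetime(2010, 10, 20, 12, 40, 0, tzinfo=timezone.utc)
# ... is displayed six hours earlier once a CST offset of -6 is applied.
cst_reading = to_local(utc_reading, -6, "CST")
print(log_timestamp(cst_reading))  # timestamp ends with: 06:40:00.000 CST
```

This mirrors the earlier note about command order: a time entered before the time zone is configured is treated as UTC, so the displayed value shifts by the offset afterwards.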
The following configuration example illustrates how to configure the local router to include the local time, millisecond information, and the time zone for all messages:

R2#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#logging on
R2(config)#logging console informational
R2(config)#logging host 150.1.1.254
R2(config)#logging trap informational
R2(config)#service timestamps log datetime localtime msec show-timezone

Following this configuration, the local router console would print the following messages:

Oct 20 02:14:10.519 CST: %SYS-5-CONFIG_I: Configured from console by console
Oct 20 02:14:11.521 CST: %SYS-6-LOGGINGHOST_STARTSTOP: Logging to host 150.1.1.254 started - CLI initiated

In addition, the syslog daemon on server 150.1.1.254 would reflect the same messages, as illustrated in the Kiwi Syslog Manager screenshot in Figure 1-1 below:

Fig. 1-1. Configuring Log Timestamps

Simple Network Management Protocol

The Simple Network Management Protocol (SNMP) is a widely used management protocol and defined set of standards for communications with devices connected to an IP network. SNMP provides a means to monitor and control network devices. Like Cisco IOS IP SLA operations, SNMP can be used to collect statistics, to monitor device performance, and to provide a baseline of the network, and is one of the most commonly used network maintenance and monitoring tools.

SNMP is an Application Layer (Layer 7) protocol, using UDP ports 161 and 162, that facilitates the exchange of management information between network devices. An SNMP-managed network consists of a management system, agents, and managed devices. The management system executes monitoring applications and controls managed devices. It also executes most of the management processes and provides the bulk of memory resources used for network management.
A network might be managed by one or more management systems. Examples of SNMP management systems include HP OpenView and SolarWinds.

An SNMP agent resides on each managed device and translates local management information data, such as performance information or event and error information caught in software traps, into a readable form for the management system. SNMP agents answer get-requests from the network management software by returning the requested data. SNMP agents capture data from Management Information Bases (MIBs), which are device parameter and network data repositories, or from error or change traps.

A managed element, such as a router, a switch, a computer, or a firewall, is accessed via the SNMP agent. Managed devices collect and store management information, making it available through SNMP to other management systems having the same protocol compatibility. Figure 1-2 below illustrates the interaction of the three primary components of an SNMP-managed network:

Fig. 1-2. SNMP Network Component Interaction

Referencing Figure 1-2, R1 is the SNMP-managed device. Logically residing on the device is the SNMP agent. The SNMP agent translates local management information data, stored in the management database of the managed device, into a readable form for the management system, which is also referred to as the Network Management Station (NMS).

When using SNMP, managed devices are monitored and controlled using three common SNMP commands: read, write, and trap. The read command is used by an NMS to monitor managed devices. This is performed by the NMS examining different variables that are maintained by managed devices. The write command is used by an NMS to control managed devices. Using this command, the NMS can change the values of variables stored within managed devices. Finally, the SNMP trap command is used by managed devices to report events to the NMS.
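The division of labor between the three commands can be pictured with a toy Python model. This is not an SNMP implementation and uses no SNMP library; the ToyAgent class, its method names, and the variable names in the dictionary are all assumptions made for illustration. It only shows who initiates each operation: the NMS initiates read and write, while the managed device initiates the trap.

```python
class ToyAgent:
    """Toy stand-in for an SNMP agent; models the three operations, not the protocol."""

    def __init__(self, variables, trap_receiver):
        self.variables = dict(variables)    # plays the role of the MIB
        self.trap_receiver = trap_receiver  # callable standing in for the NMS listener

    def read(self, name):
        """NMS monitoring: return the current value of a managed variable."""
        return self.variables[name]

    def write(self, name, value):
        """NMS control: change the value of a managed variable on the device."""
        self.variables[name] = value

    def report(self, event):
        """Device-initiated notification: push an event to the NMS unprompted."""
        self.trap_receiver(event)


received = []  # events the pretend NMS has been sent
agent = ToyAgent({"sysName": "R1", "ifOperStatus.1": "up"}, received.append)

name = agent.read("sysName")                     # read: NMS polls the device
agent.write("sysName", "R1-core")                # write: NMS changes a variable
agent.report("ifOperStatus.1 changed to down")   # trap: device reports an event
print(name, agent.read("sysName"), received)
```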
Devices can be configured to send SNMP traps or informs to an NMS. The traps and informs that are sent are dependent on the version of Cisco IOS software running on the device, as well as the platform. SNMP traps are simply messages that alert the SNMP manager of a condition on the network. An example of an SNMP trap could include an interface transitioning from an up state to a down state.

The primary issue with SNMP traps is that they are unacknowledged. This means that the sending device is incapable of determining whether the trap was received by the NMS. SNMP informs are SNMP traps that include a confirmation of receipt from the SNMP manager. These messages can be used to indicate failed authentication attempts, or the loss of a connection to a neighbor router, for example. If the manager does not receive an inform request, then it does not send a response. If the sender never receives a response, then the inform request can be sent again. Thus, informs are more likely to reach their intended destination.

While informs are more reliable than traps, the downside is that they consume more resources on both the router and in the network. Unlike a trap, which is discarded as soon as it is sent, an inform request must be held in memory until a response is received or the request times out. In addition, traps are sent only once, while an inform may be resent several times if a response is not received from the SNMP server (NMS). Figure 1-3 below illustrates the communication between the SNMP manager and the SNMP agent for sending traps and informs:

Fig. 1-3. UDP Ports Used by the NMS and Managed Element

The three versions of SNMP are versions 1, 2, and 3. Version 1, or SNMPv1, is the initial implementation of the SNMP protocol. SNMPv1 operates over protocols such as User Datagram Protocol (UDP), Internet Protocol (IP), and the OSI Connectionless Network Service (CLNS). SNMPv1 is widely used and is the de facto network-management protocol used within the Internet community.

SNMPv2 revises SNMPv1 and includes improvements in the areas of performance, security, confidentiality, and manager-to-manager communications. SNMPv2 also defines two new operations: GetBulk and Inform. The GetBulk operation is used to retrieve large blocks of data efficiently. The Inform operation allows one NMS to send trap information to another NMS and then to receive a response. In SNMPv2, if the agent responding to GetBulk operations cannot provide values for all the variables in a list, then it provides partial results.

SNMPv3 provides the following three additional security services that are not available in previous versions of SNMP: message integrity, authentication, and encryption. SNMPv3 uses message integrity to ensure that a packet has not been tampered with in-transit. SNMPv3 also utilizes authentication, which is used to determine whether the message is from a valid source. Finally, SNMPv3 provides encryption, which is used to scramble the contents of a packet to prevent it from being seen by unauthorized sources.

NOTE: You are not required to go into detail on SNMP versions in the TSHOOT exam. Instead, emphasis should be placed simply on having a basic understanding of the protocol and how it is used as a monitoring and maintenance tool. Additional theoretical and configuration information on SNMP can be found in the current SWITCH guide or online at www.howtonetwork.net.

In Cisco IOS software, the snmp-server host [hostname | address] command is used to specify the hostname or IP address of the NMS to which the local device will send traps or informs.
To allow the NMS to poll the local device, SNMPv1 and SNMPv2c require that a community string be specified for either read-only or read-write access using the snmp-server community <name> [ro | rw] global configuration command. SNMPv3 does not use the same community-based form of security but instead uses user and group security. The following configuration example illustrates how to configure the local device with two community strings, one for read-only access and the other for read-write access. In addition, the local device is also configured to send SNMP traps for Cisco IOS IP SLA operations and syslog to 1.1.1.1 using the read-only community string:

R2#config t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#snmp-server community unsafe RO
R2(config)#snmp-server community safe RW
R2(config)#snmp-server host 1.1.1.1 traps unsafe rtr syslog

Figure 1-4 below illustrates a sample report for device resource utilization and availability based on SNMP polling using the ManageEngine OpManager network monitoring software:

Fig. 1-4. Sample SNMP Report on Device Resource Utilization

Cisco IOS NetFlow

Like SNMP, Cisco IOS NetFlow is a powerful maintenance and monitoring tool that can be used to baseline network performance and assist in troubleshooting. However, there are some significant differences between Cisco IOS NetFlow and SNMP. The first difference is that while SNMP reports primarily on device statistics (e.g., resource utilization), Cisco IOS NetFlow reports on traffic statistics (e.g., packets and bytes).

The second difference between these two tools is that SNMP is a poll-based protocol, meaning that the managed device is polled for information. Cisco IOS NetFlow, however, is a push-based technology, meaning that the device on which NetFlow is configured sends out information that it has collected locally to a central repository. For this reason, NetFlow and SNMP complement each other and should be used together as part of the standard network maintenance and monitoring toolkit. However, they are not replacements for each other; this is an often misunderstood point and it is important to remember it.

Another difference is that while SNMP can provide traffic statistics, SNMP cannot differentiate between individual flows. However, Cisco IOS NetFlow can. A flow is simply a series of packets with the same source and destination IP address, source and destination ports, protocol, interface, and Class of Service parameters.

For IP applications, an IP flow is based on a set of five, and up to seven, IP packet attributes, which may include the following:

1. Destination IP address
2. Source IP address
3. Source port
4. Destination port
5. Layer 3 protocol type
6. Class of Service
7. Router or switch interface

In addition to these IP attributes, other additional information is also included with a flow. This additional information includes timestamps, which are useful for calculating packets and bytes per second. Timestamps also provide information on the life (duration) of a flow. The flow also includes next-hop IP address information, including BGP routing Autonomous Systems information. Subnet mask information for the flow source and destination addresses is also included, in addition to flags for TCP traffic, which can be used to examine the TCP handshakes.

This means that Cisco IOS NetFlow can be used for network traffic accounting, usage-based network billing, network planning, security, Denial of Service (DoS) monitoring capabilities, and network monitoring, in addition to providing information about network users and applications, peak usage times, and traffic routing. All of this makes it a very powerful maintenance, monitoring, and troubleshooting tool.
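The idea that packets sharing the same set of attributes belong to one flow can be sketched in Python as follows. This is a minimal model under stated assumptions, not how Cisco IOS implements its cache: the FlowKey fields mirror the seven attributes listed above, while the build_flow_cache function, the sample addresses, ports, and byte sizes are hypothetical values chosen for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP and 17 for UDP
    tos: int       # Class of Service marking
    input_if: str  # interface on which the packet arrived


def build_flow_cache(packets):
    """Aggregate (key, size_in_bytes) pairs into per-flow packet and byte counters."""
    cache = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, size in packets:
        entry = cache[key]
        entry["packets"] += 1
        entry["bytes"] += size
    return dict(cache)


packets = [
    (FlowKey("10.0.0.2", "1.1.1.1", 49331, 23, 6, 0, "Se0/0"), 40),
    (FlowKey("10.0.0.2", "1.1.1.1", 49331, 23, 6, 0, "Se0/0"), 40),
    (FlowKey("10.0.0.2", "10.0.0.1", 123, 123, 17, 0, "Se0/0"), 76),
]
cache = build_flow_cache(packets)
for key, counters in cache.items():
    print(key.dst_port, counters)
```

The two Telnet packets share every key field and therefore land in a single flow entry, while the NTP packet differs in several fields and creates a second entry. This per-flow separation is what SNMP interface counters cannot provide.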
Cisco IOS NetFlow gathers the flow information and stores it in a database called the NetFlow cache or simply the flow cache. Flow information is retained until the flow is terminated or stopped, times out, or the cache is filled. Two methods can be used to access the data stored in the flow cache: using the CLI (i.e., using show commands) or exporting the data and then viewing it using some type of reporting tool. Figure 1-5 below illustrates NetFlow operation on a Cisco IOS router and how the flow cache is populated:

[Figure: ingress traffic enters the NetFlow-enabled router, which inspects the IP packet attributes (IP source address, IP destination address, source port, destination port, Layer 3 protocol type, Class of Service, and router or switch interface) and stores the resulting byte, address, port, and packet information in the flow cache.]

Fig. 1-5. Basic NetFlow Operation and Flow Cache Population

Referencing Figure 1-5, ingress traffic is received on the local router. This traffic is inspected by the router and IP attribute information is used to create a flow. The flow information is then stored in the flow cache. This information can be viewed using the CLI or can also be exported to an external destination, referred to as a NetFlow Collector, where the same information can then be viewed using an application reporting tool. The following steps are used to implement NetFlow data reporting to the NetFlow Collector:

1. Cisco IOS NetFlow is configured on the device to capture flows to the NetFlow cache.
2. NetFlow export is configured to send flows to the Collector.
3. The NetFlow cache is searched for flows that have been inactive for a certain period of time, have been terminated, or, for active flows, have lasted longer than the active timer.
4. Those identified flows are exported to the NetFlow Collector server.
5. Approximately 30 to 50 flows are bundled together and are typically transported via UDP.
6. The NetFlow Collector software creates real-time or historical reports from the data.

Three primary steps are required when configuring Cisco IOS NetFlow as follows:

1. Configure the interface to capture flows into the NetFlow cache using the ip flow ingress interface configuration command on all interfaces for which you want information to be captured and stored in the flow cache. It is important to remember that NetFlow is configured on a per-interface basis only.

The NetFlow information is then stored on the local router and can be viewed using the show ip cache flow command on the local device. In the event that you want to export data to the NetFlow Collector, two additional tasks will be required as follows:

2. Configure the Cisco IOS NetFlow version or format to use via the ip flow-export version [1 | 5 | 9] global configuration command. NetFlow version 1 (v1) is the original format supported in the initial NetFlow releases. This version should be used only when it is the only NetFlow data export format version that is supported by the application that you are using to analyze the exported NetFlow data. Version 5 exports more fields than version 1 does and is the most widely deployed version. Version 9 is the latest Cisco IOS NetFlow version and is the basis of a new IETF standard. Version 9 is a flexible export format version.

3. Configure and specify the IP address of the NetFlow Collector, and then specify the UDP port that the NetFlow Collector will use to receive the UDP export from the Cisco device, using the ip flow-export destination [hostname | address] <port> [udp] global configuration command. The [udp] keyword is optional and does not need to be specified when using this command because User Datagram Protocol (UDP) is the default transport protocol used when sending data to the NetFlow Collector.
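The export criteria from the reporting steps earlier in this section (a flow is exported once it has terminated, has been idle too long, or has outlived the active timer, and exported records are bundled into datagrams) can be sketched in Python. This is a simplified illustration, not the IOS implementation. The timer values of 30 minutes and 15 seconds and the bundle size of 30 records are assumptions chosen for the example, and FlowRecord, should_export, and bundle are hypothetical names.

```python
from dataclasses import dataclass

ACTIVE_TIMEOUT = 30 * 60  # seconds a still-active flow may live (assumed value)
INACTIVE_TIMEOUT = 15     # seconds of idleness before a flow expires (assumed value)


@dataclass
class FlowRecord:
    name: str
    first_seen: float         # time of the first packet, in seconds
    last_seen: float          # time of the most recent packet, in seconds
    terminated: bool = False  # e.g. the end of a TCP session was observed


def should_export(flow, now):
    """Return True if the flow meets any of the three export criteria."""
    if flow.terminated:
        return True
    if now - flow.last_seen >= INACTIVE_TIMEOUT:
        return True
    return now - flow.first_seen >= ACTIVE_TIMEOUT


def bundle(records, size=30):
    """Split exported records into groups, one group per export datagram."""
    return [records[i:i + size] for i in range(0, len(records), size)]


now = 2000
flows = [
    FlowRecord("telnet", first_seen=1500, last_seen=1995),           # young and busy
    FlowRecord("web", first_seen=1900, last_seen=1990, terminated=True),
    FlowRecord("ntp", first_seen=1900, last_seen=1900),               # idle for 100 s
    FlowRecord("bulk", first_seen=100, last_seen=1995),               # active > 30 min
]
to_export = [flow.name for flow in flows if should_export(flow, now)]
print(to_export)                                  # ['web', 'ntp', 'bulk']
print([len(group) for group in bundle(list(range(65)))])  # [30, 30, 5]
```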
The following example illustrates how to enable NetFlow for a specified router interface:

R1#config t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#interface Serial0/0
R1(config-if)#ip flow ingress
R1(config-if)#end

Following this configuration, the show ip cache flow command can be used to view collected statistics in the flow cache as illustrated in the output below:

R1#show ip cache flow
IP packet size distribution (721 total packets):
   1-32   64   96  128  160  192  224  256  288  320  352  384  416  448  480
   .000 .980 .016 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000

    512  544  576 1024 1536 2048 2560 3072 3584 4096 4608
   .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000

IP Flow Switching Cache, 278544 bytes
  4 active, 4092 inactive, 56 added
  1195 ager polls, 0 flow alloc failures
  Active flows timeout in 30 minutes
  Inactive flows timeout in 15 seconds
IP Sub Flow Cache, 21640 bytes
  4 active, 1020 inactive, 56 added, 56 added to flow
  0 alloc failures, 0 force free
  1 chunk, 1 chunk added
  last clearing of statistics never
Protocol         Total    Flows   Packets Bytes  Packets Active(Sec) Idle(Sec)
--------         Flows     /Sec     /Flow  /Pkt     /Sec     /Flow     /Flow
TCP-Telnet           2      0.0        34    40      0.0      10.5      15.7
TCP-WWW              2      0.0         9    93      0.0       0.1       1.5
UDP-NTP              1      0.0         1    76      0.0       0.0      15.4
UDP-other           42      0.0         5    59      0.0       0.0      15.7
ICMP                 5      0.0        10    64      0.0       0.0      15.1
Total:              52      0.0         7    58      0.0       0.4      15.1

SrcIf         SrcIPaddress    DstIf         DstIPaddress    Pr SrcP DstP  Pkts
Se0/0         150.1.1.254     Local         10.0.0.1        01 0000 0800   339
Se0/0         10.0.0.2        Local         1.1.1.1         06 C0B3 0017     7
Se0/0         10.0.0.2        Local         10.0.0.1        11 07AF D0F1     1
Se0/0         10.0.0.2        Local         10.0.0.1        11 8000 D0F1    10
Se0/0         150.1.1.254     Local         10.0.0.1        01 0000 0800   271
Se0/0         10.0.0.2        Local         1.1.1.1         06 C0B3 0017    59

The following example illustrates how to configure and enable NetFlow data collection for the specified router interfaces and then export the data to a NetFlow Collector with the IP address 150.1.1.254 over UDP port 5000 using NetFlow version 5 or the version 5 data format:

R1(config)#interface Serial0/0
R1(config-if)#ip flow ingress
R1(config-if)#exit
R1(config)#interface FastEthernet0/0
R1(config-if)#ip flow ingress
R1(config-if)#exit
R1(config)#interface Serial0/1
R1(config-if)#exit
R1(config)#ip flow-export version 5
R1(config)#ip flow-export destination 150.1.1.254 5000
R1(config)#exit

Following this configuration, the collected information can then be viewed using an application reporting tool on the NetFlow Collector. Despite the export of the data, the show ip cache flow command can still be used to view statistics on the local device, which can be a useful tool when troubleshooting network issues or problem reports.

Network-Based Application Recognition

Network-Based Application Recognition (NBAR) is another Cisco IOS software tool that can be used for monitoring and baselining network performance. NBAR is an intelligent classification engine in Cisco IOS software that can recognize a wide variety of applications. Once the applications are recognized, the network can invoke required services for that particular application by implementing QoS policies to support the application requirements. NBAR provides two primary functions: identifying applications and protocols, and allowing for the dynamic discovery of protocols on the network.

Network-Based Application Recognition can classify applications that use statically assigned TCP and UDP port numbers, as well as those that use dynamically assigned or negotiated TCP and UDP port numbers. NBAR can also recognize and classify applications based on non-UDP and non-TCP IP protocols. In addition, NBAR can also perform sub-port classification, including the classification of HTTP URLs, mime, or hostnames. NBAR also supports and can be used for Citrix traffic classification, as well as for Real-Time Transport Protocol (RTP), which is used by IP voice and video.
40 C H A P T E R 1: N E T WO R K M O N I TO R I N G A N D M A I N T E N A N C E The NBAR Protocol Discovery (PD) feature can also be used to collect application and protocol statistics, such as packet counts, byte counts, and bit rates, on a per-interface basis. This information can then be retrieved by polling SNMP statistics from the NBAR PD Management Information Base (MIB). NBAR uses Packet Description Language Modules (PDLMs) for protocol and application recognition. In the event that a specific protocol or application is not recognized, an external PDLM can be loaded at any time into the router Flash memory to extend the NBAR list of recognized protocols. PDLMs can also be used to enhance an existing protocol recognition capability. The use of PDLMs allows NBAR to recognize additional protocols and applications without having to upgrade or replace the current version of software on the router, providing additional flexibility for network administrators. Like NetFlow, NBAR Protocol Discovery is enabled on a per-interface basis using the ip nbar protocol-discovery interface configuration command. Prior to configuring NBAR, you must enable Cisco Express Forwarding (CEF) on the router using the ip cef global configuration command. CEF is described in detail in both the ROUTE and SWITCH guides that are available online. CEF troubleshooting will be described later in this guide. The following configuration example illustrates how to enable NBAR on a router interface. Note that CEF is enabled prior to the NBAR PD configuration: R2(config)#ip cef R2(config)#interface FastEthernet0/0 R2(config-if)#ip nbar protocol-discovery R2(config-if)#exit Following this configuration, NBAR PD statistics can be viewed by issuing the show ip nbar protocol-discovery command. 
Keep in mind that the recognized applications or protocols printed in the output of this command will vary, depending on your current IOS version and the PDLMs that have been integrated into that version of code or have been loaded into router Flash memory:

R2#show ip nbar protocol-discovery

 FastEthernet0/0
                            Input                    Output
                            -----                    ------
   Protocol                 Packet Count             Packet Count
                            Byte Count               Byte Count
                            5min Bit Rate (bps)      5min Bit Rate (bps)
                            5min Max Bit Rate (bps)  5min Max Bit Rate (bps)
   ------------------------ ------------------------ ------------------------
   netbios                  832                      0
                            76544                    0
                            3000                     0
                            3000                     0
   snmp                     24                       12
                            3172                     1655
                            0                        0
                            1000                     0
   icmp                     222                      221
                            16428                    16342
                            0                        0
                            0                        0
   eigrp                    0                        10
                            0                        740
                            0                        0
                            0                        0
   ospf                     0                        3
                            0                        270
                            0                        0
                            0                        0
   syslog                   0                        2
                            0                        254
                            0                        0
                            0                        0
... [Truncated Output]

When using NBAR, it is important to remember that NBAR PD is based on the standard port numbers for the different applications. For example, NBAR will recognize an application as HTTP if the application is using TCP port 80. Likewise, NBAR will recognize SMTP based on the standard TCP port number of 25. This presents a potential problem when protocols or applications do not use their well-known port numbers. For example, it is common practice for some Web applications to use TCP port 8080 in addition to the standard port 80. In such a case, you can use the ip nbar port-map [application or protocol] [tcp | udp] [port number 1] [port number 2]…[port number 16] global configuration command to specify up to 16 additional port numbers used by the protocol.
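To make the idea of port-based recognition concrete, here is a minimal Python sketch of a lookup table keyed on transport protocol and destination port. It is an illustration only and not how IOS implements NBAR. The names DEFAULT_PORT_MAP, apply_port_map, and classify are hypothetical, and the sketch assumes that the ports given for a protocol form that protocol's complete port list:

```python
# Hypothetical illustration of port-based classification; not the IOS implementation.
# Keys are (transport, destination port); values are protocol names.
DEFAULT_PORT_MAP = {
    ("tcp", 80): "http",
    ("tcp", 25): "smtp",
    ("tcp", 23): "telnet",
}

MAX_PORTS = 16  # the text gives 16 as the number of ports one command can list


def apply_port_map(port_map, protocol, transport, ports):
    """Return a new map in which `protocol` is recognized on exactly `ports`.

    Assumption of this sketch: the listed ports replace any ports previously
    mapped to the same protocol and transport.
    """
    if len(ports) > MAX_PORTS:
        raise ValueError(f"at most {MAX_PORTS} ports may be listed")
    updated = {
        key: name
        for key, name in port_map.items()
        if not (name == protocol and key[0] == transport)
    }
    for port in ports:
        updated[(transport, port)] = protocol
    return updated


def classify(port_map, transport, dst_port):
    """Look up the protocol for a packet; unmapped ports are 'unknown'."""
    return port_map.get((transport, dst_port), "unknown")


if __name__ == "__main__":
    custom = apply_port_map(DEFAULT_PORT_MAP, "http", "tcp", [80, 8080])
    print(classify(DEFAULT_PORT_MAP, "tcp", 8080))
    print(classify(custom, "tcp", 8080))
```

With only the default table, traffic to TCP port 8080 goes unrecognized; once port 8080 is added to the HTTP entry, the same traffic is counted as HTTP. That is the gap the port-map command is meant to close.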
The following configuration example illustrates how to configure NBAR to search for a protocol using port numbers other than the well-known port, as follows:

• For E-mail (SMTP) traffic, NBAR should search for TCP ports 25 and 2525
• For Telnet traffic, NBAR should search for TCP ports 23 and 3023
• For Web (HTTP) traffic, NBAR should search for TCP ports 80 and 8080

This configuration would be implemented as follows on the local router:

R2#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#ip cef
R2(config)#ip nbar port-map smtp tcp 25 2525
R2(config)#ip nbar port-map telnet tcp 23 3023
R2(config)#ip nbar port-map http tcp 80 8080
R2(config)#interface FastEthernet0/0
R2(config-if)#description 'Connected To Corporate LAN'
R2(config-if)#ip nbar protocol-discovery
R2(config-if)#exit

Following this configuration, you can use the show ip nbar port-map [protocol] command to see the ports that NBAR is recognizing, either for a specific application or protocol or for all applications and protocols. For example, to see the ports that NBAR recognizes for the Telnet application, you would issue the following command on the router:

R2#show ip nbar port-map telnet
port-map telnet tcp 23 3023

If the [protocol] is not specified at the end of the command, NBAR shows default as well as customized port information for all supported protocols and applications as follows:

R2#show ip nbar port-map
port-map bgp      udp 179
port-map bgp      tcp 179
port-map citrix   udp 1604
port-map citrix   tcp 1494
port-map cuseeme  udp 7648 7649 24032
port-map cuseeme  tcp 7648 7649
port-map dhcp     udp 67 68
port-map dns      udp 53
port-map dns      tcp 53
... [Truncated Output]

As previously stated, NBAR uses PDLMs for protocol and application recognition.
If the PDLM file on the router does not recognize a specific protocol or application, you can download a PDLM file for the application (if one exists) from the Cisco Web site. Once the download is complete, copy the file to the router Flash memory and then use the ip nbar pdlm [flash://filename.pdlm] global configuration command to reference the file. For example, to load a PDLM for BitTorrent application recognition, you would download the PDLM from the Cisco Web site and then implement the following configuration on the router:

R2#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#ip cef
R2(config)#ip nbar pdlm flash://bittorrent.pdlm
R2(config)#interface FastEthernet0/0
R2(config-if)#description 'Connected To Corporate LAN'
R2(config-if)#ip nbar protocol-discovery
R2(config-if)#exit

As previously stated, PDLMs are available for most well-known applications and protocols, but this does not include proprietary applications.
If, for example, you have a custom application that uses TCP port numbers 1111, 2222, and 3333, then you can use the ip nbar custom [custom name] [tcp | udp] [port number 1]…[port number x] global configuration command to configure NBAR to classify and monitor this additional static-port application, as illustrated in the following configuration example:

R2(config)#ip cef
R2(config)#ip nbar custom h2n_custom_app tcp 1111 2222 3333
R2(config)#interface FastEthernet0/0
R2(config-if)#description 'Connected To Corporate LAN'
R2(config-if)#ip nbar protocol-discovery
R2(config-if)#exit

Following your configuration, you can use the show ip nbar port-map [protocol] command to see the ports that NBAR is recognizing for the specific application as follows:

R2#show ip nbar port-map h2n_custom_app
port-map h2n_custom_app tcp 1111 2222 3333

Additionally, you can use the show ip nbar protocol-discovery command to view statistics for the custom application, as shown in the following output, which has been filtered to include only the statistics for the previously configured custom application:

R2#show ip nbar protocol-discovery | section h2n_custom_app
   h2n_custom_app           21                       21
                            1358                     1134
                            0                        0
                            0                        0

Given the options and flexibility that NBAR affords, it is easy to see why it is a popular network monitoring tool. However, one significant drawback is that NBAR is very resource intensive and consumes a great deal of CPU and memory. Do not implement NBAR on a router whose resources are already taxed.
After NBAR has been enabled, you can use the show ip nbar resources command to view how much memory it is consuming, as illustrated in the following output:

R2#show ip nbar resources
NBAR memory usage for tracking Stateful sessions
  Max-age              : 120 secs
  Initial memory       : 1383 KBytes
  Max initial memory   : 4611 KBytes
  Memory expansion     : 68 KBytes
  Max memory expansion : 68 KBytes
  Memory in use        : 1383 KBytes
  Max memory allowed   : 9223 KBytes
  Active links         : 0
  Total links          : 20346

You can monitor the CPU utilization on the router using the show processes cpu command. This is a core troubleshooting command that will be described in detail later in this guide.

Configuration Management

In this next section on Cisco IOS network maintenance and monitoring tools, we will explore some of the tools available in Cisco IOS software for configuration management. As we learned earlier in this chapter, configuration management, including backing up configurations, scheduling backups, and restoring from backups, is a core maintenance task. This section describes the toolkit available in Cisco IOS software to assist with these tasks.

One of the most commonly used configuration management commands in Cisco IOS software is the copy running-config command. This command can be used to save the device configuration locally or to a remote destination, such as a Trivial File Transfer Protocol (TFTP) or a File Transfer Protocol (FTP) server. Supported options are illustrated below:

R2#copy running-config ?
  archive:        Copy to archive: file system
  flash:          Copy to flash: file system
  ftp:            Copy to ftp: file system
  http:           Copy to http: file system
  https:          Copy to https: file system
  ips-sdf         Update (merge with) IPS signature configuration
  null:           Copy to null: file system
  nvram:          Copy to nvram: file system
  pram:           Copy to pram: file system
  rcp:            Copy to rcp: file system
  running-config  Update (merge with) current system configuration
  scp:            Copy to scp: file system
  startup-config  Copy to startup configuration
  syslog:         Copy to syslog: file system
  system:         Copy to system: file system
  tftp:           Copy to tftp: file system

NOTE: The options displayed above will vary depending on the IOS version on the device.

The following configuration example illustrates how to copy the router running configuration to a TFTP server with the IP address 150.1.1.254:

R2#copy running-config tftp:
Address or name of remote host []? 150.1.1.254
Destination filename [r2-confg]?
!!
2732 bytes copied in 2.296 secs (1190 bytes/sec)

The same action can also be performed on a single line as follows:

R2#copy running-config tftp://150.1.1.254
Address or name of remote host [150.1.1.254]?
Destination filename [r2-confg]?
!!
2732 bytes copied in 2.288 secs (1194 bytes/sec)

Unlike TFTP, file transfers to FTP servers typically require a username and a password, which provides greater security than TFTP. When copying configuration files to an FTP server that requires a username and password for login, you have two options for specifying the credentials that the local device will use. The first option is to configure the FTP username and password globally on the device using the ip ftp username <name> and ip ftp password <secret> global configuration commands. Following this, you can then use the copy running-config ftp: command.
The example below illustrates how to configure a global FTP username and password pair and how to copy the configuration of the local router to an FTP server with the IP address 150.1.1.254. This example assumes that the FTP server has been appropriately configured:

R2(config)#ip ftp username netadmin
R2(config)#ip ftp password tshoot
R2(config)#end
R2#copy running-config ftp:
Address or name of remote host []? 150.1.1.254
Destination filename [r2-confg]?
Writing r2-confg !
2780 bytes copied in 4.932 secs (564 bytes/sec)

NOTE: Referencing the FTP configuration above, it is important to keep in mind that the FTP password will be stored in plain text on the device until the service password-encryption global configuration command is issued. Following that, the password will be displayed in an encrypted (type 7) format. Type 7 encryption is reversible, so treat it as protection against casual viewing rather than as strong security.

If the FTP username and password pair is not configured globally on the router, then you can still specify these parameters when using the copy command as follows:

R2#copy running-config ftp://netadmin:tshoot@150.1.1.254
Address or name of remote host [150.1.1.254]?
Destination filename [r2-confg]?
Writing r2-confg !
2738 bytes copied in 7.180 secs (381 bytes/sec)

In addition to basic copy commands, Cisco IOS software also supports configuration archive, configuration replace, and configuration rollback tools for configuration management. The configuration archive provides a mechanism to store, organize, and manage an archive of configuration files. This functionality is intended to enhance the configuration rollback capability that is also supported in Cisco IOS software. The configuration archive feature allows you to save configurations in the archive using a standard location and filename prefix that is automatically appended with an incremental version number, and an optional timestamp, as each consecutive file is saved.
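The naming behavior described above can be pictured with a short Python sketch. It is only a model for reasoning about the archive, not Cisco code. The class name ConfigArchive, the prefix-N filename format, and the retention behavior (dropping the oldest file once a limit is reached) are assumptions made for illustration:

```python
from collections import deque


class ConfigArchive:
    """Toy model of an archive that names files <prefix>-<version>.

    Hypothetical illustration: the naming format and the retention policy
    are assumptions of this sketch, not a description of IOS internals.
    """

    def __init__(self, prefix, maximum=10):
        self.prefix = prefix        # location and filename prefix
        self.maximum = maximum      # how many files to keep
        self.next_version = 1       # incremental version number
        self.files = deque()        # oldest file on the left

    def save(self):
        """Record one archived configuration and return its name."""
        name = f"{self.prefix}-{self.next_version}"
        self.next_version += 1
        if len(self.files) == self.maximum:
            self.files.popleft()    # discard the oldest file to make room
        self.files.append(name)
        return name


if __name__ == "__main__":
    archive = ConfigArchive("ftp://150.1.1.254/R2-Archive-Config", maximum=3)
    for _ in range(5):
        archive.save()
    print(list(archive.files))
```

Because the version number only ever increases, two saved files never share a name, even after older files have been discarded.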
This functionality provides a means for consistent identification of saved configuration files. You can specify how many versions of the running configuration are kept in the archive. After the maximum number of files has been reached, the oldest file is deleted automatically when the next, most recent file is saved. The Cisco IOS configuration archive, in which the configuration files are stored, can be located on the following file systems:

• If your platform has disk0, the archive can be located on disk0:, disk1:, ftp:, pram:, rcp:, slavedisk0:, slavedisk1:, or tftp:
• If your platform does not have disk0, the archive can be located on ftp:, http:, pram:, rcp:, or tftp:

Implementing the configuration archive feature is a four-step process performed as follows:

1. After entering global configuration mode, issue the archive command to enter archive configuration mode.

2. When in archive configuration mode, specify the location and filename prefix for the files in the Cisco IOS configuration archive using the path <url> archive configuration mode command. The <url> argument is one of the valid locations listed above (e.g., tftp:, ftp:, disk0:, etc.). The available options will depend on the platform on which this command is implemented.

3. Optionally, specify the maximum number of files to save using the maximum <number> archive configuration command. By default, 10 files will be saved; however, up to 14 files can be saved in the archive. When the specified maximum value has been reached, the oldest file will be overwritten and replaced by the most recent file. An important point to remember is that this command is not supported when backing up the configuration to a network location such as a TFTP or FTP server.

4. Finally, optionally, specify the time increment for automatically saving an archive file of the current running configuration in the configuration archive using the time-period <minutes> archive configuration command. This command has no default.

When configuring the archive feature, the write-memory archive configuration command is typically included so that the router automatically saves the configuration to the specified location each time the running configuration is saved to NVRAM (i.e., to the startup configuration), which typically indicates that some type of configuration change has been made.

The following configuration example illustrates how to configure the local router to back up the configuration to an FTP server, using the specified FTP username and password pair, every week, which is 168 hours or 10,080 minutes. The running configuration file will be saved to the server using the name 'R2-Archive-Config.' In addition to the weekly scheduled backup, the router is also configured to archive the configuration every time the running configuration file is saved to NVRAM (i.e., the startup configuration) as illustrated in the following output:

R2(config)#ip ftp username netadmin
R2(config)#ip ftp password tshoot
R2(config)#archive
R2(config-archive)#path ftp://150.1.1.254/R2-Archive-Config
R2(config-archive)#write-memory
R2(config-archive)#time-period 10080
R2(config-archive)#exit

Following this configuration, you can use the show archive command to view the archived configuration files.
Following is a sample output printed by this command:

R2#show archive
The next archive file will be named ftp://150.1.1.254/R2-Archive-Config-6
 Archive #  Name
   0
   1       ftp://150.1.1.254/R2-Archive-Config-1
   2       ftp://150.1.1.254/R2-Archive-Config-2
   3       ftp://150.1.1.254/R2-Archive-Config-3
   4       ftp://150.1.1.254/R2-Archive-Config-4
   5       ftp://150.1.1.254/R2-Archive-Config-5 <- Most Recent
   6
   7
   8
   9
   10
   11
   12
   13
   14

Because the write-memory archive configuration command has been included in the archive configuration, the local router will save the configuration to the FTP server if either the write memory or the copy running-config startup-config command is issued:

R2#copy running-config startup-config
Destination filename [startup-config]?
Building configuration...
[OK]
Writing R2-Archive-Config-1 !
R2#write memory
Building configuration...
[OK]
Writing R2-Archive-Config-2 !
R2#

The configuration replace and configuration rollback operations allow you to restore previously archived configurations using the configure replace <target-url> [nolock] [list] [force] [ignorecase] [revert trigger [error | timer <minutes>] | time <minutes>] privileged EXEC command.

The <target-url> is used to specify the location of the saved configuration file that is to replace the current running configuration. The optional [nolock] keyword is used to disable the locking of the running configuration file. The lock is what normally prevents other users from changing the running configuration during a configuration replace operation. The optional [list] keyword is used to display a list of the command lines applied by the Cisco IOS software parser during each pass of the configuration replace operation. When this keyword is used, the total number of passes performed is also displayed.
The [force] keyword is another optional keyword; it replaces the current running configuration file with the specified saved Cisco IOS configuration file without prompting for confirmation. The optional [ignorecase] keyword instructs the configuration to ignore the case of the configuration confirmation. The [revert trigger [error | timer <minutes>]] keywords set the triggers for reverting to the original configuration. If the [error] keyword is included, then the router will revert to the original configuration if an error is detected. If the [timer <minutes>] keywords are included, then the router will revert to the original configuration file if the specified time period elapses. Finally, the optional time <minutes> keyword can be used to specify the time within which the configure confirm command must be issued to confirm the replacement of the current running configuration file. If the configure confirm command is not issued within the specified time limit, then the configuration replace operation is automatically reversed by the router.

The following example illustrates how to replace the existing running configuration with the archived configuration file named 'R2-Archive-Config-5' stored on FTP server 150.1.1.254:

R2#configure replace ftp://150.1.1.254/R2-Archive-Config-5
This will apply all necessary additions and deletions
to replace the current running configuration with the
contents of the specified configuration file, which is
assumed to be a complete configuration, not a partial
configuration. Enter Y if you are sure you want to proceed. ? [no]: y
Loading R2-Archive-Config-5 !
[OK - 2959/4096 bytes]
Total number of passes: 0
Rollback Done

The following example illustrates how to replace the existing running configuration with the archived configuration file R2-Archive-Config-5 stored on FTP server 150.1.1.254 and specify that the change should be confirmed within 10 minutes, and, if not, the router should reverse this operation automatically:

R2#configure replace ftp://150.1.1.254/R2-Archive-Config-5 time 10
Writing R2-Archive-Config-6 !
Timed Rollback: Backing up to ftp://150.1.1.254/R2-Archive-Config-6
This will apply all necessary additions and deletions
to replace the current running configuration with the
contents of the specified configuration file, which is
assumed to be a complete configuration, not a partial
configuration. Enter Y if you are sure you want to proceed. ? [no]: y
Loading R2-Archive-Config-5 !
[OK - 2959/4096 bytes]
Total number of passes: 0
Rollback Done
R2#configure confirm

NOTE: Referencing the output above, if the configure confirm command is not issued, then the changes will be reversed in 10 minutes. This option is applicable only when a time for the change confirmation has been specified when using the configure replace command. Because a time limit was not imposed in the first example, this command need not be issued.

Cisco IOS Command Scheduler

The final Cisco IOS maintenance tool that we will discuss in this section is the Cisco IOS Command Scheduler (KRON). The Command Scheduler allows you to run EXEC commands on a regular basis on a router. For simplicity, consider it an automation tool for running EXEC commands on a router at specified or configured intervals. KRON has two processes, which are policy lists and the scheduler. Policy lists contain the EXEC commands that you want executed on the router. When configuring policy lists, it is important to remember that KRON does not support interactive commands.
Therefore, if you want to create a policy list that saves the device configuration, then you should use the write memory command instead of the copy running-config startup-config command, which requires confirmation of this action. This is one of the main limitations of the KRON feature and one of the reasons it is not implemented as much as the other features.

Cisco IOS Command Scheduler policy lists are configured using the kron policy-list <name> global configuration command. Following this, in policy list configuration mode, the cli <exec command> KRON policy list configuration command is used to specify the EXEC command that the configured policy list will run. This command can be used to specify multiple commands that will run at the same time or during the same interval.

Following the configuration of the policy list, the next task is to configure the KRON occurrences using the kron occurrence <occurrence-name> [user <name>] [in [[days:]hours:]min | at hours:min [[month] day-of-month] [day-of-week] [oneshot | recurring]] global configuration command. Next, within KRON configuration mode, specify the policy list that this schedule applies to using the policy-list <name> Command Scheduler configuration mode command.

NOTE: You are not expected to implement any Command Scheduler (KRON) configuration in the current TSHOOT certification exam. However, ensure that you are familiar with basic KRON configuration and functionality.

The following configuration example illustrates how to configure a KRON policy that will be used to save the router configuration automatically every day (1440 minutes):

R2#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#kron policy-list SaveRouterConfiguration
R2(config-kron-policy)#cli write memory
R2(config-kron-policy)#exit
R2(config)#kron occurrence SaveRouterConfigurationSchedule in 1440 recurring
R2(config-kron-occurrence)#policy-list SaveRouterConfiguration
R2(config-kron-occurrence)#exit

Following the Command Scheduler configuration on the router, you can use the show kron schedule command to display information about the status and schedule of all configured KRON occurrences, as illustrated in the following router output:

R2#show kron schedule
Kron Occurrence Schedule
SaveRouterConfigurationSchedule inactive, will run again in 0 days 23:58:23

While the example used in the previous output is a simple one, KRON can be used for other tasks, such as saving device configurations to remote servers (e.g., TFTP servers), making it yet another useful maintenance tool available in Cisco IOS software.

ADDITIONAL MAINTENANCE AND MONITORING TOOLS

In the final section of this chapter, we will discuss some additional maintenance and monitoring tools that you should be familiar with as a network engineer, which include the following:

• Cisco Router and Security Device Manager
• Cisco Configuration Professional
• Cisco Configuration Assistant
• Cisco Network Assistant
• CiscoWorks LAN Management Solution (LMS)

Cisco Router and Security Device Manager

Cisco Router and Security Device Manager (SDM) is a Web-based (GUI) device-management tool for Cisco routers that can be used for monitoring and troubleshooting tasks. SDM is supported on a wide range of Cisco IOS routers, from the 800 series routers to the 7300 series routers. SDM can be installed on the local computer, on the router, or on both the computer and the router. Because SDM is a core CCNA topic, we will not be going into any additional detail on this tool in this guide.
For additional information on SDM, please refer to the CCNA certification guide that is available online. Once you access the router via SDM, you can use the Monitor tab to view device statistics, which include, but are not limited to, interface status, logging information, and traffic statistics. Figure 1-6 below illustrates the Monitor tab that is used for monitoring and troubleshooting:

Fig. 1-6. Cisco Router and Security Device Manager Monitor Tab

Cisco Configuration Professional

Cisco Configuration Professional (CCP), like Cisco Router and Security Device Manager (SDM), is a GUI-based device management tool for Cisco access routers. In fact, CCP is the successor of Cisco SDM and is intended to replace the SDM tool. CCP includes the same features available in SDM, and will also include additional capabilities such as Voice over IP (VoIP) support in later versions. CCP, like SDM, can be used for network monitoring and maintenance tasks.

While delving into the details pertaining to CCP is beyond the scope of the TSHOOT certification exam, Figure 1-7 below illustrates the CCP Monitoring tab that can be used for device monitoring and troubleshooting. As can be seen in Figure 1-7, this tab very closely resembles the Monitor tab shown for SDM:

Fig. 1-7. Cisco Configuration Professional Monitor Tab

Cisco Configuration Assistant

The Cisco Configuration Assistant (CCA) tool is yet another maintenance, monitoring, and troubleshooting tool that is available from Cisco. CCA is used for the Cisco Smart Business Communications System. CCA is a GUI-based tool that includes multiple wizards, similar to SDM and CCP, which can be used for IP Telephony configuration, router and switch configuration, and security configuration.
CCA allows network administrators to view the status of devices and monitor the network from three perspectives: the System Dashboard, Topology View, or Front Panel View. From a troubleshooting perspective, CCA can be used to consolidate system logs into a single archive, which can then be sent to the Technical Assistance Center (TAC) for troubleshooting support. CCA also provides tools to perform basic network connectivity troubleshooting functions, such as the Ping and Traceroute utilities. In addition to data network configuration and troubleshooting utilities, CCA provides tools that can be used for IP Telephony (voice) service diagnostics and troubleshooting. In summary, the capabilities and advantages provided by Cisco Configuration Assistant include the following:

• Configuration and deployment support
• Network management support
• Setup wizards
• Multiple network views
• Simplified network reporting
• Drag-and-drop software updates
• Troubleshooting

NOTE: The configuration of CCA is beyond the scope of the current TSHOOT certification exam and will not be illustrated in this guide.

Cisco Network Assistant

Cisco Network Assistant (CNA) is a network management and automation tool. Like the other tools described in this section, CNA is a GUI-based tool that can be used to apply common services across Cisco switches, routers, and access points. CNA is available as a free download from the Cisco Web site. At a high level, this tool provides the following capabilities:

• Configuration management
• Inventory reports
• Event notification
• Task-based menu
• File management
• Drag-and-drop Cisco IOS Software upgrades

In addition, CNA can also be used for troubleshooting support, as well as for security on Catalyst Express 500 series switches.
CiscoWorks LAN Management Solution

The CiscoWorks LAN Management Solution (LMS) comprises several individual software applications that can be used for the configuration and administration of campus networks. LMS also provides monitoring and troubleshooting capabilities. The following applications are included in the CiscoWorks LAN Management Solution suite:

• Resource Manager Essentials (RME)
• CiscoWorks Health and Utilization Monitor
• Device Fault Manager (DFM)
• Internetwork Performance Monitor (IPM)

In addition to other capabilities, Resource Manager Essentials (RME) can be used for network monitoring and for gathering fault information to track devices that are critical to network uptime. The CiscoWorks Health and Utilization Monitor provides the ability to monitor devices for performance parameters and to report violations based on the configured threshold values. This application also has extensive reporting capabilities. The Device Fault Manager (DFM) can be used to monitor device faults in real time and to determine the root cause by correlating device-level fault conditions, monitoring fault history, and configuring e-mail, SNMP trap, and syslog notifications. Finally, the Internetwork Performance Monitor (IPM) can be used to troubleshoot network response time, jitter, and availability proactively.

As is the case with NAM, LMS is configured using a GUI. Unlike the other tools that have been described in this section, CiscoWorks is the only package that is not freely available for download, as it is used to manage large enterprise networks.

CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter.
Network Maintenance Fundamentals Overview

• Network maintenance is an integral component of a network management methodology
• Network maintenance activities are either structured or interrupt-driven (ad-hoc)
• A structured or scheduled network maintenance approach is based on a predefined plan
• Ad-hoc maintenance activities are those that are performed when any issues arise
• A structured maintenance approach leverages proactive monitoring
• An ad-hoc approach increases the number of resources required to support the network

Network Maintenance Tasks

• Network maintenance tasks are simply tasks that are performed on a day-to-day basis
• The following is a list of common network maintenance tasks:
1. Installing, replacing or upgrading both hardware and software
2. Monitoring, tuning and optimizing the network
3. Documenting the network and maintaining network documentation
4. Securing the network from both internal and external threats
5. Planning for network upgrades, expansions, or enhancements
6. Scheduling backups and restoring services or the network from backups
7. Ensuring compliance with legal regulations and corporate policies
8. Troubleshooting problem reports
9. Maintaining and updating device configurations

An Overview of Network Management Models

• Network management models are general guidelines for running and maintaining a network
• There are several network management models that are available
• You should select the network management model best aligned with your business goals
• Commonly referenced network management models include the following:
1. Telecommunications Management Network
2. FCAPS
3. Information Technology Infrastructure Library
4. Cisco Lifecycle Services
• The TMN is a model defined by ITU-T for managing systems in a communications network
• The TMN is referenced in ITU-T Recommendation M.3010
• The TMN was originally developed to provide a framework for service providers
• The four TMN-defined management architectures at different levels of abstraction include the following:
1. A functional architecture
2. An information architecture
3. A physical architecture
4. A logical layered architecture
• The TMN logical layered architecture includes an additional four layers of abstraction as follows:
1. The Business Management Layer
2. The Service Management Layer
3. The Network Management Layer
4. The Element Management Layer
• FCAPS is the ISO TMN model and framework for network management
• The FCAPS fault management lifecycle includes the following tasks:
1. Fault and problem detection
2. Handling and acknowledging alarms sent by devices
3. Fault and problem isolation using a filtration and correlation process
4. Fault correction and recovery
5. Tracking problems through resolution via a trouble ticketing system
• Configuration management encompasses the management of actual device configurations
• Configuration management encompasses the configuration change control process
• Configuration management includes tracking and logging changes to device configurations
• Accounting management covers methods to track usage statistics and costs
• Performance management covers the tracking of system and network statistics
• Performance management includes baselining and improving performance, e.g., using QoS
• Performance management can provide valuable data for capacity planning
• Security management addresses access rights that include authentication and authorization
• Security management is concerned with securing access to network devices
• Security management may also include additional tasks such as integrating firewalls
• ITIL is a set of best practices for ITSM, IT development and IT operations
• ITIL is organized into a set of texts which are defined by related functions
• The five processes or sets defined in ITILv3 are as follows:
1. Service Strategy
2. Service Design
3. Service Transition
4. Service Operation
5. Continual Service Improvement
• The Cisco PPDIOO model encompasses all steps from network vision to optimization
• PPDIOO stands for prepare, plan, design, implement, operate, and optimize

IOS Maintenance and Monitoring Tools

• Cisco provides a plethora of tools that can be used for both maintenance and monitoring
• The EEM is part of the Cisco Embedded Automation Systems (EASy) toolkit
• The EASy toolkit combines the following embedded management technologies with EEM:
1. Cisco IP Service Level Agreements (IP SLAs)
2. Expression MIB
3. Network-Based Application Recognition (NBAR)
4. Flexible NetFlow
5. Enhanced Object Tracking
6. Cisco IOS Shell (IOS.sh)
• EEM is a powerful and flexible subsystem that provides real-time network event detection
• EEM provides onboard automation and increases the intelligence of network devices
• EEM supports the use of scripts, which can be configured using the CLI or using Tcl
• IOS IP SLA allows you to monitor, analyze and verify IP service levels for IP applications
• IOS IP SLA uses active traffic monitoring to measure network performance
• IOS IP SLA measures and monitors performance metrics like jitter, latency, and packet loss
• IOS IP SLA is comprised of two components, which are the source (agent) and the target
• IOS IP SLA operations can be broadly categorized into the following five functional areas:
1. Availability monitoring
2. Network monitoring
3. Application monitoring
4. Voice monitoring
5. Video monitoring
• Logging messages and events both locally and to a syslog server is a core maintenance task
• Syslog allows a host to send event notification messages across IP networks
• Syslog messages are sent to event collectors called syslog servers or syslog daemons
• A syslog daemon or server is an entity that listens to the syslog messages that are sent to it
• Syslog uses User Datagram Protocol (UDP) as the underlying transport mechanism
• The syslog client sends messages to the syslog server, specifying a destination port of 514
• Syslog messages cannot exceed 1,024 bytes in size
• Syslog messages contain three distinct parts, which are the priority, header, and message
• When configuring logging, synchronize the device clock manually or using NTP
• The Simple Network Management Protocol, SNMP, is a widely used management protocol
• SNMP can be used to collect statistics, monitor device performance and for baselining
• SNMP is an Application Layer (Layer 7) protocol
• SNMP uses UDP as the Transport layer protocol, using UDP ports 161 and 162
• An SNMP network
consists of a management system, agents, and managed devices • The management system executes monitoring applications and controls managed devices • An SNMP agent resides on each managed device • SNMP agents capture data from Management Information Bases (MIBs) • A managed element, such as a router, switch, or firewall, is accessed via the SNMP agent • Managed devices are monitored and controlled using read, write and trap commands • The read command is used by an NMS to monitor managed devices • The write command is used by an NMS to control managed devices • SNMP trap command is used by managed devices to report events to the NMS • Devices can be configured to send SNMP traps or informs to an NMS • SNMP traps are messages that alert the SNMP manager of a condition on the network • SNMP informs are SNMP traps that include confirmation of receipt from the manager • There are three versions of SNMP, which are versions 1, 2, and 3 • SNMPv1 is widely used and is the de facto network-management protocol • SNMPv2 revises SNMPv1 and includes improvements to the original SNMPv1 standard • SNMPv3 provides additional security services not available in previous versions • Cisco IOS NetFlow is a powerful maintenance and monitoring tool • Cisco IOS NetFlow reports on traffic statistics, e.g. packets and bytes • The device on which NetFlow is configured sends out information that it has collected • Cisco IOS NetFlow has the ability to differentiate between traffic flows • An IP flow is based on a set of 5, and up to 7, IP packet attributes, which may include the following: 1. Destination IP address 59 C I S CO C C N P T S H O OT S I M P L I F I E DͷPA RT I 2. Source IP address 3. Source port 4. Destination port 5. Layer 3 protocol type 6. Class of Service 7. 
Router or switch interface • Cisco IOS NetFlow stores flow information in the NetFlow cache or simply the flow cache • Collected NetFlow data can be access via the CLI or using a NetFlow Collector • NBAR is an intelligent classification engine in Cisco IOS software • Network Based Application Recognition that can recognize a wide variety of applications • The NBAR Protocol Discovery (PD) feature can collect application and protocol statistics • NBAR uses PDLMs for protocol and application recognition • The use of PDLMs allows NBAR to recognize additional protocols and applications • The configuration archive feature allows configs to be saved in the configuration archive • The configuration replace and configuration rollback allows for configuration rollbacks • The Command Scheduler allows you to run exec commands on a regular basis on a router • The IOS Command Scheduler has 2 processes: policy lists and the scheduler • Policy lists contain the exec commands that you want to be executed on the router • The scheduler is used to configure when these commands will be run Addi onal Maintenance and Monitoring Tools • In addition to IOS tools, Cisco provides the following maintenance and monitoring tools: 1. Cisco Router and Security Device Manager 2. Cisco Configuration Professional 3. Cisco Configuration Assistant 4. Cisco Network Assistant 5. 
CiscoWorks LAN Management Solution (LMS) • SDM is a Web-based (GUI) device-management tool for Cisco access routers • SDM can be used for monitoring and troubleshooting tasks • CCP is also a GUI based device management tool for Cisco access routers • CCP can be used for network monitoring and maintenance tasks • CCA is used for the Cisco Smart Business Communications System • CCA is a Web-based (GUI) tool • CCA includes the System Dashboard, Topology View, or Front Panel View • CNA is a GUI-based tool • CNA can be used to apply common services across switches, routers, and access points 60 C H A P T E R 1: N E T WO R K M O N I TO R I N G A N D M A I N T E N A N C E • The CiscoWorks LAN Management Solution is comprised of several software applications • LMS provides monitoring, and troubleshooting capabilities • CiscoWorks LMS can be used for the configuration and administration of campus networks • CiscoWorks LMS includes the following software applications: 1. Resource Manager Essentials (RME) 2. CiscoWorks Health and Utilization Monitor 3. Device Fault Manager (DFM) 4. Internetwork Performance Monitor (IPM) 61 CHAPTER 2 Troubleshoo ng Methodologies and Tools C I S CO C C N P T S H O OT S I M P L I F I E DͷPA RT I I n the previous chapter, we discussed network maintenance and monitoring methodologies and tasks, as well as the tools that are available in both Cisco IOS software and standalone products that can be used to facilitate network maintenance and monitoring. In this chapter, we will discuss troubleshooting methodologies and processes, including the pros and cons of using these different approaches in any given situation. The TSHOOT certification exam objectives that are covered in this chapter include the following: • Isolate sub-optimal internetwork operation at the correctly defined OSI Model layer • Troubleshoot Multi Protocol system networks Troubleshooting is an integral component of overall network management. 
Without systematic and structured approaches, troubleshooting can very quickly become a frustrating and time-consuming process. As a network engineer, it is important to understand not only the tools that are available for troubleshooting networks but also the methodologies and best practices that should be applied when troubleshooting different problem reports, with the understanding that there is no one method or tool that can be applied in all situations. For this reason, it is important to have a solid understanding of the different methodologies and tools and how they may be applied to your given network problem. This chapter is divided into the following sections:
• Troubleshooting and the Troubleshooting Flow
• Communication and Troubleshooting
• Integrating Maintenance and Troubleshooting
• Troubleshooting Methodologies
• The Cisco IOS Generic Troubleshooting Toolkit
• Additional Troubleshooting Tools

TROUBLESHOOTING AND THE TROUBLESHOOTING FLOW

While there is no single specific definition for the term, in general, troubleshooting can be thought of as the process of identifying or diagnosing a problem. Following diagnosis, a resolution is typically implemented to rectify or correct the problem. While the general idea behind troubleshooting is that it begins after a problem or an issue has been reported (i.e., it is a reactive process), it is important to understand that with effective proactive monitoring, it is possible to identify and resolve potential problems before they impact users.

As was stated in the introduction, there is no single troubleshooting method that can be applied to all situations, which is why several different troubleshooting methodologies are described later in this chapter.
Despite the different approaches used in these methodologies, they all fit into the same basic high-level, three-step troubleshooting flow, which consists of the following phases:
• The problem report
• Problem diagnosis
• Problem resolution

These three phases in the basic troubleshooting flow process are illustrated in Figure 2-1 below:

[Figure: Problem Report -> Problem Diagnosis -> Problem Resolution]
Fig. 2-1. High-Level Steps in the Troubleshooting Flow

The problem report is what typically initiates the troubleshooting process. The objective of this phase is to define the problem. While most users report a problem when it occurs or is experienced, this is not always the case. In some cases, a problem could be reported hours or even days after the fact; it is not uncommon for users to report a problem that has been ongoing for an extended period of time. As an example, a user could call in to the help desk and report that an application has been experiencing intermittent connectivity issues for some time, and while the user previously tolerated it, the problem has now degraded to the point at which the user would like to get it resolved.

Following the problem report, the very first step of the troubleshooting process itself is problem diagnosis. The problem diagnosis phase is the most time-consuming phase. This phase consists of five general, but integral, steps, all of which are critical to the overall success of the problem resolution effort. The five steps included in the problem diagnosis phase are as follows:
1. Collecting information about the problem
2. Analyzing or examining the collected information
3. Eliminating possible causes
4. Hypothesizing or theorizing potential causes
5. Verifying the hypothesis or theory

The first step included in the problem diagnosis phase entails collecting or gathering as much information about the problem as possible.
This not only includes collecting information from the person reporting the problem but also includes collecting data from the network itself. The efficiency with which the information is collected, as well as the overall quality of that information, greatly increases the chances of successfully identifying and resolving the problem.

Cisco provides a plethora of tools that can be used to gather or collect information from the network or from internetwork devices. As a network engineer, you should be intimately familiar with these tools (which are described later in this chapter) and their capabilities. While the number of available tools is large, data collection need not be overwhelming, as a small subset of the available tools can typically be used to solve most problems.

Once the pertinent information has been collected, it should be analyzed. Collected information can be analyzed in numerous ways. This may entail using automated tools, drawing on the collector's experience or knowledge of the system or platform in question in order to distinguish between normal and abnormal system behavior, or comparing the information against previously collected baseline information gathered from monitoring tools.

When the information has been analyzed, some potential causes of the problem can be eliminated easily. For example, if a user reported intermittent network connectivity issues, and following your information gathering and analysis you observe port flapping on the user's machine, then you can potentially rule out an overall network issue, provided that the issue is happening only for that specific user.

After eliminating some potential causes, the next step is to hypothesize or theorize about what the potential problem could be.
Continuing with the previous intermittent connectivity example, having eliminated the network as a potential cause, the next step would be determining, for example, whether the switch port or the patch cable between the user machine and the switch port is bad (among other possible issues).

The last step of the problem diagnosis phase is to verify your hypothesis or theory. This step may include running appropriate tests or diagnostics using the relevant tools to validate whether what you think to be the problem is in fact the problem. Alternatively, you could apply what you think to be the solution and then verify whether that resolves the problem. For example, continuing with the user intermittent connectivity issue, you could replace the patch cable between the user machine and the switch port, or move the user network connection to another switch port, and determine whether that resolves the issue. For simplicity, the steps involved in the problem diagnosis phase are illustrated in Figure 2-2 below:

[Figure: Collect Information -> Analyze Information -> Eliminate Causes -> Hypothesize Causes -> Verify Hypothesis]
Fig. 2-2. Steps within the Problem Diagnosis Phase

While these steps present a structured approach to network troubleshooting, it is important to understand that the process flow may require going back and forth between different steps. Assume, for example, that after testing your proposed solution to the problem, the problem still exists. In such cases, you may need to go back and gather more information and proceed through the entire cycle again. It is important to keep this in mind because following a structured troubleshooting approach does not always guarantee that you will be able to identify the root cause of the reported problem.
While there are no guarantees of success, following a structured troubleshooting approach does increase the likelihood of identifying and then ultimately resolving the issue or problem. The primary objective of the problem diagnosis phase is to reach a hypothesis for the root cause of the problem.

Following the completion of the problem diagnosis phase, the troubleshooting process moves to the problem resolution phase. In this final phase, the troubleshooter should notify the user who reported the problem and then confirm with that user that the problem has indeed been rectified. In addition to notifying the user who initiated the problem report, the troubleshooter should also inform any other interested parties. For example, if the problem had been escalated to management, they should be notified after the problem has been resolved and confirmation of the same has been received.

Finally, an additional but important task, prior to closing out the trouble report, is to include all relevant documentation and notes that may be used as a reference in the future. If changes were made to the network to resolve the problem, then relevant documentation should be updated to include these changes.

COMMUNICATION AND TROUBLESHOOTING

Effective communication is an integral component of the troubleshooting process. This entails communication with the end user, the team, and management. Effective communication is essential in all steps of the troubleshooting process that was described in the previous section. The following paragraphs describe how effective communication can be used to facilitate the troubleshooting process.

During the problem report phase of the troubleshooting process, it is important to communicate effectively with the end user (i.e., the person who is reporting the problem).
Effective communication allows you to gather as much information about the problem as possible, which ultimately facilitates the troubleshooting process and can reduce the time to repair. Some examples of information to collect include how long the problem has been occurring, whether any changes were made to the end user's system, and so forth. Should the end user be irate, it is important to empathize with him or her to gain his or her trust and confidence.

Within the problem diagnosis phase, it is important to communicate effectively during all of the steps included in this phase. In addition to collecting information from the end user, it is also important to communicate effectively with other groups, internal or external, from which you will also need to acquire information. Within an organization, this may include the voice and security teams, for example. Outside of the organization, this may include the service provider help desk or network operations team. Effective communication ensures that the correct information needed to troubleshoot the problem is collected.

When examining the collected information, it is often necessary to verify the validity of this information by collaborating with team members or even with other groups within the IT department. This is applicable to the steps of eliminating possible causes, as well as verifying the hypothesis. As a network engineer, you should be able to communicate effectively with team members and other groups within the organization.

Communication is also critical when testing and verifying the hypothesis. Depending on the nature of the problem, this step may result in temporary network outages or the interruption of other activities the user is performing. For this reason, it is important to ensure that not only the end user but also all other parties that may be potentially affected by this step are advised.
The notification should clearly state what is to be expected and how long the interruption will last. Once the test is completed, all relevant parties should be notified again.

Finally, during the problem resolution phase, it is important to communicate with the end user and verify that the problem has indeed been resolved and all symptoms have disappeared. Additionally, depending on the nature of the problem, the same should be communicated to any other groups or individuals. If a trouble ticket was opened, detailed notes on the problem and the resolution for the problem should be included. This information can be useful when troubleshooting similar problems.

INTEGRATING MAINTENANCE AND TROUBLESHOOTING

As we learned in the previous chapter, a well-documented and well-maintained network is much easier to support and troubleshoot than one that has little or no documentation and is not regularly maintained. In other words, a structured maintenance approach facilitates or simplifies the troubleshooting and support functions of overall network management. In essence, effective troubleshooting is dependent on the tasks and tools that are part of the structured maintenance process. This section describes the ways maintenance facilitates the troubleshooting process.

Establishing a Baseline

Baselining is a process for studying the network, including devices that constitute the network, at regular intervals. A baseline is more than just a single, randomly run report detailing the health of the network at any given point in time; it is a continual process. Baselining facilitates the troubleshooting process by allowing troubleshooters to differentiate between normal and abnormal behavior when troubleshooting problem reports.
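As a rough illustration of how a baseline helps separate normal from abnormal behavior, the Python sketch below summarizes a set of historical samples and flags a new reading that falls well outside them. The sample values, the function names, and the three-standard-deviation threshold are assumptions made for this example only; they are not taken from any Cisco tool.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize historical samples (e.g., link utilization in percent)."""
    return {"mean": mean(samples), "stdev": stdev(samples)}

def is_abnormal(value, baseline, k=3.0):
    """Return True if value lies more than k standard deviations from the mean."""
    return abs(value - baseline["mean"]) > k * baseline["stdev"]

# Hypothetical utilization samples collected at regular intervals
history = [41.0, 39.5, 42.2, 40.1, 38.9, 41.7, 40.6]
baseline = build_baseline(history)

print(is_abnormal(40.8, baseline))  # False: close to the historical average
print(is_abnormal(85.0, baseline))  # True: far outside the usual range
```

In practice the samples would come from whichever monitoring tool is used to build the baseline, and the threshold would be tuned to the metric being tracked.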
By following the baseline process, you can perform the following:
• Get information on the health of the hardware and the software
• Predict future problems
• Determine the current utilization of network resources
• Identify current network problems
• Make accurate decisions about network alarm thresholds

The baselining process can be used to determine the network break point, which is the point at which the network can no longer handle the demands placed on it. The network break point can be determined through the knowledge of how the hardware and the software in the network perform. This information can be used to identify and properly plan for critical resource limitation issues in the network (i.e., capacity planning). Several tools can be used to establish a baseline, including, but not limited to, Cisco IP SLA operations, SNMP, Cisco IOS NetFlow, and CiscoWorks LMS.

Documentation

Documentation, a core maintenance function, is integral to the troubleshooting process. As was stated in Chapter 1, it is important to ensure that the network is well documented and that the documentation is accurate and well maintained. Attempting to troubleshoot a network using incorrect documentation is oftentimes worse than attempting to troubleshoot a network without any documentation at all.
As was also stated in Chapter 1, good documentation should include the following information about the network:
• Information about the interconnects between devices for LAN and WAN connections
• IP addressing and VLAN information
• A physical topology diagram of the network
• A logical topology diagram of the network
• An inventory of all internetwork devices, components, and modules
• A revision control section detailing changes to the topology
• Configuration information
• Any original or additional design documentation and notes
• Data or traffic flow patterns

The maintenance of network documentation and configuration can be facilitated by using some of the tools described in Chapter 1. Examples of tools that can be used to facilitate the documentation maintenance function include the Embedded Event Manager (EEM), the Configuration Archive and Rollback feature, and the Command Scheduler (KRON).

Change Management

Change management, or change control, is an integral component of the network maintenance process. The objective of the change management process is to minimize network and service downtime by ensuring that requests for changes are recorded and then evaluated, authorized, prioritized, planned, tested, implemented, documented, and reviewed in a controlled and consistent manner. A change can involve any configuration item or element of infrastructure. Some examples of changes include the following:
• Environmental changes
• Network changes
• Application changes
• Hardware changes
• Documentation changes
• Software changes

Ensuring that changes to network devices are performed in a controlled, well-documented manner reduces the likelihood of unplanned network outages. Troubleshooting is also simplified when changes are performed in a controlled environment. The change control process also ensures that network documentation is updated following the change.
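One practical payoff of controlled, documented changes is that a known-good configuration is on record for comparison. The Python sketch below shows the idea, assuming the archived and current configurations are available as plain text. The configuration snippets and the function name config_diff are invented for illustration and do not represent a Cisco feature.

```python
import difflib

def config_diff(before, after):
    """Return unified-diff lines between two configuration texts."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="archived", tofile="current", lineterm=""))

# Hypothetical configuration snippets, invented for this example
archived = """interface FastEthernet0/1
 description User access port
 switchport mode access
 speed auto"""

current = """interface FastEthernet0/1
 description User access port
 switchport mode access
 speed 10"""

# Lines prefixed with "-" exist only in the archive; "+" only in the current text
for line in config_diff(archived, current):
    print(line)
```

Here the only difference reported is the speed setting, which gives the troubleshooter a single, specific change to investigate.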
TROUBLESHOOTING METHODOLOGIES

In the previous sections, we discussed the basic troubleshooting flow, detailing the steps in the problem diagnosis phase, which is the most time-consuming phase of the troubleshooting process. In addition, we also discussed the importance of effective communication and the ways in which it facilitates the troubleshooting process. Finally, we examined how a structured maintenance approach also facilitates the troubleshooting process.

In this section, we will learn about some common troubleshooting approaches that can be applied when troubleshooting problem reports. Before we delve into these structured approaches, however, it is important to understand the implications of not using a structured troubleshooting approach when troubleshooting a network problem.

One of the most common mistakes made by engineers is not following some kind of structured troubleshooting approach when attempting to identify the root cause of a problem. While it is true that ad-hoc troubleshooting may eventually produce the desired objective, such an approach is unpredictable and is often very inefficient.

One of the most commonly employed troubleshooting techniques, especially by experienced troubleshooters, is the shoot-from-the-hip troubleshooting approach. With this method, after the troubleshooter has collected information, he or she leverages his or her intimate knowledge of the network or calls on experience (past or present), and then immediately implements a change in the hope that the change he or she implemented will resolve the issue.

The primary problem with this approach is that while it may work for seasoned engineers who can call on their experience or knowledge of the network, for example, it does not work for inexperienced engineers. A structured, systematic approach, on the other hand, will reduce the amount of time the troubleshooter spends on the problem.
In addition, a structured approach increases the efficiency of the overall troubleshooting process itself. The shoot-from-the-hip troubleshooting method is illustrated in Figure 2-3 below:

[Figure: Problem Report, Collect Information, Hypothesize Causes, Verify Hypothesis, Resolve Problem]
Fig. 2-3. The Shoot-from-the-Hip Troubleshooting Method

As was previously stated, there is no one single way to troubleshoot. Different problems call for different approaches. However, regardless of the approach used, it is important to adhere to a structured troubleshooting approach. Common structured troubleshooting methods include the following:
• The top-down troubleshooting method
• The bottom-up troubleshooting method
• The follow-the-traffic-path troubleshooting method
• The compare configurations troubleshooting method
• The divide and conquer troubleshooting method
• The component-swapping troubleshooting method

These troubleshooting methods will be described in the following sections.

The Top-Down Troubleshooting Method

When using the top-down troubleshooting approach, the troubleshooter begins troubleshooting at the Application layer of the OSI Model and works his or her way down to the Physical layer. This approach works best when you believe that the problem resides within an application and not within the network or internetwork devices. For example, if a user reports that he or she cannot access a particular server but is able to ping the server IP address, then it can be assumed that Layers 3 through 1 are working fine because there is IP connectivity between the user's machine and the server. The troubleshooting process would therefore begin at the Application layer.

The Bottom-Up Troubleshooting Method

When using the bottom-up troubleshooting approach, the troubleshooter starts troubleshooting at the Physical layer of the OSI Model and works his or her way up to the Application layer.
This approach is based on the assumption that the problem resides at the lower half of the OSI Model. The bottom-up troubleshooting approach is one of the most commonly used troubleshooting methods and works well in smaller networks. In larger networks, however, it is typically inefficient, as it becomes more difficult to discover which network device is actually causing the problem.

The Follow-the-Traffic-Path Troubleshooting Method

The follow-the-traffic-path troubleshooting method requires intimate knowledge of the network, as well as the traffic flows, which, if following best practices, should be included in network documentation. This troubleshooting approach is based on the path that the traffic or packets will take through the network. A common practice when collecting information is to request a traceroute from the user reporting the problem. The troubleshooter can then use this troubleshooting method to eliminate internetwork devices based on the path the traffic takes.

The Compare Configurations Troubleshooting Method

The compare configurations, or spot-the-difference, method entails comparing the configuration on the current device with an older or archived version of the configuration that had been confirmed to be working. Another commonly used approach is to compare the device configuration with that of another similarly configured device that is working.

The Divide and Conquer Troubleshooting Method

The divide and conquer troubleshooting method begins at the Network layer of the OSI Model and then goes either up or down the stack, depending on the results of the test. For example, assume that a user reports that he or she is unable to access a particular server. Using this approach, if a ping to the server IP address was successful, the troubleshooter would begin the troubleshooting process at the top of the OSI stack.
On the other hand, if the ping failed, then the troubleshooter would begin the troubleshooting process at the bottom of the OSI stack.

The divide and conquer troubleshooting method also works well when several troubleshooters are working on the same problem. Once all possible causes of the problem have been hypothesized, individual troubleshooters can be asked to test and verify individual hypotheses. The advantage of using this approach when multiple troubleshooters are all working on the same problem is that it increases efficiency and reduces the likelihood that two or more people are doing the same thing (i.e., a duplication of effort) while other aspects are being neglected.

The Component-Swapping Troubleshooting Method

The component-swapping troubleshooting method entails replacing components and observing whether the problem moves with the components. For example, returning to the user intermittent network connectivity example used earlier in this chapter, if after replacing the network cable the user is still experiencing issues, the next step would be to move the user to another switch port. If that does not resolve the issue, the workstation NIC could be replaced next, and so forth. If the problem disappears after a component is replaced, for example the network cable, then it can be concluded that the replaced component was faulty.

THE CISCO IOS GENERIC TROUBLESHOOTING TOOLKIT

As we learned in the previous section, collecting information is a core component of the overall troubleshooting process. The information that is collected comes from both users and internetwork devices. As a network engineer, it is important to have an intimate understanding of the Cisco IOS commands that can be used to collect information that will assist in the troubleshooting process. This section describes generic commands and utilities used in the troubleshooting process.
It does not delve into technology-specific or protocol-specific troubleshooting commands, as those will be described in detail later in this guide. The following topics are included in this section:

• Troubleshooting network connectivity
• Troubleshooting hardware
• Troubleshooting Cisco IOS Service Diagnostics
• Filtering command output
• Redirecting output
• Monitoring and capturing packets
• Health monitoring

Troubleshooting Network Connectivity

Two of the most commonly used utilities when troubleshooting connectivity issues are the ping and traceroute utilities. Cisco IOS supports both standard and extended ping and traceroute options. The primary difference between the standard and extended versions is that the extended versions include additional capabilities, such as the ability to specify Quality of Service parameters when executing the command.

In Cisco IOS software, a standard ping is initiated using the ping [hostname | address] privileged EXEC command. Additional keywords can also be used in conjunction with this command. However, the available options will differ depending on the version of Cisco IOS software that is running on the platform. The following output displays additional options available with the standard ping command in Cisco IOS software version 12.4 Mainline:

R2#ping 10.0.0.1 ?
  data      specify data pattern
  df-bit    enable do not fragment bit in IP header
  repeat    specify repeat count
  size      specify datagram size
  source    specify source address or name
  timeout   specify timeout interval
  validate  validate reply data
  <cr>

NOTE: These same options are also available and can be used when issuing an extended ping.

The data keyword is used to specify the data pattern to use when executing the ping. Different data patterns are used to troubleshoot framing errors and clocking problems on serial lines. The default pattern is 0xABCD (hexadecimal).
The current TSHOOT certification exam does not require you to go into specifics on the different supported data patterns.

The df-bit keyword is used to enable the Don’t Fragment (DF) bit in outgoing ping packets. If the DF bit is set, packets will not be fragmented when they must traverse segments with smaller maximum transmission unit (MTU) values. In such cases, the command prints an error message that is received from the device that needed to fragment the packet. This error message is denoted by the character ‘M’ in Cisco IOS software. Setting the DF bit in outgoing ping packets is used to determine the smallest MTU in the path to a destination. This can be used to troubleshoot connectivity issues between Transmission Control Protocol (TCP)-based applications across Virtual Private Networks (VPNs), for example. By default, the DF bit is not set when using the Cisco IOS ping command.

The repeat keyword is used to specify the number of ping packets that the software will send to the destination address; the default is 5.

The size keyword is used to specify the size, in bytes, of the ping packets. By default, the Cisco IOS ping command will send 100-byte packets.

The source keyword allows you to specify a source interface or address for the ping packets. By default, the software will use the interface that is used to reach the specified destination.

The timeout keyword is used to specify the ping packet timeout value. A ping whose reply is not received within the interval specified using this keyword is considered unsuccessful. By default, Cisco IOS software uses a timeout value of two seconds.

Finally, the validate keyword is used to specify whether to validate the reply data.
The default is no, meaning that the router that originated the Internet Control Message Protocol (ICMP) echo packet will not check the ICMP Cyclic Redundancy Check (CRC) in the echo reply or response packet.

The following example illustrates the use of the df-bit and repeat keywords when issuing a standard ping. The DF bit is set to prevent the packet from being fragmented. The repeat keyword is used to specify that 50 ping packets should be sent to the destination of 192.168.1.1:

R2#ping 192.168.1.1 df-bit repeat 50
Type escape sequence to abort.
Sending 50, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (50/50), round-trip min/avg/max = 4/4/5 ms

The following example illustrates the use of the source and timeout keywords when sending a standard ping. The source keyword is used to specify a source IP address, although an interface can also be specified, and the timeout keyword is used to specify a timeout value of one second, which is the interval within which a reply must be received for the ping to be considered successful:

R2#ping 1.1.1.1 source 2.2.2.2 timeout 1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 1 seconds:
Packet sent with a source address of 2.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/13/56 ms

While the exclamation mark (!) and the period (.) are the most commonly seen output characters from the ping utility, as a troubleshooter, you should be familiar with some additional characters that the ping utility may print. Table 2-1 below lists and describes the characters that may be printed by the ping utility:

Table 2-1. Cisco IOS Ping Utility Characters

Character   Description
!           Each exclamation point indicates receipt of a reply
.           Each period indicates the network server timed out while waiting for a reply
U           A destination unreachable error PDU was received
Q           Source quench (destination too busy)
M           Could not fragment
?           Unknown packet type
&           Packet lifetime exceeded

The following example illustrates the output of the ping utility indicating that the destination address specified is unreachable:

R2#ping 1.1.1.1 repeat 10 size 500
Type escape sequence to abort.
Sending 10, 500-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
U.U.U.U.U.
Success rate is 0 percent (0/10)

You would typically see this error message when ICMP packets are being blocked by an ACL or similar filter.

The following example displays the output of the ping utility indicating that the ping was unsuccessful because fragmentation is required; however, the ping packets could not be fragmented because the Don’t Fragment (DF) bit was set:

R1#ping 192.168.2.1 size 1500 df-bit
Sending 5, 1500-byte ICMP Echos to 192.168.2.1, timeout is 2 seconds:
Packet sent with the DF bit set
M.M.M
Success rate is 0 percent (0/5)

This error message is commonly seen when troubleshooting MTU issues across VPN tunnels, which have lower MTU values (due to the encapsulation overhead) than physical interfaces have. Solutions to this error message include the following:

• Manually changing the MTU values on internetwork devices
• Changing the TCP Maximum Segment Size (MSS)
• Using or enabling the Path MTU Discovery (PMTUD) feature

These concepts will be described in additional detail when we discuss specific troubleshooting scenarios pertaining to VPNs later in this guide.

The Cisco IOS extended ping utility includes the same options available with the standard ping utility, plus additional options.
These additional options include the ability to perform a ping sweep, allowing you to vary the size of the echo packets that are sent. A ping sweep can be used to determine the minimum MTU size between source and destination. This concept will be illustrated in detail later in this guide.

An additional capability that is provided by extended pings is the ability to include IP header options in the ping packets. These options include the Record, Loose, Strict, and Timestamp options. The Record option prints the addresses of up to nine hops traversed by the packet. The Loose option allows you to influence the path by specifying the address(es) of the hop(s) you want the packet to go through. The Strict option is used to specify the hop(s) that you want the packet to go through, but no other hop(s) are allowed to be visited. Finally, the Timestamp option is used to measure round-trip time to particular hosts.

The following example illustrates the use of the extended ping utility, highlighting some of the options available with this command:

R2#ping ip
Target IP address: 200.1.1.1
Repeat count [5]:
Datagram size [100]:
Timeout in seconds [2]: 1
Extended commands [n]: y
Source address or interface: FastEthernet0/0
Type of service [0]:
Set DF bit in IP header? [no]: y
Validate reply data? [no]: y
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: r
Number of hops [ 9 ]: 4
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 200.1.1.1, timeout is 1 seconds:
Packet sent with a source address of 150.1.1.2
Packet sent with the DF bit set
Reply data will be validated
Packet has IP options: Total option bytes= 19, padded length=20
 Record route: <*>
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)

Reply to request 0 (1 ms).
Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (200.1.1.2)
   (200.1.1.1)
   (10.0.0.1)
   (150.1.1.2) <*>
 End of list

Reply to request 1 (1 ms).
Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (200.1.1.2)
   (200.1.1.1)
   (10.0.0.1)
   (150.1.1.2) <*>
 End of list

Reply to request 2 (4 ms).
Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (200.1.1.2)
   (200.1.1.1)
   (10.0.0.1)
   (150.1.1.2) <*>
 End of list

Reply to request 3 (4 ms).
Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (200.1.1.2)
   (200.1.1.1)
   (10.0.0.1)
   (150.1.1.2) <*>
 End of list

Reply to request 4 (4 ms).
Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (200.1.1.2)
   (200.1.1.1)
   (10.0.0.1)
   (150.1.1.2) <*>
 End of list

Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

NOTE: By default, when any IP options are specified, the verbose keyword is automatically included by Cisco IOS software and does not need to be specified manually.

While the ping utility is a useful tool for verifying connectivity between devices, the traceroute utility is a powerful tool for discovering the path packets take to a remote destination. While it is true that the ping command can also print the path packets take to a remote destination, up to a maximum of nine hops, the traceroute utility can provide information on where the routing (i.e., the path between the source and destination network) is broken. Traceroute records the source of each ICMP Time Exceeded message, which allows this tool to provide a trace of the path packets take to reach the specified destination.
When using traceroute, the device on which this command is initiated will send out a sequence of User Datagram Protocol (UDP) packets, each with incrementing time to live (TTL) values, to UDP port 33434 at the remote host. This default port number can be changed in some implementations. When the traceroute command is initiated, three packets with an IP TTL value of 1 are sent. This IP TTL value causes the packets to time out as soon as they are received by the first gateway device, which responds with an ICMP Time Exceeded message.

Next, three more packets are sent out. This time, however, the packets have an IP TTL value of 2, which causes the second gateway in the path to return an ICMP Time Exceeded message. This process is repeated until the packets reach the destination address and the initiating device has received an ICMP Time Exceeded message from all gateways in the path to the specified destination. Because the packets are sent to UDP port 33434, a port on which the destination host is not expected to be listening, the destination host responds with an ICMP Port Unreachable message when it receives these packets. Receipt of the Port Unreachable message tells the traceroute command to end.

NOTE: If the no ip unreachables command has been configured on the interfaces of any internetwork devices between the source and the destination, then the traceroute command may not complete as expected.

As is the case with the ping utility, Cisco IOS supports both standard and extended traceroute commands. Following are the options available with the standard traceroute command:

R2#traceroute 10.1.1.1 ?
  numeric  display numeric address
  port     specify port number
  probe    specify number of probes per hop
  source   specify source address or name
  timeout  specify timeout interval
  ttl      specify minimum and maximum ttl
  <cr>

The numeric keyword is used to suppress the symbolic display (i.e., hostnames) in the output of the traceroute command.
By default, Cisco IOS software uses both a symbolic and a numeric display. Including a symbolic display often results in the traceroute command running very slowly because the software attempts to resolve the IP addresses to hostnames.

The port keyword is used to specify the destination port that will be used by the traceroute UDP probe messages. As stated earlier, this defaults to UDP port 33434.

The probe keyword is used to specify the number of probes to be sent at each TTL level. By default, the software will send three probes.

The source keyword is used to specify a source interface or IP address. By default, the router will use the outgoing interface that is determined by the route to the destination network.

The timeout keyword is used to specify the number of seconds to wait for a response to a probe packet; the default is three seconds.

Finally, the ttl keyword is used to specify the minimum and maximum IP TTL values of the probes.

The following example illustrates a typical traceroute command output:

R2#traceroute 192.168.1.1 source FastEthernet0/0
Type escape sequence to abort.
Tracing the route to 192.168.1.1
  1 R2-Se0-Interface (10.0.2.2) 4 msec 4 msec 4 msec
  2 R1-Fa0-Interface (10.0.0.1) 4 msec 4 msec 4 msec
  3 R5-Se0-Interface (10.0.9.5) 0 msec * 4 msec

The following output shows the same output with the symbolic display suppressed:

R2#traceroute 192.168.1.1 numeric
Type escape sequence to abort.
Tracing the route to 192.168.1.1
  1 10.0.2.2 4 msec 4 msec 4 msec
  2 10.0.0.1 4 msec 4 msec 4 msec
  3 10.0.9.5 0 msec * 4 msec

The primary difference between standard and extended traceroute execution is that extended traceroute allows users to specify IP options (i.e., the Loose, Strict, and Record options) when performing the traceroute. All other options that can be used with the standard traceroute command can also be used when performing an extended traceroute.
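The probing sequence described in this section can be modeled in a short Python sketch. This is only an illustration of the TTL logic, not Cisco's implementation, and it sends no packets. The names simulate_traceroute and Reply are hypothetical, and the hop addresses are borrowed from the example output above, with the last address treated as the destination purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    ttl: int        # TTL value carried by the probe that triggered this reply
    responder: str  # address of the device that answered
    kind: str       # "time-exceeded" or "port-unreachable"


def simulate_traceroute(path, probes_per_hop=3, max_ttl=30):
    """Model traceroute over `path`, a list of hop addresses ending at the destination.

    A probe sent with TTL n expires at the n-th hop, which answers with
    Time Exceeded. A probe that reaches the destination is answered with
    Port Unreachable, which ends the trace.
    """
    if not path:
        raise ValueError("path must contain at least the destination")
    replies = []
    for ttl in range(1, max_ttl + 1):
        hop_index = min(ttl, len(path)) - 1
        reached_destination = hop_index == len(path) - 1
        kind = "port-unreachable" if reached_destination else "time-exceeded"
        for _ in range(probes_per_hop):
            replies.append(Reply(ttl, path[hop_index], kind))
        if reached_destination:
            break  # Port Unreachable tells traceroute to stop
    return replies


# Hypothetical three-hop path based on the addresses in the example output.
for reply in simulate_traceroute(["10.0.2.2", "10.0.0.1", "10.0.9.5"]):
    print(reply.ttl, reply.responder, reply.kind)
```

Each TTL value produces three replies by default, which mirrors the three round-trip times printed per hop in the traceroute output. Changing probes_per_hop corresponds to the probe keyword.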
Similar to the ping utility, the traceroute utility also prints different characters, each indicating the result of the operation. Table 2-2 below lists and describes the result characters that may be printed by the traceroute utility:

Table 2-2. Cisco IOS Traceroute Utility Characters

Character   Description
nn msec     The round-trip time in milliseconds for the specified number of probes
*           The probe timed out
A           Administratively prohibited (e.g., access list)
Q           Source quench (destination too busy)
I           User interrupted test
U           Port unreachable
H           Host unreachable
N           Network unreachable
P           Protocol unreachable
T           Timeout
?           Unknown packet type

The following example illustrates the output of a traceroute command that is administratively prohibited (i.e., by an ACL or similar filter that has been applied to the gateway interface):

R2#traceroute 192.168.1.1 source FastEthernet0/0
Type escape sequence to abort.
Tracing the route to 192.168.1.1
  1 R1-Tu0-Interface (200.1.1.1) !A * 0 msec

Troubleshooting Hardware

The Cisco IOS diagnostic toolkit includes Cisco Generic Online Diagnostics (GOLD) and Cisco IOS Service Diagnostics. Running diagnostics is an integral part of maintenance, as well as troubleshooting. For example, diagnostics may be run on newly purchased hardware to validate functionality before it is integrated into the existing network. Additionally, diagnostics could also be run on hardware while troubleshooting a problem report, in an attempt to isolate issues.

Cisco GOLD defines a common framework for diagnostics operations across different platforms running Cisco IOS software. The GOLD framework specifies the platform-independent fault-detection architecture for centralized and distributed systems. This includes the common diagnostics CLI and the platform-independent fault-detection procedures for boot-up and runtime diagnostics.
When boot-up diagnostics detects a failure on a Cisco Catalyst 6500 Series switch, for example, the failing modules are shut down. Should diagnostics fail, you can open up a trouble report or ticket with the Technical Assistance Center (TAC) to perform additional troubleshooting for the specified module(s) or initiate the process of replacing the faulty hardware.

By default, Catalyst 6500 Series switches run minimal diagnostics at boot-up; however, this default diagnostics behavior can be changed using the diagnostic bootup level [minimal | complete] global configuration command to allow the switch to run complete (full) diagnostics at boot-up instead. Once the switch has completed running boot-up diagnostics, you can use the show module command to verify the status of the diagnostic testing as illustrated in the following output:

Catalyst-6500-Core-Switch#show module
Mod Ports Card Type                        Model              Serial No.
--- ----- -------------------------------  ------------------ -----------
  5     2 Supervisor Engine 720 (Active)   WS-SUP720-BASE     SAD18140BKZ

Mod MAC addresses                      Hw     Fw       Sw           Status
--- ---------------------------------- ------ -------- ------------ -------
  5 000d.bda3.eaa8 to 000d.bda3.eaab   3.1    8.5(1)   12.2(18)SXF1 Ok

Mod  Sub-Module                  Model          Serial      Hw      Status
---- --------------------------- -------------- ----------- ------- -------
  5  Policy Feature Card 3       WS-F6K-PFC3A   SAD08150810 2.2     Ok
  5  MSFC3 Daughterboard         WS-SUP720      SAD0815048A 2.2     Ok

Mod  Online Diag Status
---- -------------------
  5  Pass

The show diagnostic result command can be used to view detailed testing information about the boot-up diagnostics for all modules as illustrated in the following output:

Catalyst-6500-Core-Switch#show diagnostic result
Current bootup diagnostic level: complete

Module 5: Supervisor Engine 720 (Active) SerialNo : SAD18140BKZ

Overall Diagnostic Result for Module 5 : PASS
Diagnostic level at card bootup: complete

Test results: (. = Pass, F = Fail, U = Untested)

  1) TestScratchRegister -------------> .
  2) TestSPRPInbandPing --------------> .
  3) TestTransceiverIntegrity:

     Port  1  2
     ----------
           U  U

  4) TestActiveToStandbyLoopback:

     Port  1  2
     ----------
           U  U

  5) TestLoopback:

     Port  1  2
     ----------
           .  .

  6) TestNewIndexLearn ---------------> .
  7) TestDontConditionalLearn --------> .
  8) TestBadBpduTrap -----------------> .
  9) TestMatchCapture ----------------> .
 10) TestProtocolMatchChannel --------> .
 11) TestFibDevices ------------------> .
 12) TestIPv4FibShortcut -------------> .
 13) TestL3Capture2 ------------------> .
 14) TestIPv6FibShortcut -------------> .
 15) TestMPLSFibShortcut -------------> .
 16) TestNATFibShortcut --------------> .
 17) TestAclPermit -------------------> .
 18) TestAclDeny ---------------------> .
 19) TestQoSTcam ---------------------> .
 20) TestL3VlanMet -------------------> .
 21) TestIngressSpan -----------------> .
 22) TestEgressSpan ------------------> .
 23) TestNetflowInlineRewrite:

     Port  1  2
     ----------
           .  .

 24) TestFabricSnakeForward ----------> .
 25) TestFabricSnakeBackward ---------> .
 26) TestTrafficStress ---------------> U
 27) TestFibTcamSSRAM ----------------> U
 28) TestAsicMemory ------------------> U
 29) TestNetflowTcam -----------------> U
 30) ScheduleSwitchover --------------> U
 31) TestFirmwareDiagStatus ----------> .
 32) TestFabricFlowControlStatus -----> U

NOTE: You are not required to configure GOLD for the current TSHOOT certification exam.

In addition to boot-up diagnostics, you can also run on-demand diagnostics or even schedule diagnostic testing to be run at a specific time. These configurations, however, are beyond the scope of the current TSHOOT certification exam and are not described any further in this guide.
Troubleshooting Cisco IOS Service Diagnostics

In addition to GOLD, Cisco IOS Service Diagnostics allows for diagnostic testing and tools for common problems in Border Gateway Protocol (BGP), Open Shortest Path First (OSPF) Protocol, and Quality of Service (QoS), as well as tools to monitor and detect abnormal system resource utilization, using scenario-specific troubleshooting scripts.

Unlike GOLD, Cisco IOS Service Diagnostics is a programmable diagnostics service that also leverages other Cisco IOS software capabilities, specifically the Embedded Event Manager (EEM), Embedded Syslog Manager (ESM), and Tool Command Language (Tcl). As is the case with GOLD, the configuration of Cisco IOS Service Diagnostics is beyond the scope of the current TSHOOT certification exam and will not be described in further detail in this guide.

Filtering Command Output

While most engineers are familiar with the show command suite that is available in Cisco IOS software, not all of them are aware of the filtering capabilities integrated into the software. Filtering or restricting command output to only the information you need to troubleshoot a particular problem not only demonstrates your level of knowledge about the capabilities of Cisco IOS software but also saves time, which can result in a speedier resolution of the problem.

In Cisco IOS software, output filtering is performed by appending the pipe (|) symbol to the command, followed by a keyword that determines how the output is acted on. The filtering keywords are begin, exclude, include, and section. Following these keywords, the desired output can be matched against regular expressions. Table 2-3 below lists and describes some of the commonly used regular expressions when filtering command output:

Table 2-3. Cisco IOS Regular Expressions

Character           Description
^ (caret)           Indicates the beginning of a string
$ (dollar sign)     Indicates the end of a string
. (period)          Indicates any character
| (pipe)            Specifies an either-or operation
_ (underscore)      Matches a comma, braces, parentheses, the beginning of the string, the end of the string, or a space
+ (plus)            Matches 1 or more sequences of the pattern
* (asterisk)        Matches 0 or more sequences of the pattern
? (question mark)   Matches 0 or 1 occurrences of the pattern

The begin keyword filters the output beginning at the specified phrase. The following example illustrates how to filter the output of the current (running) configuration so that the output begins at the Border Gateway Protocol (BGP) configuration that is implemented on the router:

R2#show running-config | begin router bgp
router bgp 1
 no synchronization
 bgp router-id 1.1.1.1
 bgp log-neighbor-changes
 network 150.1.1.0 mask 255.255.255.0
 network 150.2.2.0 mask 255.255.255.0
 network 150.3.3.0 mask 255.255.255.0
 neighbor 10.0.0.1 remote-as 2
 default-information originate
 no auto-summary
!
ip forward-protocol nd
ip route 1.1.1.1 255.255.255.255 Serial0/0
ip route 192.168.1.0 255.255.255.0 Tunnel0
!
!
ip http server
no ip http secure-server
!
logging 192.168.1.254
logging 150.1.1.254
snmp-server community public RW
snmp-server community readonlypassword RO
... [Truncated Output]

The exclude keyword is used to exclude lines that contain the specified phrase or match the specified regular expression from the command output. The following example illustrates how the exclude keyword can be used to filter the output of the show ip interface brief command so that only interfaces that are not administratively shut down are displayed in the output of the command:

R2#show ip interface brief | exclude administratively down
Interface          IP-Address   OK?  Method  Status  Protocol
FastEthernet0/0    150.1.1.2    YES  NVRAM   up      up
Serial0/0          10.0.0.2     YES  NVRAM   up      up
Loopback0          2.2.2.2      YES  NVRAM   up      up
Tunnel0            200.1.1.2    YES  manual  up      up

The include keyword displays only the lines that contain the specified phrase or match the specified regular expression. If the specified phrase is part of a line, then the entire line is displayed. The following example illustrates how to show information on errors for a WAN interface. Notice in the output that because the specified phrase is part of a line, the entire line is printed:

R2#show interfaces Serial0/0 | include error
     1 input errors, 0 CRC, 1 frame, 0 overrun, 0 ignored, 0 abort
     0 output errors, 0 collisions, 4 interface resets

The section keyword prints only information that matches the specified section and no more. For example, if the show running-config | begin router bgp command was specified, then the router would parse the configuration and begin at the statement router bgp. However, the router would also display all other configuration following this. Therefore, not only is the BGP configuration displayed but also all other configuration after the BGP configuration is included. Using the section keyword prevents this behavior as illustrated in the following example:

R2#show running-config | section router bgp
router bgp 1
 no synchronization
 bgp router-id 1.1.1.1
 bgp log-neighbor-changes
 network 150.1.1.0 mask 255.255.255.0
 network 150.2.2.0 mask 255.255.255.0
 network 150.3.3.0 mask 255.255.255.0
 neighbor 10.0.0.1 remote-as 2
 default-information originate
 no auto-summary
R2#

As seen in the output above, only the relevant matched section of the configuration is printed.

As stated earlier, regular expressions can also be used when filtering command output.
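The behavior of the four filter keywords can be approximated in a few lines of Python. The sketch below is a simplified model for study purposes and is not how Cisco IOS implements filtering. The function name ios_filter and the short sample configuration are hypothetical, the section logic assumes that a section consists of a matching line plus the indented lines that follow it, and Python's re module stands in for the IOS regular expression engine, so the underscore character from Table 2-3 has no special meaning here.

```python
import re


def ios_filter(lines, keyword, pattern):
    """Approximate `show ... | <keyword> <pattern>` on a list of output lines."""
    regex = re.compile(pattern)
    if keyword == "include":
        return [line for line in lines if regex.search(line)]
    if keyword == "exclude":
        return [line for line in lines if not regex.search(line)]
    if keyword == "begin":
        for index, line in enumerate(lines):
            if regex.search(line):
                return lines[index:]  # first match and everything after it
        return []
    if keyword == "section":
        output, in_section = [], False
        for line in lines:
            if regex.search(line):
                in_section = True      # a matching line opens a section
                output.append(line)
            elif in_section and line.startswith(" "):
                output.append(line)    # indented lines belong to the open section
            else:
                in_section = False     # an unindented line closes the section
        return output
    raise ValueError(f"unsupported keyword: {keyword}")


# Hypothetical sample built from lines that appear in the examples above.
config = [
    "interface Serial0/0",
    " clock rate 2000000",
    "router bgp 1",
    " no synchronization",
    " neighbor 10.0.0.1 remote-as 2",
    "ip route 1.1.1.1 255.255.255.255 Serial0/0",
]

print(ios_filter(config, "section", "router bgp"))
print(ios_filter(config, "begin", "router bgp"))
```

Comparing the two printed lists shows the difference discussed above: begin returns the BGP configuration and every line after it, while section stops at the first line that is no longer indented under router bgp.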
The following example illustrates how to filter the running configuration on a router so that only lines that end with the number zero are displayed:

R2#show running-config | include 0$
ip sla monitor responder type tcpConnect ipaddress 10.0.0.2 port 80
ip sla monitor responder type tcpConnect ipaddress 2.2.2.2 port 80
 frequency 30
 time-period 10080
ip ftp password 7 1311041A040310
interface Loopback0
interface Tunnel0
 ip mtu 1500
 tunnel source Serial0/0
interface FastEthernet0/0
 ip address 150.1.1.2 255.255.255.0
interface Serial0/0
 clock rate 2000000
 network 150.1.1.0 mask 255.255.255.0
 network 150.2.2.0 mask 255.255.255.0
 network 150.3.3.0 mask 255.255.255.0
ip route 1.1.1.1 255.255.255.255 Serial0/0
ip route 192.168.1.0 255.255.255.0 Tunnel0
voice-port 1/0/0
voice-port 1/1/0
 port 1/0/0
line con 0
line aux 0

As a final example, the following illustrates how to filter the router output so that only lines that include 0/0 or 0/1 are printed:

R2#show running-config | include 0/0|0/1
 tunnel source Serial0/0
interface FastEthernet0/0
interface Serial0/0
interface Serial0/1
ip route 1.1.1.1 255.255.255.255 Serial0/0
voice-port 1/0/0
voice-port 1/0/1
 port 1/0/0
 port 1/0/1

NOTE: If you include a space between the second pipe and the 0/0 and 0/1 patterns, nothing will be matched by the command.

Redirecting Output

In addition to filtering command output, Cisco IOS software allows output to be redirected to an external location or to Flash memory. This is useful when executing commands that print a great deal of information, such as the show tech-support command. Output redirection can be performed using the append, redirect, or tee keywords after the pipe.

The append keyword allows you to redirect and add the output of any show command to an existing file. You can use the show <command> | append ? command to view valid locations for the platform on which you are working.
The following example illustrates the use of this command to view valid locations for a Cisco 2600XM Series router:

R2#show ip interface FastEthernet0/0 | append ?
  ftp:    Uniform Resource Locator
  nvram:  Uniform Resource Locator

In the output above, valid options for the router include appending to a file stored in NVRAM or on an FTP server. The following example illustrates how to append the output of the running configuration file to an existing file named ‘r2-config’ that is located on FTP server 150.1.1.254:

R2#show running-config | append ftp://150.1.1.254/r2-config
Writing r2-config
R2#

Keep in mind that the example above assumes that both the router and the FTP server have been configured correctly in the event that authentication is required. If credentials are required, include these in the command as illustrated in the example below, which specifies an FTP username of ‘netadmin’ with an FTP password of ‘tshoot’ when connecting to the FTP server:

R2#show running-config | append ftp://netadmin:tshoot@150.1.1.254/r2-config
Writing r2-config
R2#

As previously stated, the redirect keyword is used to redirect the output of a command to a local or remote file location. Again, the options available will depend on the platform on which the command is executed. The following example illustrates the options that are available on a Cisco 2600XM Series router running Cisco IOS version 12.4 Mainline:

R2#show tech-support ospf detail | redirect ?
  flash:  Uniform Resource Locator
  ftp:    Uniform Resource Locator
  http:   Uniform Resource Locator
  https:  Uniform Resource Locator
  nvram:  Uniform Resource Locator
  pram:   Uniform Resource Locator
  rcp:    Uniform Resource Locator
  scp:    Uniform Resource Locator
  tftp:   Uniform Resource Locator

The following example illustrates how to redirect the output of the show tech-support ospf detail command to a TFTP server with the IP address 150.1.1.254.
The file will be saved on the TFTP server with the name ‘r2-ospf-tshoot’:

R2#show tech-support ospf detail | redirect tftp://150.1.1.254/r2-ospf-tshoot
!
R2#

The following example illustrates how to save the same file to the local router Flash memory:

R2#show tech-support ospf detail | redirect flash:r2-ospf-tshoot
R2#show flash:
System flash directory:
File  Length    Name/status
  1   29965496  c2600-adventerprisek9-mz.124-25c.bin
  2   1691      r1-confg
  3   3142      r2-confg
  4   595       interface-statistics
  5   38015     r2-ospf-tshoot
[30009264 bytes used, 3020876 available, 33030140 total]
32768K bytes of processor board System flash (Read/Write)

To view the contents of the file (if saved to Flash) you can use the more command as follows:

R2#more flash:r2-ospf-tshoot
------------------ show version ------------------
Cisco IOS Software, C2600 Software (C2600-ADVENTERPRISEK9-M), Version 12.4(25c), RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Thu 11-Feb-10 23:02 by prod_rel_team
ROM: System Bootstrap, Version 12.2(7r) [cmong 7r], RELEASE SOFTWARE (fc1)
R2 uptime is 22 hours, 47 minutes
System returned to ROM by power-on
System image file is "flash:c2600-adventerprisek9-mz.124-25c.bin"
... [Truncated Output]

Finally, the tee keyword copies the output of any show command to a file and displays the same content on the terminal at the same time. The following output displays the available locations for copying the file while using the tee keyword on a Cisco 2600XM Series router:

R2#show ip ospf neighbor detail | tee ?
  /append  Copy and append output to URL (URLs supporting append operation only)
  flash:   Uniform Resource Locator
  ftp:     Uniform Resource Locator
  http:    Uniform Resource Locator
  https:   Uniform Resource Locator
  nvram:   Uniform Resource Locator
  pram:    Uniform Resource Locator
  rcp:     Uniform Resource Locator
  scp:     Uniform Resource Locator
  tftp:    Uniform Resource Locator

The following example illustrates how to view the output of the show ip ospf neighbor detail command, while simultaneously copying the same output to a TFTP server with the IP address 150.1.1.254. The file will be saved on the TFTP server as ‘r2-ospf-output’:

R2#show ip ospf neighbor detail | tee tftp://150.1.1.254/r2-ospf-output
!
 Neighbor 1.1.1.1, interface address 10.0.0.1
    In the area 0 via interface Serial0/0
    Neighbor priority is 0, State is FULL, 6 state changes
    DR is 0.0.0.0 BDR is 0.0.0.0
    Options is 0x12 in Hello (E-bit L-bit )
    Options is 0x52 in DBD (E-bit L-bit O-bit)
    LLS Options is 0x1 (LR)
    Dead timer due in 00:00:31
    Neighbor is up for 00:10:07
    Index 1/1, retransmission queue length 0, number of retransmission 0
    First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
    Last retransmission scan length is 0, maximum is 0
    Last retransmission scan time is 0 msec, maximum is 0 msec

The following example illustrates how to save the same output to Flash while viewing it:

R2#show ip ospf neighbor detail | tee flash:r2-ospf-output
 Neighbor 1.1.1.1, interface address 10.0.0.1
    In the area 0 via interface Serial0/0
    Neighbor priority is 0, State is FULL, 6 state changes
    DR is 0.0.0.0 BDR is 0.0.0.0
    Options is 0x12 in Hello (E-bit L-bit )
    Options is 0x52 in DBD (E-bit L-bit O-bit)
    LLS Options is 0x1 (LR)
    Dead timer due in 00:00:38
    Neighbor is up for 00:19:50
    Index 1/1, retransmission queue length 0, number of retransmission 0
    First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
    Last retransmission scan length is 0, maximum is 0
    Last retransmission scan time is 0 msec, maximum is 0 msec

The more command can then be used to view the file that has been stored in Flash as follows:

R2#more flash:r2-ospf-output
 Neighbor 1.1.1.1, interface address 10.0.0.1
    In the area 0 via interface Serial0/0
    Neighbor priority is 0, State is FULL, 6 state changes
    DR is 0.0.0.0 BDR is 0.0.0.0
    Options is 0x12 in Hello (E-bit L-bit )
    Options is 0x52 in DBD (E-bit L-bit O-bit)
    LLS Options is 0x1 (LR)
    Dead timer due in 00:00:38
    Neighbor is up for 00:19:50
    Index 1/1, retransmission queue length 0, number of retransmission 0
    First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
    Last retransmission scan length is 0, maximum is 0
    Last retransmission scan time is 0 msec, maximum is 0 msec

Monitoring and Capturing Packets

There may be times when it is necessary to analyze packets as they traverse the wire when you are troubleshooting complex or obscure problems. While there are many products available that can be used to view captured packets on the wire, the TSHOOT certification exam places emphasis only on understanding how to redirect this captured information from Cisco IOS routers and switches to the appropriate application.

Cisco IOS software supports different packet capture mechanisms, depending on whether the device is a router or a switch. On Cisco IOS software-switching routers, such as the Cisco 1800, 2800, and 3800 Series routers running IOS 12.4T or 15.x, the Router IP Traffic Export (RITE) tool allows network administrators to configure the router to export IP packets received on multiple, simultaneous WAN or LAN interfaces to a single LAN or VLAN interface, to which a protocol analyzer or monitoring application is connected.

The Router IP Traffic Export feature can also allow you to configure the router to capture IP packets in a buffer within the router, and then to dump the packets into a specified memory device.
When using RITE, you can configure the router to filter copied packets using either an ACL or sampling. Sampling allows you to export only one out of every few packets of interest. This option should be used if you do not need to capture all incoming traffic. It can also be used when a monitored ingress interface can receive traffic faster than the export interface can transmit it. An example would be capturing incoming traffic on a GigabitEthernet interface and exporting it out of a FastEthernet interface.

When RITE is configured, by default only incoming (inbound) traffic is exported or captured. However, RITE can be configured to capture bidirectional (inbound and outbound) traffic. Router IP Traffic Export is configured using IP traffic export profiles. Multiple profiles can be configured on the same router. The following section lists and describes the sequence of configuration steps required to configure the RITE feature in IOS software-based routers:

1. Configure a traffic export profile using the ip traffic-export profile <name> global configuration command.
2. In RITE configuration mode, specify the interface out of which the captured packets will be sent using the interface <name> configuration command.
3. Next, specify the MAC address of the destination host that will be receiving the packet capture using the mac-address <address> RITE configuration command. Remember that the router interface may be connected to a switch and reside in a VLAN with multiple hosts. If the MAC address is not specified, then the profile will not recognize a destination host to which to send the exported packets.
4. Finally, apply the IP traffic export profile to an ingress interface using the ip traffic-export apply <name> interface configuration command.
5. Begin the IP traffic capture using the traffic-export interface <name> start privileged EXEC command. You can also stop the traffic capture using the traffic-export interface <name> stop privileged EXEC command. Additional options that can be specified include the traffic-export interface <name> clear privileged EXEC command, which clears the buffer, and the traffic-export interface <name> copy <destination> command, which can be used to copy the traffic capture to a TFTP server, Flash memory, or an onboard USB device.

In addition to these required commands, the following optional commands can be specified when configuring RITE:

1. Optionally, you can configure the router to capture bidirectional packets using the bidirectional RITE configuration command.
2. Optionally, you can configure filtering for incoming traffic using the incoming [access-list <standard | extended | named> | sample one-in-every <packet-number>] RITE configuration command. Inbound filtering is enabled by default after you create the RITE profile.
3. Finally, you can also optionally filter outgoing traffic using the outgoing [access-list <standard | extended | named> | sample one-in-every <packet-number>] RITE configuration command.

The following configuration example illustrates how to configure RITE on a router. The router will be configured to send captured traffic to a device with MAC address 1234.abcd.5678 residing off the GigabitEthernet0/1 interface. In addition, the router will also be configured to sample 1 in every 10 packets. The capture will be for inbound and outbound traffic.
Finally, the profile will be applied to the GigabitEthernet0/0 interface:

R4(config)#ip traffic-export profile TSHOOT
R4(conf-rite)#interface GigabitEthernet0/1
R4(conf-rite)#mac-address 1234.abcd.5678
R4(conf-rite)#incoming sample one-in-every 10
R4(conf-rite)#outgoing sample one-in-every 10
R4(conf-rite)#bidirectional
R4(conf-rite)#exit
R4(config)#interface GigabitEthernet 0/0
R4(config-if)#ip traffic-export apply TSHOOT
R4(config-if)#
*Oct 24 21:08:48.734: %RITE-5-ACTIVATE: Activated IP traffic export on interface GigabitEthernet0/0

After RITE is enabled on a particular interface, the router automatically generates the message shown above. When the profile is removed, the following is displayed:

R4(config)#int Gi0/0
R4(config-if)#no ip traffic-export apply TSHOOT
R4(config-if)#end
R4#
R4#
R4#
*Oct 24 21:09:38.542: %RITE-5-DEACTIVATE: Deactivated IP traffic export on interface GigabitEthernet0/0

Following the RITE configuration, the show ip traffic-export [interface <name>] command can be used to view or validate the RITE configuration parameters:

R4#show ip traffic-export interface GigabitEthernet0/0
Router IP Traffic Export Parameters
Monitored Interface          GigabitEthernet0/0
  Export Interface           GigabitEthernet0/1
  Destination MAC address    1234.abcd.5678
  bi-directional traffic export is on
Output IP Traffic Export Information
  Packets/Bytes Exported     7/684
  Packets Dropped            72
  Sampling Rate              one-in-every 10 packets
  No Access List configured
Input IP Traffic Export Information
  Packets/Bytes Exported     10/964
  Packets Dropped            98
  Sampling Rate              one-in-every 10 packets
  No Access List configured
Profile TSHOOT is Active

Finally, begin the traffic capture using the traffic-capture interface <name> start command as follows:

R4#traffic-capture interface GigabitEthernet0/0 start
R4#
*Oct 24 21:40:29.662: %RITE-5-CAPTURE_START: Started IP traffic capture for interface GigabitEthernet0/0

After you have completed the traffic capture, stop the capture using the traffic-capture interface <name> stop command as follows:

R4#traffic-capture interface GigabitEthernet0/0 stop
R4#
R4#
*Oct 24 21:45:52.878: %RITE-5-CAPTURE_STOP: Stopped IP traffic capture for interface GigabitEthernet0/0

As was previously stated, the IP traffic export feature also provides the capability to capture IP packets in local router memory, and then dump this data to a file on an external device, such as Flash memory. The configuration of this capability follows the same basic steps that were used when exporting the captured traffic to a device off the router interface, with some subtle differences. The router configuration steps for local capture are listed and described below:

1. Configure a traffic export profile using the ip traffic-export profile <name> mode capture global configuration command.
2. Specify the length of the packet in capture mode using the length <size> RITE configuration command. Valid options are 128, 256, and 512 bytes.
3. Apply the traffic export profile to an interface using the ip traffic-export apply <name> size <size> interface configuration command. The size <size> option specifies the size of the buffer, in bytes.
4. Begin the IP traffic capture using the traffic-export interface <name> start privileged EXEC command. You can also stop the traffic capture using the traffic-export interface <name> stop privileged EXEC command. Additional options that can be specified include the traffic-export interface <name> clear privileged EXEC command, which clears the buffer, and the traffic-export interface <name> copy <destination> command, which can be used to copy the traffic capture to a TFTP server, Flash memory, or an onboard USB device.
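The effect of the sampling and packet length options can be pictured with a small simulation. The following Python sketch is only an illustration of the idea and does not describe how IOS implements RITE internally. The function names are hypothetical, and the sketch assumes that the sampler keeps the first packet of every group of N, which the text above does not specify:

```python
from typing import Iterable, List


def sample_one_in_every(packets: Iterable[bytes], n: int) -> List[bytes]:
    """Keep one packet out of every n (assumption: the first of each group)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [pkt for index, pkt in enumerate(packets) if index % n == 0]


def truncate(packets: Iterable[bytes], length: int) -> List[bytes]:
    """Keep at most `length` bytes of each packet, like a capture length limit."""
    return [pkt[:length] for pkt in packets]


# 30 dummy packets of 600 bytes each; packet i is filled with the byte value i.
packets = [bytes([i]) * 600 for i in range(30)]

# Sample one in every 10 packets, then limit each kept packet to 512 bytes.
kept = truncate(sample_one_in_every(packets, 10), 512)
print(len(kept), [len(p) for p in kept])
```

With a sampling rate of one-in-every 10, only a tenth of the monitored packets are kept, which is why sampling helps when the monitored interface is faster than the export interface or when the local buffer is small.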
As is the case when configuring a traffic capture that will be sent to a specified device, there are also additional options when configuring capture mode (local buffer) traffic captures:

1. Optionally, you can configure the router to capture bidirectional packets using the bidirectional RITE configuration command.
2. Optionally, you can configure filtering for incoming traffic using the incoming [access-list <standard | extended | named> | sample one-in-every <packet-number>] RITE configuration command. Inbound filtering is enabled by default after you create the RITE profile.
3. Finally, you can also optionally filter outgoing traffic using the outgoing [access-list <standard | extended | named> | sample one-in-every <packet-number>] RITE configuration command.

The following example illustrates how to configure local IP packet capture on the router. The capture is configured to filter packets by referencing an extended ACL and will be applied inbound on the GigabitEthernet0/0 interface. The configuration specifies a packet length of 512 bytes and the local buffer is configured with a size of 1024 bytes:

R4(config)#ip traffic-export profile TSHOOT mode capture
R4(conf-rite)#length 512
R4(conf-rite)#incoming access-list RITE-ACL
R4(conf-rite)#exit
R4(config)#interface GigabitEthernet0/0
R4(config-if)#ip traffic-export apply TSHOOT size 1024
R4(config-if)#exit
R4(config)#ip access-list extended RITE-ACL
R4(config-ext-nacl)#permit icmp any any
R4(config-ext-nacl)#end

Following the configuration, the show ip traffic-export [interface <name>] command can be used to verify the local traffic capture configuration parameters as follows:

R4#show ip traffic-export GigabitEthernet0/0
Router IP Traffic Export Parameters
Monitored Interface: GigabitEthernet0/0
Limit capture length of packet to 512 bytes.
bi-directional traffic capture is off
Input IP Traffic Capture Information
  Packets/Bytes Captured     0/0
  Packets Dropped            502
  Sampling Rate              one-in-every 1 packets
  Access List RITE-ACL [named extended IP]
IP Traffic Capture Buffer Information
  Defined Buffer Size        1024 bytes
  Capture Buffer Size        1024 bytes
  Capture Buffer Used        24 bytes
  Capture Buffer Free        1000 bytes
Profile TSHOOT capture state: Active

After the traffic capture has been configured, the traffic-capture interface <name> start command should be used to begin the traffic capture as follows:

R4#traffic-capture interface GigabitEthernet0/0 start
R4#
*Oct 24 21:40:29.662: %RITE-5-CAPTURE_START: Started IP traffic capture for interface GigabitEthernet0/0

After you have completed the traffic capture, stop the capture using the traffic-capture interface <name> stop command as follows:

R4#traffic-capture interface GigabitEthernet0/0 stop
R4#
R4#
*Oct 24 21:45:52.878: %RITE-5-CAPTURE_STOP: Stopped IP traffic capture for interface GigabitEthernet0/0

Finally, the traffic-capture interface <name> copy command can be used to export the traffic capture information to an external location, such as a TFTP server, Flash memory, or even an onboard USB device. The following example illustrates how to export the traffic capture to a TFTP server with the IP address 150.1.1.254. The file will be saved on the TFTP server with the name ‘r4-traffic-capture’:

R4#traffic-capture interface GigabitEthernet0/0 copy tftp:
Address or name of remote host []? 150.1.1.254
Capture buffer filename []? r4-traffic-capture
Copying capture buffer to tftp://150.1.1.254/r4-traffic-capture
!!

Before implementing RITE in a production environment, keep in mind that a delay is incurred on the outbound interface when packets are captured and transmitted across the interface.
Performance delays increase as the number of monitored interfaces and the number of destination hosts increase. Finally, keep the following restrictions in mind when configuring or enabling the Router IP Traffic Export feature:

• The device that is receiving the exported traffic must be on the same VLAN as, or directly connected to, one of the router interfaces. You can use the show arp or show ip arp command to determine the MAC address of any device that is directly connected to an interface.
• The outgoing interface for exported traffic must be Ethernet (10/100/1000). However, the incoming or monitored traffic can traverse any interface.

On Cisco IOS-distributed router platforms, such as the Cisco 7600 Series routers, as well as on Cisco IOS Catalyst switches, the Switched Port Analyzer (SPAN) feature is used to capture packets instead. There are three variations of SPAN: the local SPAN feature, Remote SPAN (RSPAN), and Encapsulated RSPAN (ERSPAN).

The local SPAN feature, commonly referred to simply as SPAN, copies traffic from one or more CPUs, one or more ports, one or more EtherChannels, or one or more VLANs, and sends the copied traffic to one or more destinations for analysis by a network analyzer, such as a Switch Probe device or other Remote Monitoring (RMON) probe. Traffic can also be sent to the processor for packet capture by the Mini Protocol Analyzer. While SPAN does not affect the switching of traffic on sources, it is important to remember that the SPAN-generated copies of traffic compete with user traffic for switch resources.

A local SPAN session is an association of source ports or source VLANs with one or more destinations. Each local SPAN session can have either ports or VLANs as sources, but not both. Local SPAN sessions are configured on a single switch.
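The source and destination rule for local SPAN sessions can be expressed as a simple check. The following Python sketch is a hypothetical model written for illustration only. The class and function names are assumptions and do not correspond to any Cisco tool or API. It encodes only the rules stated above: a session needs at least one source and one destination, and its sources are either ports or VLANs, never both:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class LocalSpanSession:
    """Hypothetical model of one local SPAN session (not an IOS structure)."""
    number: int
    source_ports: Set[str] = field(default_factory=set)
    source_vlans: Set[int] = field(default_factory=set)
    destinations: Set[str] = field(default_factory=set)


def check_session(session: LocalSpanSession) -> List[str]:
    """Return the problems found; an empty list means the rules hold."""
    problems = []
    if session.source_ports and session.source_vlans:
        problems.append("sources mix ports and VLANs")
    if not (session.source_ports or session.source_vlans):
        problems.append("no source configured")
    if not session.destinations:
        problems.append("no destination configured")
    return problems


ok = LocalSpanSession(1, source_ports={"Fa0/1"}, destinations={"Fa0/2"})
mixed = LocalSpanSession(2, source_ports={"Fa0/1"}, source_vlans={10},
                         destinations={"Fa0/3"})
print(check_session(ok))
print(check_session(mixed))
```

The first session passes the check, while the second is flagged because it combines a port source with a VLAN source in the same session.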
When configuring SPAN, the following restrictions apply when specifying ports as the source:

• The port can be any port type, such as EtherChannel, FastEthernet, or GigabitEthernet
• The same local port can be monitored in multiple SPAN sessions
• The local SPAN source port cannot be configured as a destination port
• Each source port can be configured with a direction (ingress, egress, or both) to monitor
• Source ports can be in the same or different VLANs

When configuring a VLAN as the SPAN source, the following restrictions apply:

• On a given port, only traffic on the monitored VLAN is sent to the destination port
• All active ports in the source VLAN are included as source ports
• Destination ports that belong to the source VLAN are excluded from the source list
• Ports that are removed from or added to the VLAN are removed from or added to the session
• You can monitor only Ethernet VLANs
• You cannot use filter VLANs in the same local SPAN session with VLAN sources

Finally, the following restrictions apply to the SPAN destination ports:

• The destination port must reside on the same physical switch as the source port
• The destination port can be any Ethernet physical port
• The destination port can participate in only one SPAN session at a time
• The destination port cannot be a source port
• The destination port cannot be an EtherChannel group
• If the destination port resides in an EtherChannel group, then it is removed from the group
• The destination port will not transmit traffic unless learning is enabled
• The destination port line protocol will show a state of up/down by design
• If ingress forwarding is enabled, then the destination port forwards traffic at Layer 2
• A destination port does not participate in Spanning Tree while the SPAN session is active
• When it is a destination port, it does not participate in any of the Layer 2 protocols
• If the port belongs to a source VLAN, then it is excluded from the source and is not monitored

In Cisco IOS Catalyst switches, the local SPAN source is configured using the monitor session <session_number> source [[single_interface | interface_list | interface_range | mixed_interface_list | single_vlan | vlan_list | vlan_range | mixed_vlan_list] [rx | tx | both]] global configuration command. Keep in mind that the options available will vary depending on the switch platform. The local SPAN destination is configured using the monitor session <session_number> destination [single_interface | interface_list | interface_range | mixed_interface_list] global configuration command.

The following configuration example illustrates how to configure local SPAN on the switch to copy inbound and outbound traffic on FastEthernet0/1 and send this traffic to interface FastEthernet0/2. It is assumed that a monitoring device is connected to the FastEthernet0/2 interface:

Sw1#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Sw1(config)#monitor session 1 source interface Fa0/1 both
Sw1(config)#monitor session 1 destination interface Fa0/2
Sw1(config)#end

Following this implementation, use the show monitor session [<session> | all] [detail] command to verify the local SPAN configuration:

Sw1#show monitor session 1
Session 1
---------
Type              : Local Session
Source Ports      :
    Both          : Fa0/1
Destination Ports : Fa0/2
    Encapsulation : Native
          Ingress : Disabled

The detail keyword can be appended to view detailed information as follows:

Sw1#show monitor session 1 detail
Session 1
---------
Type              : Local Session
Source Ports      :
    RX Only       : None
    TX Only       : None
    Both          : Fa0/1
Source VLANs      :
    RX Only       : None
    TX Only       : None
    Both          : None
Source RSPAN VLAN : None
Destination Ports : Fa0/2
    Encapsulation : Native
          Ingress : Disabled
Reflector Port    : None
Filter VLANs      : None
Dest RSPAN VLAN   : None

Figure 2-4 below illustrates a sample packet capture, using Wireshark, based on the
configuration that was applied to the switch in the previous configuration example:

Fig. 2-4. Sample Packet Capture from a Local SPAN Session

Unlike local SPAN, Remote SPAN (RSPAN) supports source ports and VLANs, as well as destinations on different switches, allowing you to perform remote monitoring of multiple switches across your network. RSPAN does this by using a Layer 2 VLAN to carry SPAN traffic between switches. An RSPAN configuration therefore consists of an RSPAN source session, an RSPAN destination session, and an RSPAN VLAN. RSPAN source and destination sessions can also be configured on different switches.

An RSPAN source session can have either ports or VLANs as sources, but not both. The RSPAN source session copies traffic from the source ports or source VLANs and switches the traffic over the RSPAN VLAN to the RSPAN destination. The RSPAN destination session switches the traffic to the destinations.

In addition to source ports and VLANs, as well as destination ports, RSPAN also introduces a new port type referred to as a reflector port. The reflector port is simply a port that copies packets onto an RSPAN VLAN.
All reflector ports have the following characteristics and restrictions:

• A reflector port is a port set to Loopback
• A reflector port cannot be an EtherChannel group
• A reflector port does not trunk
• A reflector port cannot do protocol filtering
• If a port assigned to an EtherChannel is specified, then it is removed from the EtherChannel
• A reflector port cannot be a SPAN source or destination port
• A reflector port cannot be a reflector port for more than one RSPAN session
• A reflector port is invisible to all VLANs
• The native VLAN for looped-back traffic on a reflector port is the RSPAN VLAN
• The reflector port loops back untagged traffic to the switch
• Spanning tree is automatically disabled on a reflector port
• A reflector port receives copies of sent and received traffic for all monitored source ports

The configuration of RSPAN is performed in two steps. The first step entails the configuration of the RSPAN VLAN using the remote-span VLAN configuration command. The following configuration example illustrates how to configure a VLAN as an RSPAN VLAN:

Sw1(config)#vlan 123
Sw1(config-vlan)#name RSPAN-VLAN
Sw1(config-vlan)#remote-span
Sw1(config-vlan)#exit
Sw1(config)#exit
Sw1#

This configuration can then be validated using the show vlan id command as follows:

Sw1#show vlan id 123

VLAN Name                             Status    Ports
---- -------------------------------- --------- ----------------------------
123  RSPAN-VLAN                       active

VLAN Type  SAID       MTU   Parent RingNo BridgeNo Stp  BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------
123  enet  100123     1500                                       0      0

Remote SPAN VLAN
----------------
Enabled

Primary Secondary Type              Ports
------- --------- ----------------- ----------------------------------------

The second step following the RSPAN VLAN configuration is to configure the RSPAN sessions.
This can be performed in SPAN configuration mode or in global configuration mode. The following section describes how to configure a source RSPAN session in SPAN configuration mode:

1. Configure the RSPAN source session using the monitor session <session> type rspan-source global configuration command.
2. Associate the RSPAN source session number with the CPU, source ports, or VLANs, and select the traffic direction to be monitored using the source [[cpu [rp | sp]] | <interface> | <interface list> | <interface range> | <vlan> | <vlan list> | <vlan range>] [rx | tx | both] RSPAN source session configuration command.
3. Associate the RSPAN source session number with the RSPAN VLAN using the destination remote vlan <RSPAN-VLAN> RSPAN source session configuration command.

The following section describes how to configure an RSPAN destination session in SPAN configuration mode:

1. Configure the RSPAN destination session using the monitor session <session> type rspan-destination global configuration command.
2. Associate the RSPAN destination session number with the RSPAN VLAN using the source remote vlan <RSPAN-VLAN> RSPAN destination session configuration command.
3. Associate the RSPAN destination session number with the destinations using the destination [<interface> | <interface list> | <interface range>] [ingress [learning]] RSPAN destination session configuration command.

The following configuration example illustrates how to configure an RSPAN session between two switches. It should be assumed that these switches have a trunk connection configured between them. The RSPAN configuration will copy traffic from port GigabitEthernet1/1 on switch 1 to port GigabitEthernet1/1 on switch 2. VLAN 123 will be used for RSPAN.
The configuration on switch 1 (Sw1), which is the RSPAN source, is implemented as follows:

Sw1(config)#vtp domain H2N-TSHOOT
Sw1(config)#vtp mode transparent
Sw1(config)#vlan 123
Sw1(config-vlan)#name RSPAN-VLAN
Sw1(config-vlan)#remote-span
Sw1(config-vlan)#exit
Sw1(config)#monitor session 1 type rspan-source
Sw1(config-mon-rspan-src)#source interface GigabitEthernet1/1
Sw1(config-mon-rspan-src)#destination remote vlan 123
Sw1(config-mon-rspan-src)#end

The configuration on switch 2 (Sw2), the RSPAN destination, is implemented as follows:

Sw2(config)#vtp domain H2N-TSHOOT
Sw2(config)#vtp mode transparent
Sw2(config)#vlan 123
Sw2(config-vlan)#name RSPAN-VLAN
Sw2(config-vlan)#remote-span
Sw2(config-vlan)#exit
Sw2(config)#monitor session 1 type rspan-destination
Sw2(config-mon-rspan-dst)#source remote vlan 123
Sw2(config-mon-rspan-dst)#destination interface GigabitEthernet1/1
Sw2(config-mon-rspan-dst)#end

Following the RSPAN configuration, you can use the show monitor session <number> command to validate the configuration. Following is the output of this command on Sw1:

Sw1#show monitor session 1
Session 1
---------
Type              : Remote Source Session
Source Ports      :
    Both          : Gi1/1
Dest RSPAN VLAN   : 123

The output of the same command on Sw2 displays the following:

Sw2#show monitor session 1
Session 1
---------
Type              : Remote Destination Session
Source RSPAN VLAN : 123
Destination Ports : Gi1/1
    Encapsulation : Native
          Ingress : Disabled

As was previously stated, the second option is to configure RSPAN in global configuration mode. The RSPAN source session is configured as follows:

1. Configure the RSPAN source session number with the source ports or VLANs, and select the traffic direction to be monitored using the monitor session <session> source [<interface> | <interface list> | <interface range> | <vlan> | <vlan list> | <vlan range>] [rx | tx | both] global configuration command.
2. Associate the RSPAN source session number with the RSPAN VLAN using the monitor session <session> destination remote vlan <RSPAN-VLAN> global configuration command.

The following section describes how to configure RSPAN destination sessions in global configuration mode:

1. Configure the RSPAN destination session number with the RSPAN VLAN using the monitor session <session> source remote vlan <RSPAN-VLAN> global configuration command.
2. Associate the RSPAN destination session number with the destination ports using the monitor session <session> destination [<interface> | <interface list> | <interface range>] [ingress [learning]] global configuration command.

The following configuration illustrates how to configure RSPAN in global configuration mode using VLAN 123 as the RSPAN VLAN. This configuration assumes that a trunk has been configured between the two switches. RSPAN is configured to copy traffic received from the GigabitEthernet1/1 port on switch 1, while switch 2 forwards this copied traffic to its local GigabitEthernet1/1 port.
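Because an RSPAN deployment spans two switches, a common mistake is for the two sessions to reference different VLANs, or a VLAN that was never configured with remote-span. The following Python sketch shows one way such a consistency check could be reasoned about. It is a hypothetical model only: the class, field, and function names are assumptions made for illustration and are not part of IOS or any Cisco tool:

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SwitchRspan:
    """Hypothetical summary of one switch's RSPAN-related settings."""
    name: str
    rspan_vlans: Set[int]  # VLANs configured with the remote-span command
    session_vlan: int      # VLAN referenced by the monitor session


def check_rspan(source: SwitchRspan, destination: SwitchRspan) -> List[str]:
    """Return the inconsistencies found between the two RSPAN sessions."""
    problems = []
    if source.session_vlan != destination.session_vlan:
        problems.append("source and destination sessions use different VLANs")
    for switch in (source, destination):
        if switch.session_vlan not in switch.rspan_vlans:
            problems.append(
                f"{switch.name}: VLAN {switch.session_vlan} is not a remote-span VLAN"
            )
    return problems


sw1 = SwitchRspan("Sw1", rspan_vlans={123}, session_vlan=123)
sw2 = SwitchRspan("Sw2", rspan_vlans={123}, session_vlan=123)
print(check_rspan(sw1, sw2))

# A destination session that points at a VLAN which is not an RSPAN VLAN
sw2_bad = SwitchRspan("Sw2", rspan_vlans={123}, session_vlan=200)
print(check_rspan(sw1, sw2_bad))
```

The first call reports no problems, while the second reports both the VLAN mismatch between the sessions and the missing remote-span VLAN on the second switch.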
The configuration on switch 1 (Sw1), which is the RSPAN source, is implemented as follows:

Sw1(config)#vtp domain H2N-TSHOOT
Sw1(config)#vtp mode transparent
Sw1(config)#vlan 123
Sw1(config-vlan)#name RSPAN-VLAN
Sw1(config-vlan)#remote-span
Sw1(config-vlan)#exit
Sw1(config)#monitor session 1 source interface GigabitEthernet1/1 rx
Sw1(config)#monitor session 1 destination remote vlan 123
Sw1(config)#end

The configuration on switch 2 (Sw2), the RSPAN destination, is implemented as follows:

Sw2(config)#vtp domain H2N-TSHOOT
Sw2(config)#vtp mode transparent
Sw2(config)#vlan 123
Sw2(config-vlan)#name RSPAN-VLAN
Sw2(config-vlan)#remote-span
Sw2(config-vlan)#exit
Sw2(config)#monitor session 1 source remote vlan 123
Sw2(config)#monitor session 1 destination interface GigabitEthernet1/1
Sw2(config)#end

As stated in the previous configuration example, the show monitor session <number> command can be used to validate this configuration. Following is the output of this command on switch 1 (Sw1):

Sw1#show monitor session 1
Session 1
---------
Type              : Remote Source Session
Source Ports      :
    RX Only       : Gi1/1
Dest RSPAN VLAN   : 123

NOTE: When using the show monitor session <number> command, append the detail keyword to print detailed information as illustrated below:

Sw1#show monitor session 1 detail
Session 1
---------
Type              : Remote Source Session
Source Ports      :
    RX Only       : Gi1/1
    TX Only       : None
    Both          : None
Source VLANs      :
    RX Only       : None
    TX Only       : None
    Both          : None
Source RSPAN VLAN : None
Destination Ports : None
Filter VLANs      : None
Dest RSPAN VLAN   : 123

The output of the show monitor session command on Sw2 displays the following:

Sw2#show monitor session 1
Session 1
---------
Type              : Remote Destination Session
Source RSPAN VLAN : 123
Destination Ports : Gi1/1
    Encapsulation : Native
          Ingress : Disabled

Finally, the last SPAN variant, Encapsulated RSPAN (ERSPAN), is somewhat similar to RSPAN in that it supports source ports and VLANs, and destinations on different switches. However, unlike RSPAN, which uses a Layer 2 VLAN for the SPAN traffic, ERSPAN uses a GRE tunnel to carry traffic between switches. This means that ERSPAN can be configured across IP networks, thus providing far greater monitoring capabilities on the network.

ERSPAN consists of an ERSPAN source session, routable ERSPAN GRE-encapsulated traffic, and an ERSPAN destination session. In a manner similar to RSPAN, ERSPAN source and destination sessions can be configured on different switches. Additionally, like RSPAN, each ERSPAN source session can have either ports or VLANs as sources, but not both.

NOTE: The configuration of ERSPAN is beyond the scope of the TSHOOT certification exam and will not be illustrated or described in any further detail in this guide.

Health Monitoring

Verifying the health of internetwork devices is a common troubleshooting task.
Cisco IOS software allows you to view memory and processor utilization statistics, as well as environmental variables, such as the temperature of the device and the status of power supplies installed within the device.

The show processes command prints information about active processes running on the device (i.e., router or switch). This command can be used to view detailed information about specific processes, detailed CPU utilization statistics, and even detailed memory utilization statistics. The valid keywords that can be used in conjunction with this command are listed in the output below:

R2#show processes ?
  <1-4294967295>  Process Number
  cpu             Show CPU use per process
  history         display ordered process history
  memory          Show memory use per process
  timercheck      Show processes configured for timercheck
  |               Output modifiers
  <cr>

NOTE: These options will vary depending on the platform on which the command is issued.

The most commonly used keywords are cpu and memory. These keywords display detailed CPU utilization statistics and the amount of memory used, respectively. Following is a sample output of the show processes cpu command:

R2#show processes cpu
CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
   1           0         3          0  0.00%  0.00%  0.00%   0 Chunk Manager
   2          84     24809          3  0.00%  0.00%  0.00%   0 Load Meter
   3          84        17       4941  0.00%  0.00%  0.00%   0 Exec
   4           0         1          0  0.00%  0.00%  0.00%   0 EDDRI_MAIN
   5      280786     18847      14898  0.00%  0.27%  0.23%   0 Check Heaps
   6          24        13       1846  0.00%  0.00%  0.00%   0 Pool Manager
   7           0         2          0  0.00%  0.00%  0.00%   0 Timers
   8           0         1          0  0.00%  0.00%  0.00%   0 Crash Writer
... [Truncated Output]

The show processes cpu command prints a great deal of information, as can be seen above. When using this command, consider filtering the output using the applicable keywords to reduce the amount of information that is returned.
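When this output has been saved or redirected to an external host, it can also be filtered off the device. The following Python sketch is one possible way to do this and is not a Cisco-provided tool. The regular expression and the function name are assumptions based solely on the column layout shown in the sample output above, and other platforms or IOS versions may print different columns:

```python
import re
from typing import Dict, List

# Columns: PID, Runtime(ms), Invoked, uSecs, 5Sec, 1Min, 5Min, TTY, Process
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(.+?)\s*$"
)


def parse_process_rows(text: str) -> List[Dict[str, object]]:
    """Extract one dictionary per process row; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append({
                "pid": int(match.group(1)),
                "runtime_ms": int(match.group(2)),
                "one_min": float(match.group(6)),
                "five_min": float(match.group(7)),
                "process": match.group(9),
            })
    return rows


SAMPLE = """\
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
   1           0         3          0  0.00%  0.00%  0.00%   0 Chunk Manager
   2          84     24809          3  0.00%  0.00%  0.00%   0 Load Meter
   5      280786     18847      14898  0.00%  0.27%  0.23%   0 Check Heaps
"""

# Keep only the processes that used any CPU over the last five minutes.
busy = [row for row in parse_process_rows(SAMPLE) if row["five_min"] > 0]
print(busy)
```

In this sample, only the Check Heaps process is reported, because it is the only row with a non-zero five-minute average.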
The following example illustrates how to filter the output of this command to include only BGP processes:

R2#show processes cpu | include BGP
 120          52      7029          7  0.00%  0.00%  0.00%   0 BGP Router
 151          81      1646         49  0.00%  0.00%  0.06%   0 BGP I/O
 218        1651       230       7178  0.00%  0.00%  0.00%   0 BGP Scanner

The show processes cpu command can also be used to view the historical CPU utilization statistics for the device over a 60-second, a 60-minute, and a 72-hour interval by simply appending the history keyword as illustrated in the following output:

R1#show processes cpu history

R1 05:04:38 AM Sunday Mar 3 2002 UTC

                             111111
    444449999999999889999998800000099999999999899999998888877777
    888889999997777449777774400000099999779999899999992222222222
100      **********  ******  ***************** *******
 90      **********  ******  *************************
 80      **************************************************
 70      *******************************************************
 60      *******************************************************
 50 ************************************************************
 40 ************************************************************
 30 ************************************************************
 20 ************************************************************
 10 ************************************************************
    0....5....1....1....2....2....3....3....4....4....5....5....6
              0    5    0    5    0    5    0    5    0    5    0
               CPU% per second (last 60 seconds)

    1
    08859                                  3
    083896111112212111122411111212114121433721132211111122221121
100 *   *
 90 #*  *
 80 ##* *
 70 ##* *
 60 ##***
 50 ##***
 40 ###*#                                  *
 30 ###*#                                  *
 20 #####                                  *
 10 #####*                                 #
    0....5....1....1....2....2....3....3....4....4....5....5....6
              0    5    0    5    0    5    0    5    0    5    0
               CPU% per minute (last 60 minutes)
              * = maximum CPU%   # = average CPU%

    38
    70
100
 90
 80  *
 70  *
 60  *
 50  *
 40 **
 30 **
 20 **
 10 **
    0....5....1....1....2....2....3....3....4....4....5....5....6....6....7..
              0    5    0    5    0    5    0    5    0    5    0    5    0
                   CPU% per hour (last 72 hours)
                  * = maximum CPU%   # = average CPU%

The show processes memory command displays information about the active processes in the router and the corresponding memory used. Following is a sample output of the show processes memory command:

R2#show processes memory
Processor Pool Total:   53977824 Used:   22192752 Free:   31785072
      I/O Pool Total:    3729408 Used:    2131648 Free:    1597760

 PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
   0   0   32967392   13607204   15720348          0          0 *Init*
   0   0      12052     636100      12052          0          0 *Sched*
   0   0   29197288   27868336    2512444     380484     380484 *Dead*
   0   0          0          0     490880          0          0 *MallocLite*
   1   0      15104          0      22116          0          0 Chunk Manager
   2   0        196        196       4012          0          0 Load Meter
   3   0      22236      22076      13188          0          0 Exec
   4   0      65588          0      90600          0          0 EDDRI_MAIN
   5   0       3296        196      10112          0          0 Check Heaps
   6   0     210936     274452      26188     199980     233400 Pool Manager
   7   0        196        196       7012          0          0 Timers
   8   0          0          0      25012          0          0 Crash Writer
... [Truncated Output]

As with the show processes cpu command, the show processes memory command prints a great deal of information. Again, keep in mind that the filtering options that are applicable with other commands can also be used with this command to reduce the amount of information that is printed. The following illustrates how to use Cisco IOS filters to include only memory utilization information for the different BGP processes:

R2#show processes memory | section BGP
 120   0      45584          0      10096          0          0 BGP Router
 151   0          0          0       7036          0          0 BGP I/O
 218   0          0          0      10048          0          0 BGP Scanner

In addition to the show processes memory command, you can also use the show memory command to display summary information about processor and I/O memory, followed by a more comprehensive report of memory utilization. The show memory command supports the following keywords or options:

R2#show memory ?
  allocating-process  Show allocating process name
  dead                Memory owned by dead processes
  debug               Memory debugging commands
  failures            Memory failures
  fast                Fast memory stats
  fragment            Summary of memory fragment information
  free                Free memory stats
  io                  IO memory stats
  multibus            Multibus memory stats
  overflow            memory overflow corrections
  pci                 PCI memory stats
  processor           Processor memory stats
  statistics          Mempool Statistics
  summary             Summary of memory usage per alloc PC
  transient           Transient memory stats

While delving into the specifics on all applicable or valid keywords is beyond the scope of the TSHOOT certification exam, the show memory allocating-process totals or show memory summary commands are two of the most commonly used options when troubleshooting memory allocation issues. The output printed by these commands can be decoded using the Cisco Output Interpreter, which is available online, and is also commonly requested by the Technical Assistance Center (TAC) for troubleshooting allocation issues.

NOTE: As is applicable with other Cisco IOS show commands, you can filter the output that is printed by this command or redirect it to an external location for further analysis.

Finally, the last health monitoring and troubleshooting command that we will discuss in this section is the show environment command. This command provides information on the status of your router fans, power supply, system board temperature, and more. However, it should be noted that this command is supported in Cisco 2600XM and ISR-Series routers, as well as Catalyst 3750, 4500, and 6500 Series switches.

Once per minute, a routine is run that receives environmental measurements from sensors and stores the output into a buffer. This buffer is displayed on the console when the show environment command is entered.
In the event that a measurement exceeds desired margins, but has not exceeded fatal margins, a warning message is printed on the system console. Cisco IOS software queries the sensors for measurements once per minute, but warnings for a given test point are printed, at most, once every hour for sensor readings in the warning range, and once every five minutes for sensor readings in the critical range. If a measurement is out of line within these time segments, then an automatic warning message appears on the console. If a shutdown occurs because of detection of fatal environmental margins, the last measured value from each sensor is stored in internal nonvolatile memory.

You can also enable SNMP notifications (traps or informs) to alert an NMS when environmental thresholds are reached using the snmp-server enable traps envmon global configuration command in conjunction with the other SNMP configuration commands. Whenever Cisco IOS software detects a failure or recovery event, it sends an SNMP trap to the configured SNMP server. Unlike console messages, only one SNMP trap is sent when the failure event is first detected. Another trap is sent when the recovery is detected.

The following displays the options that are available with the show environment command on a Catalyst 6500 Series switch. Keep in mind that these options will vary depending on the platform:

Catalyst-6500-Core-Switch#show environment ?
  alarm        show environmental alarms
  connector    show connector parameters
  cooling      show cooling parameters
  status       operational status of FRU
  temperature  temperature readings
  voltage      voltage readings
  |            Output modifiers
  <cr>

On a Cisco 2800 Series router, the following options can be used with this command:

R4#show environment ?
  all    All environmental monitor parameters
  last   Last environmental monitor parameters
  table  Temperature and voltage ranges
  |      Output modifiers
  <cr>

Following is the output of the show environment command for information on one specific module on a Catalyst 6500 Series switch. The information printed by this command will vary depending on the chassis and modules:

Catalyst-6500-Core-Switch#show environment status module 5
module 5:
  module 5 power-output-fail: OK
  module 5 outlet temperature: 30C
  module 5 inlet temperature: 25C
  module 5 device-1 temperature: 37C
  module 5 device-2 temperature: 36C
  module 5 asic-1 (SSO-1) temp: 26C
  module 5 asic-2 (SSO-2) temp: 26C
  module 5 asic-3 (SSO-3) temp: 25C
  module 5 asic-4 (SSO-4) temp: 26C
  module 5 asic-5 (SSA-1) temp: 26C
  module 5 asic-6 (HYPERION-1) temp: 26C

Following is the output of the show environment command on a Cisco 2800 Series router:

R4#show environment
 Main Power Supply is AC
 Fan 1 OK
 Fan 2 OK
 Fan 3 OK
 Fan Speed Setting: Normal
 System Temperature: 26 Celsius (normal)
 Environmental information last updated 00:00:29 ago

ADDITIONAL TROUBLESHOOTING TOOLS

In addition to the CLI toolkit described in the previous section, Cisco also provides GUI-based tools and applications that can be used for troubleshooting purposes. These include tools such as SDM, CCP, and CiscoWorks LMS. Because these tools and applications are described in the previous chapter, they are not described again in this chapter. Please refer to the previous chapter for additional information on these tools.

For registered Cisco Connection Online (CCO) users, Cisco also provides additional online tools that can be used to assist with the troubleshooting process. One such tool, the Error Message Decoder, can help a troubleshooter research and resolve error messages in Cisco IOS software, as well as other Cisco software variants.
For example, assume you consistently saw the following error message on the console or in the logs of a particular router: %MARS_NETCLK-3-HOLDOVER_TRANS. Using the Error Message Decoder, you would simply paste this message into the tool. The tool would then print information describing what the error message means, as well as provide a recommended action to resolve the issue that is causing this message.

Another commonly used online troubleshooting tool is the Output Interpreter. This tool can be used to identify problems by analyzing the output of supported show commands. The troubleshooter can simply paste the output of a supported show command into the tool, which then decodes the output and provides recommendations for resolving identified issues.

While these two tools are commonly used and are popular, they are only a small subset of what Cisco has online. The complete list of tools can be found at the following link:

http://www.cisco.com/en/US/customer/support/tsd_most_requested_tools.html

NOTE: A valid CCO account is required to access the tools and resources available online.

CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter.

Troubleshooting and the Troubleshooting Flow

• Troubleshooting can be thought of as the process of identifying or diagnosing a problem
• There is no single troubleshooting method that can be applied to all situations
• The high-level three-step troubleshooting flow is comprised of the following phases:
   1. The Problem Report
   2. Problem Diagnosis
   3. Problem Resolution
• The problem report is what typically initiates the troubleshooting process
• The problem report is used to define the problem
• The very first step of the troubleshooting process itself is problem diagnosis
• The problem diagnosis phase is the most time-consuming phase
• The five steps included in the problem diagnosis phase are as follows:
   1. Collecting information about the problem
   2. Analyzing or examining the collected information
   3. Eliminating possible causes
   4. Hypothesizing or theorizing potential causes
   5. Verifying the hypothesis or theory
• The problem resolution phase entails notifying and confirming that the problem is resolved

Communication and Troubleshooting

• Effective communication is an integral component of the troubleshooting process
• Effective communication includes the end user, team management, and management
• Effective communication is essential in all steps of the troubleshooting process

Integrating Maintenance and Troubleshooting

• A well-documented and well-maintained network is a lot easier to support and troubleshoot
• A structured maintenance approach facilitates the troubleshooting and support functions
• Baselining is a network maintenance function that facilitates troubleshooting
• Baselining is a process for studying the network and network devices at regular intervals
• Baselining can help you obtain the following information from the network:
   1. Information on the health of the hardware and software
   2. Predict future problems
   3. Determine the current utilization of network resources
   4. Identify current network problems
   5. Make accurate decisions about network alarm thresholds
• The baselining process can be used to determine the network break point
• Baselining tools include Cisco IP SLA operations, SNMP, NetFlow, and CiscoWorks
• Documentation is integral to the troubleshooting process
• The network should be well documented and documentation well maintained
• Tools that facilitate documentation include EEM, Configuration Archive, and KRON
• Change management also facilitates the troubleshooting process
• A change management process can help minimize network and service downtime
• Some examples of changes include the following:
   1. Environmental changes
   2. Network changes
   3. Application changes
   4. Hardware changes
   5. Documentation changes
   6. Software changes
• Troubleshooting is simplified when changes are implemented in a controlled environment

Troubleshooting Methodologies

• A structured troubleshooting approach reduces the amount of time spent troubleshooting
• A structured troubleshooting approach results in greater efficiency
• Experienced troubleshooters commonly use a shoot from the hip troubleshooting method
• A shoot from the hip approach leverages the troubleshooter's experience and knowledge
• A shoot from the hip approach will not usually work well for inexperienced troubleshooters
• Commonly used structured troubleshooting approaches include the following:
   1. The Top-Down Troubleshooting Method
   2. The Bottom-Up Troubleshooting Method
   3. The Follow the Traffic Path Method
   4. The Compare Configurations Method
   5. The Divide and Conquer Method
   6. The Component Swapping Method
• The top-down approach begins troubleshooting at Layer 7 and works down to Layer 1
• The bottom-up approach begins troubleshooting at Layer 1 and works up to Layer 7
• The follow the traffic method requires intimate network and traffic flow knowledge
• The follow the traffic method is based on the path that traffic takes through the network
• The compare configurations method compares configurations with working ones
• The divide and conquer method begins at the middle of the OSI model
• The divide and conquer method then works up or down depending on the test results
• The component swapping method physically replaces components

The Cisco IOS Generic Troubleshooting Toolkit

• Cisco IOS software provides a plethora of tools that can be used to support troubleshooting
• The ping and traceroute utilities are commonly used to verify network connectivity
• The ping utility is primarily used to verify connectivity between endpoints
• The traceroute utility is primarily used to discover the path taken between endpoints
• Cisco IOS software supports standard and extended ping and traceroute functions
• Cisco IOS software supports GOLD and IOS Service Diagnostics utilities
• Cisco GOLD can be used to troubleshoot hardware problems
• Cisco IOS Service Diagnostics is a programmable diagnostics service
• Cisco IOS Service Diagnostics can be used for BGP, OSPF, and QoS diagnostics
• Cisco IOS Service Diagnostics can be used to monitor and detect abnormal utilization
• Cisco IOS Service Diagnostics leverages EEM and Tcl
• Cisco IOS command output is filtered using the begin, exclude, include, and section keywords
• Regular expressions can also be used when performing command filtering
• Cisco IOS command output can also be redirected to external locations, e.g., Flash and TFTP
• Cisco IOS command output redirection uses the append, redirect, or tee keywords
• The append keyword appends output to an existing file
• The redirect keyword redirects the command output to the specified location
• The tee keyword redirects output and allows you to see it at the same time
• Analyzing traffic packet captures is one of the most common troubleshooting tasks
• Cisco IOS software-based routers support RITE for packet captures
• RITE can be used to send captures to a specified device or store them in memory
• Cisco Catalyst switches and high-end routers support SPAN for packet captures
• Cisco provides three variants of SPAN: local SPAN, RSPAN, and ERSPAN
• Local SPAN is configured on a single physical device
• Remote SPAN can be configured between multiple Layer 2 switches across trunk links
• ERSPAN can be configured between remote switches separated by IP networks
• The show processes command can be used for health monitoring and verification
• The show processes command provides CPU and memory statistics
• The show environment command is used to verify device environmental variables
• Cisco provides additional troubleshooting tools online, e.g., the Error Message Decoder

CHAPTER 3

Troubleshooting Switches at Layers 1 and 2

In the first two chapters of this guide, we discussed network monitoring and maintenance functions, as well as troubleshooting methodologies and processes from a theoretical standpoint. The following chapters in this guide will discuss troubleshooting from a practical perspective. This chapter focuses on troubleshooting Cisco Catalyst switch Layer 1 and Layer 2 issues. The TSHOOT certification exam objectives that are covered in this chapter include the following:

• Troubleshoot switch-to-switch connectivity for the VLAN-based solution
• Troubleshoot loop prevention for the VLAN-based solution

LAN switching is a form of packet switching that is used in Local Area Networks (LANs). LAN switching is performed in hardware at the Data Link layer. Because LAN switching is hardware-based, it uses hardware addresses that are referred to as Media Access Control (MAC) addresses. The MAC addresses are then used by LAN switches to forward frames. This chapter will be divided into the following sections:

• Troubleshooting at the Physical Layer
• VLAN, VTP, and Trunking Overview
• Troubleshooting VLANs
• Using the ‘show vlan’ Command
• Spanning Tree Protocol Overview
• Troubleshooting Spanning Tree Protocol
• Using the ‘show spanning-tree’ Command

TROUBLESHOOTING AT THE PHYSICAL LAYER

Cisco IOS switches support several commands that can be used to troubleshoot Layer 1, or at least suspected Layer 1, issues. However, in addition to being familiar with the software command suite, it is also important to have a solid understanding of physical indicators (i.e., LEDs) that can be used to troubleshoot link status or that indicate an error condition.

Troubleshooting Link Status Using Light Emitting Diodes (LEDs)

If you have physical access to the switch or switches, LEDs can be a useful troubleshooting tool.
Different Cisco Catalyst switches provide different LED capabilities. Understanding the meaning of the LEDs is an integral part of Catalyst switch link status and system troubleshooting. Cisco Catalyst switches have front-panel LEDs that can be used to determine link status, as well as other variables such as system status. The supported LEDs, as well as what they indicate, are listed and described in Table 3-1 below:

Table 3-1. Cisco Catalyst 6500 Ethernet Module LED Status

LED | Color or State | Description
Status | Green | The color green indicates that all diagnostics have passed and that the module is operational.
Status | Orange | The color orange indicates that the module is booting or running diagnostics. This color could also indicate that an over temperature condition (i.e., a minor temperature threshold) has been exceeded during environmental monitoring.
Status | Red | The color red indicates that the module is resetting. The switch has just been powered on or the module has been hot inserted during the normal initialization sequence. The color red could also indicate that an over temperature condition (i.e., a major temperature threshold) has been exceeded during environmental monitoring. If the module fails to download code and configuration information successfully during the initial reset, the LED stays red; the module does not come online.
Status | Off | If the status LED is off, this indicates that the module is not receiving power or has been powered off.
LINK | Green | The color green indicates that the port is active and the link is connected and operational.
LINK | Orange | The color orange indicates that the port is disabled through the CLI command (i.e., the shutdown command is configured).
LINK | Flashing Orange | A flashing orange LED indicates that the port is faulty and has been disabled by the system.
LINK | Off | If the link LED is off, then this indicates that the port is not active or the link is not connected. It does not mean no cable is connected.
PHONE | Green | If the phone LED is green, then this indicates that the voice daughter card is installed.
PHONE | Off | If the phone LED is off, then this indicates that the voice daughter card is not detected or is not installed.

Another popular Catalyst switch, the Catalyst 4500 Series modular switch, also has LEDs that can be used for link and system troubleshooting. The Ethernet module LEDs, as well as the meanings of these LEDs, are listed and described in Table 3-2 below:

Table 3-2. Cisco Catalyst 4500 Ethernet Module LED Status

LED | Color or State | Description
Status | Green | The color green indicates that all diagnostic tests have passed.
Status | Red | The color red indicates that a test, other than an individual port test, has failed.
Status | Orange | The color orange indicates system boot-up, that self-test diagnostics are running, or that the module is disabled.
Link | Green | The color green indicates that the port is operational and that a signal has been detected.
Link | Orange | The color orange indicates that the link has been disabled by software (i.e., the shutdown command has been issued).
Link | Flashing Orange | A flashing orange LED indicates that the link has been disabled due to a hardware failure.
Link | Off | If the link LED is off, then this means no signal has been detected. It does not necessarily mean that the port is not connected.
Port Status | Green | The color green indicates that the port is operational and that a signal has been detected.
Port Status | Orange | The color orange indicates that the link or port has been disabled by software.
Port Status | Flashing Orange | A flashing orange LED indicates that the link has been disabled due to a hardware failure.
Port Status | Off | If the LED is off, then this indicates that no signal is detected.
In addition to understanding what the different LED colors mean, it is also important to have an understanding of what action to take to remedy the issue. For example, assume that you are troubleshooting a Catalyst 6500 Series switch and you notice that the status LED on the supervisor engine (or on any switching module) is red or off. In such cases, it is possible that the module might have shifted out of its slot, or, in the event of a new module, was not correctly inserted into the chassis. In this case, the recommended action is to reseat the module. In some cases, it also may be necessary to reboot the entire system.

While a link or port LED color other than green typically indicates some kind of failure or other issue, it is important to remember that a green link light does not always mean that the cable is fully functional. For example, a single broken wire or one shutdown port can result in one side showing a green link light while the other side does not. This could be because the cable has encountered physical stress that causes it to function only at a marginal level. In such cases, the CLI can be used to perform additional troubleshooting.

Using the Command Line Interface to Troubleshoot Link Issues

Several Command Line Interface (CLI) commands can be used to troubleshoot Layer 1 issues on Cisco IOS Catalyst switches. Commonly used commands include the show interfaces, the show controllers, and the show interfaces [name] counters errors commands. In addition to knowing these commands, you are also required to be able to interpret accurately the output or information that these commands provide.
The show interfaces command is a powerful troubleshooting tool that provides a plethora of information, which includes the following:

• The administrative status of a switching port
• The port operational state
• The media type (for select switches and ports)
• Port input and output packets
• Port buffer failures and port errors
• Port input and output errors
• Port input and output queue drops

Following is the output of the show interfaces command for a GigabitEthernet switch port:

Catalyst-3750-1#show interfaces GigabitEthernet3/0/1
GigabitEthernet3/0/1 is up, line protocol is down (notconnect)
  Hardware is GigabitEthernet, address is 000f.2303.2db1 (bia 000f.2303.2db1)
  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, Loopback not set
  Keepalive not set
  Auto-duplex, Auto-speed, link type is auto, media type is unknown
  input flow-control is off, output flow-control is desired
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output never, output hang never
  Last clearing of “show interface” counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     0 packets input, 0 bytes, 0 no buffer
     Received 0 broadcasts (0 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     0 packets output, 0 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out

Most Cisco Catalyst switch ports default to the ‘notconnect’ state as illustrated in the first line of the output printed by this command.
However, a port can also transition to this state if a cable is removed from the port or is not correctly connected. This status is also reflected when the connected cable is faulty or when the other end of the cable is not connected to an active port or device (e.g., if a workstation connected to the switch port is powered off).

NOTE: When troubleshooting GigabitEthernet ports, this port status may also be a result of incorrect gigabit interface converters (GBICs) being used between the two ends.

The first part of the output in the first line printed by this command (i.e., [interface] is up) refers to the Physical layer status of the particular interface. The second part of the output (i.e., line protocol is down) indicates the Data Link layer status of the interface. If this part shows ‘up’, then it means that the interface can send and receive keepalives. Keep in mind that it is possible for the switch port to indicate that the Physical layer is up while the Data Link layer is down, such as when the port is a SPAN destination port or if the local port is connected to a CatOS switch with its port disabled.

The drops value in the Input queue field indicates the actual number of frames dropped because the maximum queue size was exceeded. The flushes column counts Selective Packet Discard (SPD) drops on the Catalyst 6000 Series switches. SPD drops low-priority packets when the CPU is overloaded in order to save some processing capacity for high-priority packets. The flushes counter in the show interfaces command output increments as part of SPD, which implements a selective packet drop policy on the IP process queue of the router. Therefore, it applies only to process-switched traffic. The different Cisco IOS switching methods will be described again later in this guide.

The Total output drops field indicates the number of packets dropped because the output queue is full.
This is often seen when traffic from multiple inbound high bandwidth links (e.g., GigabitEthernet links) is being switched to a single outbound lower bandwidth link (e.g., a FastEthernet link). The output drops increment because the interface is overwhelmed by the excess traffic due to the speed mismatch between the inbound and outbound bandwidths.

In addition to the show interfaces command, the show interfaces [name] counters errors command can also be used to view interface errors and facilitate Layer 1 troubleshooting. Following is the output that is printed by the show interfaces [name] counters errors command:

Catalyst-3750-1#show interfaces GigabitEthernet3/0/1 counters errors

Port        Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize
Gi3/0/1             0          0          0          0          0

Port      Single-Col Multi-Col  Late-Col Excess-Col Carri-Sen     Runts    Giants
Gi3/0/1            0         0         0          0         0         0         0

The following section describes some of the error fields included in the output of the show interfaces [name] counters errors command and which issues or problems are indicated by non-zero values under these fields.

The Align-Err field reflects a count of the number of frames received that do not end with an even number of octets and that have a bad cyclic redundancy check (CRC). These errors are usually the result of a duplex mismatch or a physical problem, such as cabling, a bad port, or a bad network interface controller (NIC). When the cable is first connected to the port, some of these errors can occur. In addition, if there is a hub connected to the port, collisions between other devices on the hub can cause these errors.

The FCS-Err field reflects the number of valid-size frames with Frame Check Sequence (FCS) errors but no framing errors. This is typically a physical issue, such as cabling, a bad port, or a bad NIC. Additionally, a non-zero value under this field could indicate a duplex mismatch.
A non-zero value in the Xmit-Err field is an indication that the internal send (Tx) buffer is full. This is commonly seen when traffic from multiple inbound high bandwidth links (e.g., GigabitEthernet links) is being switched to a single outbound lower bandwidth link (e.g., a FastEthernet link).

The Rcv-Err field indicates the sum of all receive errors. This counter is incremented when the interface receives an error such as a runt, a giant, or an FCS error.

The UnderSize field is incremented when the switch receives frames that are smaller than 64 bytes in length. This is commonly caused by a faulty sending device.

The various collisions fields indicate collisions on the interface. Collisions are common on half-duplex Ethernet, which is almost non-existent in modern networks. However, these counters should not increment for full-duplex links. In the event that non-zero values are present under these counters, this typically indicates a duplex mismatch issue. When a duplex mismatch is detected, the switch prints a message similar to the following on the console or in the log:

%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on FastEthernet0/1 (not full duplex), with R2 FastEthernet0/0 (full duplex)

As will be described in the section pertaining to Spanning Tree Protocol (STP), duplex mismatches can cause STP loops in the switched network if a port is connected to another switch. These mismatches can be resolved by manually configuring the speed and the duplex of switch ports.

The Carri-Sen (carrier sense) counter increments every time an Ethernet controller wants to send data on a half-duplex connection. The controller senses the wire and ensures that it is not busy before transmitting. A non-zero value under this field indicates that the interface is operating in half-duplex mode. This is normal for half-duplex.
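When many switches log to a central server, the duplex mismatch message shown above can be picked out of the logs programmatically. The following is a minimal Python sketch, assuming the message wording matches the sample in this section exactly; the names DuplexMismatch and parse_duplex_mismatch are hypothetical, and other IOS versions may word the message differently.

```python
import re
from typing import NamedTuple, Optional


class DuplexMismatch(NamedTuple):
    """Fields extracted from a %CDP-4-DUPLEX_MISMATCH message (hypothetical type)."""
    local_port: str
    local_duplex: str
    neighbor: str
    neighbor_port: str
    neighbor_duplex: str


# Pattern assumed from the sample message in the text.
PATTERN = re.compile(
    r"%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on "
    r"(\S+) \(([^)]+)\), with (\S+) (\S+) \(([^)]+)\)"
)


def parse_duplex_mismatch(line: str) -> Optional[DuplexMismatch]:
    """Return the parsed fields, or None if the line is not a duplex mismatch."""
    # Collapse runs of whitespace so that wrapped log lines still match.
    normalized = " ".join(line.split())
    match = PATTERN.search(normalized)
    return DuplexMismatch(*match.groups()) if match else None


SAMPLE = (
    "%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on "
    "FastEthernet0/1 (not full duplex), with R2 FastEthernet0/0 (full duplex)"
)

if __name__ == "__main__":
    print(parse_duplex_mismatch(SAMPLE))
```

For the sample message, the parsed result identifies FastEthernet0/1 as the local port that is not running full duplex and FastEthernet0/0 on R2 as the full-duplex neighbor, which tells you which end to inspect first before setting speed and duplex manually.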
Non-zero values can also be seen under the Runts field due to a duplex mismatch or because of other Physical layer problems, such as a bad cable, port, or NIC on the attached device. Runts are received frames with a bad CRC that are smaller than the minimum IEEE 802.3 frame size, which is 64 bytes for Ethernet.

Finally, the Giants counter is incremented when frames are received that exceed the IEEE 802.3 maximum frame size, which is 1518 bytes for non-jumbo Ethernet, and that have a bad FCS. For ports or interfaces connected to a workstation, a non-zero value in this counter is typically caused by a bad NIC on the connected device. However, for ports or interfaces that are connected to another switch (e.g., via a trunk link), this field will contain a non-zero value if 802.1Q encapsulation is used. With 802.1Q, the tagging mechanism implies a modification of the frame because the trunking device inserts a 4-byte tag and then re-computes the FCS.

Inserting a 4-byte tag into a frame that already has the maximum Ethernet size creates a 1522-byte frame that can be considered a baby giant frame by the receiving equipment. Therefore, while the switch will still process such frames, this counter will increment and contain a non-zero value. To resolve this issue, the 802.3 committee created a subgroup called 802.3ac to extend the maximum Ethernet size to 1522 bytes; however, it is not uncommon to see a non-zero value under this field when using 802.1Q trunking.

Finally, the show controllers ethernet-controller <interface> command can also be used to display traffic counter and error counter information similar to that printed by the show interfaces and show interfaces <name> counters errors commands.
Following is the output of the show controllers ethernet-controller <interface> command:

Catalyst-3750-1#show controllers ethernet-controller GigabitEthernet3/0/1

Transmit GigabitEthernet3/0/1
4069327795 Bytes
 559424024 Unicast frames
  27784795 Multicast frames
   7281524 Broadcast frames
         0 Too old frames
         0 Deferred frames
         0 MTU exceeded frames
         0 1 collision frames
         0 2 collision frames
         0 3 collision frames
         0 4 collision frames
         0 5 collision frames
         0 6 collision frames
         0 7 collision frames
         0 8 collision frames
         0 9 collision frames
         0 10 collision frames
         0 11 collision frames
         0 12 collision frames
         0 13 collision frames
         0 14 collision frames
         0 15 collision frames
         0 Excessive collisions
         0 Late collisions
         0 VLAN discard frames
         0 Excess defer frames
    264522 64 byte frames
  99898057 127 byte frames
  76457337 255 byte frames
   4927192 511 byte frames
  21176897 1023 byte frames
 127643707 1518 byte frames
 264122631 Too large frames
         0 Good (1 coll) frames
         0 Good (>1 coll) frames

Receive
3301740741 Bytes
 376047608 Unicast frames
   1141946 Multicast frames
   1281591 Broadcast frames
 429934641 Unicast bytes
 226764843 Multicast bytes
 137921433 Broadcast bytes
         0 Alignment errors
         0 FCS errors
         0 Oversize frames
         0 Undersize frames
         0 Collision fragments
    257477 Minimum size frames
 259422986 65 to 127 byte frames
  51377167 128 to 255 byte frames
  41117556 256 to 511 byte frames
   2342527 512 to 1023 byte frames
   5843545 1024 to 1518 byte frames
         0 Overrun frames
         0 Pause frames
         0 Symbol error frames
         0 Invalid frames, too large
  18109887 Valid frames, too large
         0 Invalid frames, too small
         0 Valid frames, too small
         0 Too old frames
         0 Valid oversize frames
         0 System FCS error frames
         0 RxPortFifoFull drop frame

NOTE: The output above will vary slightly depending on the switch platform on which this command is executed.
For example, Catalyst 3550 Series switches also include a Discarded frames field, which shows the total number of frames whose transmission attempt is abandoned due to insufficient resources. A large number in this field typically indicates a network congestion issue. In the output above, you would look at the RxPortFifoFull drop frame field, which indicates the total number of frames received on an interface that are dropped because the ingress queue is full.

VLAN, VTP, AND TRUNKING OVERVIEW

A virtual LAN (VLAN) is a logical grouping of hosts that appear to be on the same LAN, regardless of their physical location. VLANs increase the number of Broadcast domains in a switched network, while reducing their overall size. A VLAN can span a single switch or even multiple switches, depending on the implementation. Troubleshooting intra-VLAN and inter-VLAN connectivity issues is therefore an all-encompassing task that should take several elements into consideration. These elements are described later in the ‘Troubleshooting VLANs’ section.

Catalyst switches support two types of switch VLAN ports: access ports and trunk ports. These port types are described below.

Access ports are switch ports that are assigned to a single VLAN and can belong only to that VLAN. Switch access ports are typically used to connect network hosts, such as printers, computers, IP phones, and wireless access points, to the LAN switch. However, access ports can also be used to provide connectivity between users connected across multiple switches. Such a topology or implementation is illustrated in the diagram shown in Figure 3-1 below:

Fig. 3-1. Implementing LANs Using Access Ports and a Single VLAN

While such an implementation will work, assuming all the users are on the same subnet, you should keep in mind that the entire switched LAN becomes a single Broadcast domain.
The same situation would be applicable even if multiple VLANs were used, as illustrated in the network diagram shown in Figure 3-2 below:

Fig. 3-2. Implementing LANs Using Access Ports and Multiple VLANs

While this configuration and implementation is valid, keep in mind that such a solution is not scalable, especially in larger networks with multiple subnets and devices. In addition, such implementations make end-to-end VLAN connectivity issues very difficult to isolate and identify. Large networks should use multiple VLANs, reducing the size of the Broadcast domains in the switched network and limiting the range of the VLANs.

VLAN trunks are used to carry data from multiple VLANs. In order to differentiate frames from one VLAN from those of another, frames sent across a trunk link are specially tagged so that the destination switch knows to which VLAN the frame belongs (with 802.1Q, frames in the native VLAN are the exception and are sent untagged by default). This allows for end-to-end VLAN connectivity across multiple switches but requires either a router or other inter-VLAN solution to facilitate communication between the VLANs. The following two methods can be used to ensure that VLANs that traverse switch trunk links can be uniquely identified:

• Inter-Switch Link
• IEEE 802.1Q

Inter-Switch Link (ISL) is a Cisco proprietary protocol that is used to preserve the source VLAN identification information for frames that traverse trunk links. Although ISL is a Cisco proprietary protocol, it is not supported on all Cisco platforms. For example, Catalyst 2940 and 2950 Series switches only support 802.1Q trunking; they do not support ISL trunking.

802.1Q is an IEEE standard for VLAN tagging. Unlike ISL, 802.1Q (or, more commonly, dot1q) inserts a single 4-byte tag into the original frame between the Source Address (SA) field and the Type or Length field, depending on the Ethernet frame type.
For this reason, 802.1Q is also referred to as a one-level, internal tagging or single tagging mechanism. Given that the length of the 802.1Q tag is 4 bytes, the resulting Ethernet frame can be as large as 1522 bytes, while the minimum size of the Ethernet frame with 802.1Q tagging is 68 bytes.

The VLAN Trunking Protocol (VTP) is a Cisco proprietary Layer 2 messaging protocol that manages the addition, deletion, and renaming of VLANs on a network-wide scale. VTP allows VLAN information to propagate through the switched network, which reduces administration overhead while enabling switches to exchange and maintain consistent VLAN information. In order to do this, switches must reside within the same VTP domain.

A VTP domain consists of adjacent connected switches that are part of the same management domain. A switch can belong to only one VTP domain at any one time and will reject or drop any VTP packets received from switches in any other VTP domains. In order to participate in the VTP domain, switches must be configured for a specific VTP mode, each with its own characteristics. A switch can be configured in one of three VTP modes: server, client, and transparent.

VTP server mode is the default VTP mode for all Cisco Catalyst switches. VTP server switches control VLAN creation, modification, and deletion for their respective VTP domain. VTP clients advertise and receive VTP information; however, they do not allow VLAN creation, modification, or deletion. A switch that is configured for VTP transparent mode allows for the creation, modification, and deletion of VLANs in the same manner as a VTP server switch; however, it ignores VTP updates, and all VLANs that are created on the switch are locally significant and are not propagated to other switches in the VTP domain.
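The behavior of the three VTP modes described above can be summarized as a small lookup table. The Python sketch below is a study aid rather than anything that runs on a switch; the class name VtpModeBehavior, its field names, and the function describe_vlan_change are hypothetical and simply encode the characteristics listed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VtpModeBehavior:
    can_edit_vlans: bool          # VLANs can be created, modified, deleted locally
    applies_vtp_updates: bool     # database is synchronized from received updates
    advertises_local_vlans: bool  # local VLAN changes are propagated to the domain


# One entry per VTP mode, following the descriptions in the text.
VTP_MODES = {
    "server": VtpModeBehavior(True, True, True),
    "client": VtpModeBehavior(False, True, False),
    "transparent": VtpModeBehavior(True, False, False),
}


def describe_vlan_change(mode):
    """Say what happens when an administrator tries to add a VLAN in this mode."""
    behavior = VTP_MODES[mode]
    if not behavior.can_edit_vlans:
        return "rejected: VLANs cannot be changed locally"
    if behavior.advertises_local_vlans:
        return "accepted and propagated to the VTP domain"
    return "accepted but locally significant only"


for mode in VTP_MODES:
    print(f"{mode:12s} {describe_vlan_change(mode)}")
```

When troubleshooting, the same three questions (can the switch edit VLANs, does it apply updates, does it propagate its own changes) are a quick way to decide whether the VTP mode shown by show vtp status explains the symptom.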
TROUBLESHOOTING VLANS

In the previous section, we discussed the use of three CLI commands that can be used for the troubleshooting of Physical layer issues. This section describes some common approaches to identifying and troubleshooting intra-VLAN connectivity issues. Some of the more common causes of intra-VLAN connectivity issues include the following:

• Duplex mismatches
• Bad NIC or cable
• Congestion
• Hardware issues
• Software issues
• Resource oversubscription
• Configuration issues

Duplex mismatches can result in very slow network performance and connectivity. While some improvements in auto negotiation have been made, and the use of auto negotiation is considered a valid practice, it is still possible for duplex mismatches to occur. As an example, when the NIC is set to 100/Full and the switch port is auto negotiating, the NIC will retain its 100/Full setting, but the switch port will be set to 100/Half. Another example would be the inverse; that is, the NIC is set to auto negotiate, while the switch port is set to 100/Full. In that case, the NIC would auto negotiate to 100/Half, while the switch retained its static 100/Full configuration, resulting in a duplex mismatch.

It is therefore good practice to manually specify the speed and duplex settings for 10/100 Ethernet connections, where feasible, to avoid duplex mismatches with auto negotiation. Duplex mismatches can affect not only users directly connected to the switch but also network traffic that traverses inter-switch links that have mismatched duplex settings. The port interface speed and duplex settings can be viewed using the show interfaces command.

NOTE: Because Catalyst switches support only full duplex for 1Gbps links, this is not commonly an issue for GigabitEthernet connections.

Multiple counters in Cisco IOS software can be used to identify a potentially bad NIC or cabling issue.
NIC or cabling issues can be identified by checking the values of certain counters in different show commands. For example, if the switch port counters show an incrementing number of frames with a bad CRC or with FCS errors, this can most likely be attributed either to a bad NIC on the workstation or machine or to a bad network cable.

Network congestion can also cause intermittent connectivity issues in the switched network. The first sign that your VLAN is overloaded is that the Rx or Tx buffers on a port are oversubscribed. Excessive frame drops on a port can also be an indication of network congestion. A common cause of network congestion is underestimating the aggregate bandwidth requirements for backbone connections. In such cases, congestion issues can be resolved by configuring EtherChannels or by adding ports to existing EtherChannels. While network congestion is a common cause of connectivity issues, it is also important to know that the switch itself can experience congestion issues, which can have a similar impact on network performance.

Limited switch bandwidth can result in congestion issues, which can severely impact network performance. As you may recall, in the SWITCH guide we learned that in LAN switching, bandwidth refers to the capacity of the switch fabric. Therefore, if the switch fabric is only 5Gbps and you attempt to push 7Gbps worth of traffic through the switch, the end result is packet loss and poor network performance. This is a common issue in oversubscribed platforms, where the aggregate capacity of all ports can exceed the total backplane capacity.

Hardware problems can also cause connectivity issues in the switched LAN. Examples of such issues include bad ports or bad switch modules. While you can begin troubleshooting such issues by looking at physical indicators, such as LEDs, where possible, they are sometimes difficult to troubleshoot and diagnose.
In most cases, you should seek the assistance of the Technical Assistance Center (TAC) when you suspect faulty hardware.

Software bugs are even more difficult to identify because they cause deviations from expected behavior, which are hard to troubleshoot. In the event that you suspect a software bug may be causing connectivity issues, you should contact the TAC with your findings. Additionally, if error messages are printed on the console or are in the logs, you can also use some of the online tools available from Cisco to implement a workaround or get a recommendation for a version of software in which the issue has been resolved and verified.

As with any other hardware device, switches have limited resources, such as physical memory. When these resources are oversubscribed, this can lead to severe performance issues. Issues such as high CPU utilization can have a drastic impact on both switch and network performance. Resource troubleshooting for IOS switches is described in the following chapter.

Finally, as with any other technology, incorrect configurations may also cause connectivity issues either directly or indirectly. For example, the poor placement of the root bridge may result in slow connectivity for users. Directly integrating or adding an incorrectly configured switch into the production network could result in an outright outage for some or all users. The following sections describe some common VLAN-related issues, their probable causes, and the actions that can be taken to remedy them.

Troubleshooting Dynamic VLAN Advertisement

Cisco Catalyst switches use VTP to propagate VLAN information dynamically throughout the switched domain. VTP is a Cisco proprietary Layer 2 messaging protocol that manages the addition, deletion, and renaming of VLANs for switches in the same VTP domain.
There are several reasons why a switch might not be able to receive any VLAN information dynamically when added to the VTP domain. Some common causes include the following:

• Layer 2 trunking misconfigurations
• Incorrect VTP configuration
• Configuration revision number
• Physical layer issues
• Software or hardware issues or bugs
• Switch performance issues

NOTE: For brevity, only trunking, VTP configuration, and the configuration revision number are described in the following section. Physical layer troubleshooting was described in the previous section. Software or hardware issues or bugs and switch performance issues will be described in the following chapter.

In order for switches to exchange VLAN information using VTP, a trunk must be established between the switches. Cisco IOS switches support both ISL and 802.1Q trunking mechanisms. While some switches default to ISL, which is a Cisco proprietary trunking mechanism, the current Cisco IOS Catalyst switches default to 802.1Q. When provisioning trunking between switches, it is considered good practice to manually specify the trunking encapsulation protocol. This is accomplished using the switchport trunk encapsulation [isl|dot1q] interface configuration command when configuring the link as a trunk port.

There are several commands that you can use to troubleshoot trunk connectivity issues. You can use the show interfaces command to verify basic port operational and administrative status. Additionally, you can append the trunk or errors keyword to perform additional troubleshooting and verification. The show interfaces [name] counters trunk command can be used to view the number of frames transmitted and received on trunk ports.
The output of this command also includes encapsulation errors, which can be used to identify 802.1Q and ISL trunking encapsulation mismatches, as illustrated in the following output:

Cat-3550-1#show interfaces FastEthernet0/12 counters trunk

Port        TrunkFramesTx  TrunkFramesRx  WrongEncap
Fa0/12               1696          32257           0

Referencing the output above, you can repeat the same command to ensure that both the Tx and Rx columns are incrementing and perform additional troubleshooting from there. For example, if the switch is not sending any frames, then the interface might not be configured as a trunk, or it might be down or disabled. If the Rx column is not incrementing, then the remote switch might not be configured correctly.

Another command that can be used to troubleshoot possible Layer 2 trunk misconfigurations is the show interfaces [name] trunk command. The output of this command includes the trunking encapsulation protocol and mode, the native VLAN for 802.1Q, the VLANs that are allowed to traverse the trunk, the VLANs that are active in the VTP domain, and the VLANs that are pruned. A common issue with VLAN propagation is that the upstream switch has been configured to filter certain VLANs on the trunk link using the switchport trunk allowed vlan interface configuration command.
Following is the output of the show interfaces [name] trunk command:

Cat-3550-1#show interfaces trunk

Port      Mode         Encapsulation  Status        Native vlan
Fa0/12    desirable    n-802.1q       trunking      1
Fa0/13    desirable    n-802.1q       trunking      1
Fa0/14    desirable    n-isl          trunking      1
Fa0/15    desirable    n-isl          trunking      1

Port      Vlans allowed on trunk
Fa0/12    1-4094
Fa0/13    1-4094
Fa0/14    1-4094
Fa0/15    1-4094

Port      Vlans allowed and active in management domain
Fa0/12    1-4
Fa0/13    1-4
Fa0/14    1-4
Fa0/15    1-4

Port      Vlans in spanning tree forwarding state and not pruned
Fa0/12    1-4
Fa0/13    none
Fa0/14    none
Fa0/15    none

Another common trunking misconfiguration is a native VLAN mismatch. When you are configuring 802.1Q trunks, the native VLAN must match on both sides of the trunk link; otherwise, the link will not work. If there is a native VLAN mismatch, then STP places the port in a port VLAN ID (PVID) inconsistent state and will not forward on the link. In such cases, error messages similar to the following will be printed on the console or in the log:

*Mar 1 03:16:43.935: %SPANTREE-2-RECV_PVID_ERR: Received BPDU with inconsistent peer vlan id 1 on FastEthernet0/11 VLAN2.
*Mar 1 03:16:43.935: %SPANTREE-2-BLOCK_PVID_PEER: Blocking FastEthernet0/11 on VLAN0001. Inconsistent peer vlan.
*Mar 1 03:16:43.935: %SPANTREE-2-BLOCK_PVID_LOCAL: Blocking FastEthernet0/11 on VLAN0002. Inconsistent local vlan.
*Mar 1 03:16:43.935: %SPANTREE-2-RECV_PVID_ERR: Received BPDU with inconsistent peer vlan id 1 on FastEthernet0/12 VLAN2.
*Mar 1 03:16:43.935: %SPANTREE-2-BLOCK_PVID_PEER: Blocking FastEthernet0/12 on VLAN0001. Inconsistent peer vlan.
*Mar 1 03:16:43.939: %SPANTREE-2-BLOCK_PVID_LOCAL: Blocking FastEthernet0/12 on VLAN0002. Inconsistent local vlan.
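When many trunks log PVID errors at once, it can help to summarize the messages by interface. The following Python sketch is one possible way to do that with a regular expression; the function name pvid_blocks is hypothetical, and the pattern is assumed from the BLOCK_PVID_PEER and BLOCK_PVID_LOCAL messages shown above rather than taken from any Cisco specification.

```python
import re

# Matches the two BLOCK_PVID messages shown in the log excerpt above.
BLOCK_RE = re.compile(
    r"%SPANTREE-2-BLOCK_PVID_(PEER|LOCAL): Blocking (\S+) on VLAN(\d+)\."
)


def pvid_blocks(log_text):
    """Return {interface: {"peer": vlan_id, "local": vlan_id}} for blocked ports."""
    result = {}
    for side, interface, vlan in BLOCK_RE.findall(log_text):
        # int() drops the zero padding, so "0002" becomes 2.
        result.setdefault(interface, {})[side.lower()] = int(vlan)
    return result


log = """\
*Mar 1 03:16:43.935: %SPANTREE-2-BLOCK_PVID_PEER: Blocking FastEthernet0/11 on VLAN0001. Inconsistent peer vlan.
*Mar 1 03:16:43.935: %SPANTREE-2-BLOCK_PVID_LOCAL: Blocking FastEthernet0/11 on VLAN0002. Inconsistent local vlan.
"""

for interface, vlans in pvid_blocks(log).items():
    print(interface, "peer VLAN:", vlans.get("peer"), "local VLAN:", vlans.get("local"))
```

A differing pair of peer and local VLAN IDs on the same interface is the cue to compare the native VLAN configured on each side of that trunk.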
While STP troubleshooting will be described later in this section, this inconsistent state can be validated using the show spanning-tree command as illustrated below:

Cat-3550-1#show spanning-tree interface FastEthernet0/11

Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----------------------------
VLAN0001            Desg BKN*19        128.11   P2p *PVID_Inc
VLAN0002            Desg BKN*19        128.11   P2p *PVID_Inc

If you have checked and validated that the trunk is indeed correctly configured and operational between the two switches, then the next step would be to validate VTP configuration parameters. These parameters include the VTP domain name, the VTP mode, and the VTP password, if one has been configured for the domain. They can be verified using the show vtp status and show vtp password commands. Below is the output of the show vtp status command:

Cat-3550-1#show vtp status
VTP Version                     : running VTP2
Configuration Revision          : 0
Maximum VLANs supported locally : 1005
Number of existing VLANs        : 8
VTP Operating Mode              : Server
VTP Domain Name                 : TSHOOT
VTP Pruning Mode                : Enabled
VTP V2 Mode                     : Enabled
VTP Traps Generation            : Disabled
MD5 digest                      : 0x26 0x99 0xB7 0x93 0xBE 0xDA 0x76 0x9C
... [Truncated Output]

When using the show vtp status command, ensure that the switches are running the same version of VTP. By default, Catalyst switches run VTP version 1. A switch running VTP version 1 cannot participate in a VTP version 2 domain. If the switch is incapable of running VTP version 2, then all VTP version 2 switches should be configured to run version 1 instead using the vtp version global configuration command.

NOTE: If you change the VTP version on the server, then the change is propagated automatically to client switches in the VTP domain.

As was described in the SWITCH guide, VTP propagation is enabled for VTP client/server or server/server devices.
If VTP is disabled on a switch (i.e., transparent mode), then the switch will not receive VLAN information dynamically via VTP. However, be mindful of the fact that with version 2, transparent mode switches will forward received VTP advertisements out of their trunk ports and act as VTP relays. This happens even if the VTP version is not the same. The VTP domain name should also be consistent on the switches.

In addition, the output of the show vtp status command includes the MD5 hash used for authentication purposes. This hash, which is derived from the VTP domain name and password, should be consistent on all switches in the domain. If the VTP passwords or domain names are different on the switches, then the calculated MD5 will also be different, and the show vtp status command will indicate an MD5 digest checksum mismatch as illustrated in the following output:

Cat-3550-1#show vtp status
VTP Version                     : running VTP2
Configuration Revision          : 0
Maximum VLANs supported locally : 1005
Number of existing VLANs        : 8
VTP Operating Mode              : Server
VTP Domain Name                 : TSHOOT
VTP Pruning Mode                : Enabled
VTP V2 Mode                     : Enabled
VTP Traps Generation            : Disabled
MD5 digest                      : 0x26 0x99 0xB7 0x93 0xBE 0xDA 0x76 0x9C
*** MD5 digest checksum mismatch on trunk: Fa0/11 ***
*** MD5 digest checksum mismatch on trunk: Fa0/12 ***
... [Truncated Output]

Finally, the configuration revision number can wreak havoc when using VTP. Switches use the configuration revision number to keep track of the most recent information in the VTP domain. Every switch in the domain stores the configuration revision number that it last heard from a VTP advertisement, and this number is incremented every time new information is received.
When any switch in the VTP domain receives an advertisement message with a higher configuration revision number than its own, it will overwrite any stored VLAN information and synchronize its own stored VLAN information with the information received in the advertisement message. Therefore, if you are wondering why the switch that you integrated into the VTP domain is not receiving any VLAN information, it may be that the new switch had a higher configuration revision number and caused all other switches to overwrite their local VLAN information and replace it with the information received in the advertisement message from the new switch. To avoid such situations, always ensure that the configuration revision number is set to 0 prior to integrating a new switch into the domain. This can be done by changing the VTP mode (for example, to transparent and back) or by changing the VTP domain name on the switch and then changing it back. The configuration revision number is included in the output of the show vtp status command.

Troubleshooting Loss of End-to-End Intra-VLAN Connectivity

There are several possible reasons for a loss of end-to-end connectivity within a VLAN. Some of the most common causes include the following:

• Physical layer issues
• VTP pruning
• VLAN trunk filtering
• New switches
• Switch performance issues
• Network congestion
• Software or hardware issues or bugs

NOTE: For brevity, only trunking, VTP pruning, trunk filtering, and the integration of new switches into the domain will be described in this section. Software or hardware issues or bugs and switch performance issues are described in the following chapter. Physical layer troubleshooting was described earlier in this chapter.

VTP pruning stops traffic for a VLAN from being flooded across a trunk link when the switch at the other end of that trunk has no local ports that are a part of that VLAN; the VLAN itself remains in the VLAN database.
VTP pruning increases the efficiency of trunks by preventing unnecessary Broadcast, Multicast, and unknown Unicast traffic from being flooded across the network. While VTP pruning is a desirable feature to implement, incorrect configuration or implementation can result in a loss of end-to-end VLAN connectivity. VTP pruning should be enabled only in client/server environments. Implementing pruning in a network that includes transparent mode switches may result in a loss of connectivity. If one or more switches in the network are in VTP transparent mode, you should either globally disable pruning for the entire domain or ensure that all VLANs on the trunk link(s) to the upstream transparent mode switch(es) are pruning ineligible (i.e., they are not pruned), using the switchport trunk pruning vlan interface configuration command under the applicable interfaces.

In addition to VTP pruning, incorrectly filtering VLANs on switch trunk links can result in a loss of end-to-end VLAN connectivity. By default, all VLANs are allowed to traverse all trunk links; however, Cisco IOS software allows administrators to selectively remove VLANs from (or add VLANs to) specific trunk links using the switchport trunk allowed vlan interface configuration command. Use the show interfaces [name] trunk and the show interfaces [name] switchport commands to view pruned and restricted VLANs on trunk links.
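Allowed and pruned VLAN lists are printed in a compact form, such as 1-99,201-4094, which makes it easy to overlook a missing VLAN. The Python sketch below expands such a list and reports which required VLANs are absent; the function names expand_vlan_list and missing_vlans are hypothetical, and the list format is assumed to be comma-separated IDs and ranges, with none meaning an empty list.

```python
def expand_vlan_list(text):
    """Expand a VLAN list such as '1-99,201-4094' into a set of VLAN IDs."""
    vlans = set()
    for part in text.split(","):
        part = part.strip()
        if not part or part == "none":
            continue
        if "-" in part:
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def missing_vlans(required, allowed_text):
    """Return the required VLAN IDs that are not present in the allowed list."""
    return sorted(set(required) - expand_vlan_list(allowed_text))


# Example: VLANs 10, 100, and 254 should cross a trunk that allows 1-99,201-4094.
print(missing_vlans([10, 100, 254], "1-99,201-4094"))
```

The same check can be applied to each section of the trunk output in turn (allowed, allowed and active, forwarding and not pruned) to see at which stage a VLAN drops out.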
Following is the output of the show interfaces [name] trunk command:

Cat-3550-1#show interfaces trunk

Port      Mode         Encapsulation  Status        Native vlan
Fa0/1     on           802.1q         trunking      1
Fa0/2     on           802.1q         trunking      1

Port      Vlans allowed on trunk
Fa0/1     1,10,20,30,40,50
Fa0/2     1-99,201-4094

Port      Vlans allowed and active in management domain
Fa0/1     1,10,20,30,40,50
Fa0/2     1,10,20,30,40,50,60,70,80,90,254

Port      Vlans in spanning tree forwarding state and not pruned
Fa0/1     1,10,20,30,40,50
Fa0/2     1,40,50,60,70,80,90,254

Following is the output of the show interfaces [name] switchport command on a port that has been configured statically as an 802.1Q trunk link:

Cat-3550-2#show interfaces FastEthernet0/7 switchport
Name: Fa0/7
Switchport: Enabled
Administrative Mode: trunk
Operational Mode: trunk
Administrative Trunking Encapsulation: dot1q
Operational Trunking Encapsulation: dot1q
Negotiation of Trunking: On
Access Mode VLAN: 1 (default)
Trunking Native Mode VLAN: 1 (default)
Administrative Native VLAN tagging: enabled
Voice VLAN: none
Administrative private-vlan host-association: none
Administrative private-vlan mapping: none
Administrative private-vlan trunk native VLAN: none
Administrative private-vlan trunk Native VLAN tagging: enabled
Administrative private-vlan trunk encapsulation: dot1q
Administrative private-vlan trunk normal VLANs: none
Administrative private-vlan trunk associations: none
Administrative private-vlan trunk mappings: none
Operational private-vlan: none
Trunking VLANs Enabled: 3,5,7
Pruning VLANs Enabled: 2-8
Capture Mode Disabled
Capture VLANs Allowed: ALL
Protected: false
Unknown unicast blocked: disabled
Unknown multicast blocked: disabled
Appliance trust: none

As was described in the previous section, the integration of a new switch into the network can result in a loss of VLAN information in the management domain.
This loss of VLAN information can result in a loss of connectivity between devices within the same VLAN. Ensure that the configuration revision number is reset prior to integrating a new switch into the LAN.

USING THE ‘SHOW VLAN’ COMMAND

In addition to the commands that were described in the previous sections, there are additional Cisco IOS software commands that are useful for both verifying and troubleshooting VLAN configurations. One of the most commonly used VLAN verification and troubleshooting commands is the show vlan command. This command displays parameters for all VLANs within the administrative domain as illustrated in the following output:

Cat-3550-1#show vlan

VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Fa0/11, Fa0/12, Fa0/13, Fa0/14
                                                Fa0/20, Fa0/21, Fa0/22, Fa0/23
                                                Fa0/24
150  VLAN_150                         active    Fa0/2, Fa0/3, Fa0/4, Fa0/5
                                                Fa0/6, Fa0/7, Fa0/8, Fa0/9
                                                Fa0/10
160  VLAN_160                         active    Fa0/15, Fa0/16, Fa0/17, Fa0/18
                                                Fa0/19
170  VLAN_170                         active    Gi0/1, Gi0/2
1002 fddi-default                     active
1003 token-ring-default               active
1004 fddinet-default                  active
1005 trnet-default                    active

VLAN Type  SAID       MTU   Parent RingNo BridgeNo Stp  BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------
1    enet  100001     1500                                       0      0
150  enet  100150     1500                                       0      0
160  enet  100160     1500                                       0      0
170  enet  100170     1500                                       0      0
1002 fddi  101002     1500                                       0      0
1003 tr    101003     1500                                       0      0
1004 fdnet 101004     1500                         ieee          0      0
1005 trnet 101005     1500                         ibm           0      0

Remote SPAN VLANs
------------------------------------------------------------------------------

Primary Secondary Type              Ports
------- --------- ----------------- ------------------------------------------

This command prints all available VLANs along with the ports that are assigned to each of the individual VLANs.
Only access ports, regardless of whether they are up or down, will be included in the output of this command. Trunk links will not be included, as these belong to all VLANs. The show vlan command also provides information on RSPAN VLANs, as well as Private VLAN (PVLAN) configuration on the switch. The show vlan command can be used with additional keywords to provide information that is more specific. The following output displays the supported additional keywords that can be used with this command:

Cat-3550-1#show vlan ?
  brief         VTP all VLAN status in brief
  id            VTP VLAN status by VLAN id
  ifindex       SNMP ifIndex
  name          VTP VLAN status by VLAN name
  private-vlan  Private VLAN information
  remote-span   Remote SPAN VLANs
  summary       VLAN summary information
  |             Output modifiers
  <cr>

The brief keyword prints a brief status of all active VLANs. The output that is printed by this command is the same as the output above, with the only difference being that the last two sections will be omitted. The id keyword provides the same information as the show vlan command, but only for the specified VLAN as shown in the following output:

Switch-1#show vlan id 150

VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
150  VLAN_150                         active    Fa0/1, Fa0/2, Fa0/3, Fa0/4
                                                Fa0/5, Fa0/6, Fa0/7, Fa0/8
                                                Fa0/9, Fa0/10

VLAN Type  SAID       MTU   Parent RingNo BridgeNo Stp  BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------
150  enet  100150     1500                                       0      0

Remote SPAN VLAN
----------------
Disabled

Primary Secondary Type              Ports
------- --------- ----------------- ----------------------------------------

Again, the VLAN name is included in the output, as are all of the access ports that belong to the VLAN. Trunk ports are not included in this output because they belong to all VLANs.
Additional information also includes the VLAN MTU, RSPAN configuration (if applicable), and PVLAN configuration parameters (if applicable).

The name keyword allows the VLAN name to be specified instead of the ID. This command prints the same information as the show vlan id <number> command. The ifindex keyword displays the SNMP IfIndex for the VLAN (if applicable), while the private-vlan and remote-span keywords print PVLAN and RSPAN configuration information, respectively. Finally, the summary keyword prints a summary of the number of VLANs that are active in the management domain. This includes standard and extended VLANs.

Another useful VLAN troubleshooting command is the show vtp counters command. This command prints information on VTP packet statistics. Following is the output of the show vtp counters command on a switch configured as a VTP server (default):

Cat-3550-1#show vtp counters
VTP statistics:
Summary advertisements received    : 15
Subset advertisements received     : 10
Request advertisements received    : 2
Summary advertisements transmitted : 19
Subset advertisements transmitted  : 12
Request advertisements transmitted : 0
Number of config revision errors   : 0
Number of config digest errors     : 0
Number of V1 summary errors        : 0

VTP pruning statistics:

Trunk         Join Transmitted Join Received    Summary advts received from
                                                non-pruning-capable device
------------- ---------------- ---------------- ---------------------------
Fa0/11        0                1                0
Fa0/12        0                1                0

The first six lines of the output printed by the show vtp counters command provide the statistics for the three types of VTP packets: advertisement requests, summary advertisements, and subset advertisements. These different messages are described below.

VTP advertisement requests are requests for configuration information.
These messages are sent by VTP clients to VTP servers to request VLAN and VTP information they may be missing. A VTP advertisement request is sent out when the switch resets, when the VTP domain name changes, or when the switch has received a VTP summary advertisement frame with a higher configuration revision number than its own. VTP servers should show only the received counters incrementing, while any VTP clients should show only the transmitted counters incrementing.

VTP summary advertisements are sent out by servers every five minutes, by default. These messages tell an adjacent switch the current VTP domain name, the configuration revision number, and the status of the VLAN configuration, as well as other VTP information that includes the time stamp, the MD5 hash, and the number of subset advertisements to follow. If the received counters for these advertisements are incrementing on the server, then there is more than one switch acting or configured as a server in the domain.

VTP subset advertisements are sent out by VTP servers when the VLAN configuration changes, such as when a VLAN is added, suspended, changed, or deleted, or when other VLAN-specific parameters (e.g., VLAN MTU) have changed. One or more subset advertisements will be sent following the VTP summary advertisement. A subset advertisement contains a list of VLAN information. If there are several VLANs, more than one subset advertisement may be required in order to advertise all the VLANs.

The Number of config revision errors field shows the number of advertisements that the switch cannot accept because it received packets with the same configuration revision number but with a different MD5 hash value. This is common when changes are made to two or more server switches in the same domain at the same time and an intermediate switch receives these advertisements at the same time.
This concept is illustrated in Figure 3-3 below, which illustrates a basic switched network:

[Figure: switches Sw1 through Sw5; one switch is labeled Root: VLAN 10, 30 and another Root: VLAN 20, 40]
Fig. 3-3. Troubleshooting Configuration Revision Number Errors

Figure 3-3 illustrates a basic network that incorporates redundancy and load sharing. It should be assumed that Sw1 and Sw2 are configured as servers, while Sw3 is configured as a client. Sw1 is the root for VLANs 10 and 30, while Sw2 is the root for VLANs 20 and 40. Assume that a simultaneous change is implemented on Sw1 and Sw2, adding VLAN 50 to Sw1 and VLAN 60 to Sw2. Both switches send out an advertisement following the change to the database.

The change is propagated throughout the domain, overwriting the previous databases of the other switches that receive this information. Assume that Sw5 receives the information from both neighbors at the same time and both advertisements contain the same configuration revision number. In such situations, the switch will not be able to accept either advertisement because they have the same configuration revision number but different MD5 hash values.

When this occurs, the switch increments the Number of config revision errors counter and does not update its database. This situation can result in a loss of connectivity within one or more VLANs because VLAN information is not updated on the switch. To resolve this issue and ensure that the local database on the switch is updated, configure a dummy VLAN on one of the server switches, which results in another update with an incremented configuration revision number. This will overwrite the local database of all switches, allowing Sw5 to update its database as well. Keep in mind that this is not a common occurrence; however, it is possible, which is why this counter exists.
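The Sw5 scenario above can be sketched as a small Python model. This is a toy illustration of the behavior as the text describes it, not the actual VTP implementation: the class name, the function name, and the digest strings are hypothetical, and the model simply treats advertisements passed in one list as having arrived at the same time.

```python
from dataclasses import dataclass


@dataclass
class VtpDatabase:
    revision: int
    digest: str               # stand-in for the MD5 hash carried in the advertisement
    revision_errors: int = 0  # mirrors "Number of config revision errors"


def process_advertisements(db, advertisements):
    """Handle (revision, digest) pairs assumed to arrive at the same time."""
    newest = max(revision for revision, _ in advertisements)
    if newest <= db.revision:
        return "ignored"
    digests = {digest for revision, digest in advertisements if revision == newest}
    if len(digests) > 1:
        # Same revision number, different hashes: accept neither, count an error.
        db.revision_errors += 1
        return "revision-error"
    db.revision, db.digest = newest, digests.pop()
    return "updated"


# Sw5 starts at an assumed revision 5 and hears from Sw1 and Sw2 simultaneously.
sw5 = VtpDatabase(revision=5, digest="vlans-10-20-30-40")
first = process_advertisements(sw5, [(6, "adds-vlan-50"), (6, "adds-vlan-60")])
# A dummy VLAN on one server produces a single advertisement with a higher revision.
second = process_advertisements(sw5, [(7, "adds-dummy-vlan")])
```

In this sketch the first call leaves the database untouched and increments the error counter, while the second call, which represents the dummy VLAN workaround, lets the switch update.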
The Number of config digest errors counter increments whenever the switch receives an advertisement with a different MD5 hash value than it calculated. This is typically the result of different VTP passwords configured on the switches. You can use the show vtp password command to verify that the configured VTP password is correct. It is also important to remember that the passwords may be the same but hardware or software issues or bugs could be causing data corruption of VTP packets, resulting in these errors.

Finally, the VTP pruning statistics field will only ever contain non-zero values when pruning is enabled for the VTP domain. Pruning is enabled on servers and this configuration is propagated throughout the VTP domain. Servers will receive joins from clients when pruning has been enabled for the VTP domain.

SPANNING TREE PROTOCOL OVERVIEW

In order to troubleshoot Spanning Tree Protocol (STP), you need to have a solid understanding of the protocol, its inner workings, and its default parameters. This section discusses STP fundamentals, which are an important element when troubleshooting STP issues. Additional detailed information on STP can be found in the current SWITCH guide that is available online.

The Spanning Tree Protocol is defined in the IEEE 802.1D standard. The primary purpose of STP is to attempt to provide a loop-free topology in a redundant Layer 2 network environment. The word ‘attempt’ is used because implementing STP does not always guarantee a loop-free switched network. This is because STP operates by making the following assumptions about the network:

1. All links are bidirectional and can both send and receive BPDUs
2. The switch is able to receive, process, and send BPDUs regularly

All switches that reside in the Spanning Tree domain communicate and exchange messages using Bridge Protocol Data Units (BPDUs).
The exchange of BPDUs is used by STP to determine the network topology. The topology of an active switched network is determined by the following three variables:

1. The unique MAC address (switch identifier) that is associated with each switch
2. The path cost to the root bridge associated with each switch port
3. The port identifier (port priority and port number) associated with each switch port

Configuration BPDUs are sent by LAN switches and are used to communicate and compute the Spanning Tree topology. After the switch port initializes, the port is placed into the Blocking state and a BPDU is sent to each port in the switch. By default, all switches initially assume that they are the root of the Spanning Tree until they exchange Configuration BPDUs with other switches. As long as a port continues to see its Configuration BPDU as the most attractive, it will continue sending Configuration BPDUs. Switches determine the best Configuration BPDU based on the following four factors (in the order listed):

1. Lowest root bridge ID
2. Lowest root path cost to root bridge
3. Lowest sender bridge ID
4. Lowest sender port ID

The completion of the Configuration BPDU exchange results in the following actions:

1. A root switch is elected for the entire Spanning Tree domain
2. A root port is elected on every non-root switch in the Spanning Tree domain
3. A designated switch is elected for every LAN segment
4. A designated port is elected on the designated switch for every segment
5. Loops in the network are eliminated by blocking redundant paths

A Configuration BPDU is always transmitted away from the root bridge and to the rest of the switches within the STP domain. The simplest way to remember the flow of Configuration BPDUs after the Spanning Tree network has converged is to memorize the following four rules:

1. A Configuration BPDU originates on the root bridge and is sent via the designated port
2. A Configuration BPDU is received by a non-root bridge on a root port
3. A Configuration BPDU is transmitted by a non-root bridge on a designated port
4. There is only one designated port (on a designated switch) on any single LAN segment

In a stable and ‘healthy’ switched network, the majority of the BPDUs sent by switches should be Configuration BPDUs. However, another type of BPDU, the Topology Change Notification (TCN) BPDU, may also be sent by switches. The TCN BPDU plays a key role in handling changes in the active topology. This BPDU is used to notify the root bridge, and through it the other switches, of a change in the Spanning Tree network topology. A switch originates a TCN BPDU in the following two situations:

1. It transitions a port into the Forwarding state and it has at least one designated port
2. It transitions a port from either the Forwarding or Learning states to the Blocking state

Unlike Configuration BPDUs, which are always originated by the root bridge and are received on the root port of a non-root bridge, TCN BPDUs are originated by any switch and are sent upstream toward the root bridge via the root port to alert the root bridge that the active topology has changed. Once the root bridge acknowledges the TCN, it propagates it to all the other switches in the Spanning Tree domain.

The Spanning Tree Algorithm (STA) defines a number of states that a port under STP control will progress through before being in an active Forwarding state. These port states are Blocking, Listening, Learning, Forwarding, and Disabled.

By default, following initialization, all switches initially assume that they are the root of the Spanning Tree until they exchange BPDUs with other switches. When switches exchange BPDUs, an election is held and the switch with the lowest Bridge Priority value (that is, the most preferred priority) is elected the STP root bridge.
If two or more switches have the same priority, then the switch with the lowest MAC address is elected as the root bridge of the STP network. During STP root election, no traffic is forwarded over any switch in the same STP domain.

The Spanning Tree Protocol uses cost and priority values to determine the best path to the root bridge. These values are then used in the election of the root port, which will be described in the following section. It is important to understand the calculation of the cost and priority values in order to understand how Spanning Tree selects one port over another.

One of the key functions of the STA is to attempt to provide the shortest path to each switch in the network from the root bridge. Once selected, this path is then used to forward data while redundant links are placed into a Blocking state. STA uses two values to determine which port will be placed into a Forwarding state (i.e., the best path to the root bridge) and which port(s) will be placed into a Blocking state. These values are the port cost and the port priority.

The 802.1D specification assigns 16-bit (short) default port cost values to each port based on the port’s bandwidth. Because administrators also have the capability to assign port cost values (between 1 and 65,535) manually, the default 16-bit values are used only for ports that have not been configured specifically for port cost.

In the event that multiple ports have the same path cost, STP considers the port priority when selecting which port to put into the Forwarding state. The valid port priority range is from 0 to 240 and the Cisco IOS default value is 128. This value can be adjusted manually by the administrator to influence which port is selected by the STA; the lower the value, the more preferred the port. The port priority is adjusted in increments of 16.
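The root election and the four-step Configuration BPDU comparison described above both reduce to "lowest value wins, compared field by field," which Python tuple ordering expresses directly. The following is a minimal sketch under that reading; the class, function, and switch names are hypothetical, and the path cost values used in the example are illustrative only.

```python
from typing import NamedTuple


class Bpdu(NamedTuple):
    root_priority: int
    root_mac: str
    root_path_cost: int
    sender_priority: int
    sender_mac: str
    sender_port_id: int


def mac_to_int(mac):
    """Convert a dotted MAC such as 0000.0000.0001 to an integer for comparison."""
    return int(mac.replace(".", ""), 16)


def bpdu_key(b):
    # Order matches the four factors: root bridge ID, root path cost,
    # sender bridge ID, sender port ID. A bridge ID is (priority, MAC).
    return ((b.root_priority, mac_to_int(b.root_mac)),
            b.root_path_cost,
            (b.sender_priority, mac_to_int(b.sender_mac)),
            b.sender_port_id)


def best_bpdu(bpdus):
    """Return the most attractive BPDU: the one with the lowest key."""
    return min(bpdus, key=bpdu_key)


def elect_root(bridges):
    """bridges maps a switch name to (priority, mac); the lowest bridge ID wins."""
    return min(bridges, key=lambda name: (bridges[name][0], mac_to_int(bridges[name][1])))


bridges = {
    "SwA": (32768, "0000.0000.0003"),
    "SwB": (32768, "0000.0000.0001"),
    "SwC": (32768, "0000.0000.0002"),
}
default_root = elect_root(bridges)            # equal priorities: lowest MAC wins
bridges["SwC"] = (4096, "0000.0000.0002")
tuned_root = elect_root(bridges)              # lower priority value now wins

via_short_path = Bpdu(4096, "0000.0000.0002", 19, 32768, "0000.0000.0001", 1)
via_long_path = Bpdu(4096, "0000.0000.0002", 38, 32768, "0000.0000.0003", 1)
chosen = best_bpdu([via_long_path, via_short_path])
```

With equal priorities the switch holding the lowest MAC address is elected; once SwC is given a lower priority value it takes over, and the BPDU advertising the lower root path cost is preferred before sender IDs are considered.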
When all LAN ports have the same priority value, STP will place the port with the lowest port number into the Forwarding state and block the other ports. The port priority is locally significant between two switches. If a switch is connected via multiple links to another switch, then it uses one of the following tie-breaker mechanisms to determine which port will be forwarding:

• Lowest root bridge ID
• Lowest root path cost to root bridge
• Lowest sender bridge ID
• Lowest sender port ID

Spanning Tree BPDUs include several timers that play an integral role in the operation of the protocol. The Spanning Tree timer values are contained in the last three fields of a BPDU. Within the Spanning Tree domain, the only timer values that are important are those that are sent by the root bridge. In other words, non-root bridges are not concerned with locally configured timer values.

The default Spanning Tree timers go hand-in-hand with the IEEE 802.1D specification, which also recommends a maximum network diameter of seven. The Spanning Tree diameter is the maximum distance that any two switches can be from each other. A maximum diameter of seven means that two distinct switches cannot be more than seven hops away from each other. This concept will be described in detail later in this chapter.

Because all other switches in the Spanning Tree domain use the timer values advertised by the root bridge, the modification of any of these values should always be made at the root bridge. By setting these values on the STP root, these values will be passed (via BPDUs) to other switches in the STP domain. The three configurable Spanning Tree timer values are as follows:

1. The Hello Time
2. The Forward Delay
3. The Max Age

The Hello Time is the time between each BPDU that is sent. This time is equal to two seconds by default, but you can tune the time to be between 1 and 10 seconds.
While the Hello Time received in the Configuration BPDU from the root bridge is propagated unchanged throughout the Spanning Tree domain, all switches have their own local Hello Time for TCN BPDUs that the switches transmit. The IEEE 802.1D standard specifies a default Hello Time value of two seconds based on a recommended Spanning Tree diameter of seven switches.

The Forward Delay is the time that is spent in each of the Listening and Learning states. When the port transitions to the Listening state, it indicates a change in the current Spanning Tree topology and that the port will go from a Blocking state to a Forwarding state. The Forward Delay is used to cover the period between the Blocking and Forwarding states, which includes the Listening and Learning states. This time is set to 15 seconds by default, but can be set manually to be between 4 and 30 seconds. As is the case with the Hello Time, the default Forward Delay value is based on the IEEE Spanning Tree diameter of seven switches.

The Max Age time is set in the BPDU by the root bridge and defaults to 20 seconds. This timer can be set manually to any number between 6 and 40 seconds. The Max Age value remains the same for all BPDUs that are propagated by all switches in the Spanning Tree domain. Any changes to this value on the root bridge are propagated to the other switches in the Spanning Tree domain. The default Max Age is based on the IEEE Spanning Tree diameter of seven switches.

As was previously stated, Configuration BPDUs are sent only by the root bridge. In the event that the root bridge fails or is removed from the network, Configuration BPDUs will no longer be sent and the STP network is considered broken. Only when the Max Age timer expires will another switch be elected root bridge, effectively restoring the flow of Configuration BPDUs and the STP network.
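To make the relationship between the diameter and the default timers concrete, the following Python sketch derives Max Age and Forward Delay from the Hello Time and diameter, then adds up the worst-case wait after a root failure (Max Age, plus one Forward Delay each for Listening and Learning). The two formulas are the ones commonly cited in Cisco's STP timer tuning guidance and are not taken from this chapter, so treat them, the rounding up of fractional seconds, and the function names as assumptions.

```python
import math


def max_age(hello, diameter):
    """Assumed formula: 4 * hello + 2 * diameter - 2 (seconds)."""
    return 4 * hello + 2 * diameter - 2


def forward_delay(hello, diameter):
    """Assumed formula: (4 * hello + 3 * diameter) / 2, rounded up to whole seconds."""
    return math.ceil((4 * hello + 3 * diameter) / 2)


def worst_case_convergence(max_age_s, forward_delay_s):
    """Max Age expiry, then Listening and Learning at one Forward Delay each."""
    return max_age_s + 2 * forward_delay_s


default_max_age = max_age(hello=2, diameter=7)
default_forward_delay = forward_delay(hello=2, diameter=7)
default_convergence = worst_case_convergence(default_max_age, default_forward_delay)
```

For a Hello Time of two seconds and a diameter of seven, the sketch reproduces the defaults given above (20 and 15 seconds) and a worst case of 50 seconds before a port reaches the Forwarding state.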
Reducing the values of these timers can be used to reduce the time it takes for the switched network to converge. However, doing so may also cause additional issues in the network, especially in unstable topologies. In such topologies, STP timers should instead be increased until the issue (e.g., flapping trunk links, etc.) can be resolved.

TROUBLESHOOTING SPANNING TREE PROTOCOL

As previously stated, troubleshooting STP can be a very complex task. Because STP loops can bring down an entire network, it is important to understand exactly what is happening and quickly implement an appropriate solution, as in most cases, you might have no more than a few minutes. There are two primary reasons for STP protocol failure or issues: configuration errors and the loss of STP BPDUs. Additional causes also include MAC address table corruption, hardware and software issues or bugs, and switch resource utilization issues.

Designing and Implementing a Sound STP Network

Several elements that should be taken into consideration when designing a solid STP network include the following:

• Determine the location of the root bridge
• Design a deterministic topology
• Integrate Layer 3 switching
• Avoid end-to-end VLAN solutions

The root bridge is the crux of an STP network. The root is responsible for generating STP Configuration BPDUs, which are propagated throughout the domain to other downstream switches. The root bridge sends out Hello packets every Hello interval, which defaults to two seconds. If the root bridge fails, then Configuration BPDUs will no longer be sent. In this case, switches will wait for the expiration of the Max Age time before invalidating stored BPDUs and beginning the process of electing a new root bridge. Network connectivity is restored after the new root bridge has been elected.

There is no explicit configuration required to facilitate root bridge election when running STP.
However, it is recommended practice to explicitly select a device that should be elected root bridge and ensure that this is well documented so that this information is available when troubleshooting any STP issues. In addition, allowing STP to elect a root bridge could result in poor performance for some network segments, as illustrated in Figure 3-4 below:

[Figure: four switches with MAC addresses 0000.0000.0001 through 0000.0000.0004, connecting Network Users and Network Resources]
Fig. 3-4. Suboptimal Forwarding Due to Root Bridge Placement

Referencing Figure 3-4, assuming the switches in the network all have the same priority value, which would be 32,768 by default, STP would use the lowest MAC address to elect the root bridge. Following this logic, the switch with MAC address 0000.0000.0001 would be elected as the root. The result is that user traffic from the other switches in the network will have to traverse that switch to reach network resources, such as servers, as illustrated by the arrows in Figure 3-4. Depending on the capabilities of this switch (e.g., bandwidth and processing capabilities), there would most likely be performance issues, such as slow response, on the network for users connected to the other switches accessing network resources.

In this network, the recommended solution would be to configure the switch with MAC address 0000.0000.0002 as the root bridge. This provides an optimum path for network user traffic that is destined to any network resources (e.g., network servers and printers), as illustrated by the arrows in Figure 3-5 below:

[Figure: the same four switches, 0000.0000.0001 through 0000.0000.0004, connecting Network Users and Network Resources]
Fig. 3-5. Optimal Forwarding Due to Root Bridge Placement

There are two supported methods of influencing root bridge election.
The first method entails specifying the Bridge Priority manually using the spanning-tree vlan [number] priority [value] global configuration command. Valid priority values can be configured beginning at 0 and incrementing in value by 4,096 thereafter (e.g., 0, 4,096, 8,192…61,440).

The second method entails using a built-in macro available in Cisco IOS software. This macro is invoked using the spanning-tree vlan [number] root [primary | secondary] global configuration command. When this command is executed using the [primary] keyword, Cisco IOS software checks the switch priority of the current root switch for the specified VLAN. Because of the extended system ID support, the switch sets the switch priority for the specified VLAN to 24,576 if this value will cause this switch to become the root for the specified VLAN.

However, if the root switch for the specified VLAN has a priority lower than 24,576, then the switch sets its own priority for the specified VLAN to 4,096 less than the lowest switch priority. This continues until the switch has a lower priority than the current root and is itself elected the root. When using the macro, the switch will become the root bridge only for the specified VLAN.

It is also important to remember that this command runs only once. In other words, if after executing this command the switch becomes the root bridge for that VLAN, and another switch is later configured with a priority value lower than that selected by the macro, then that other switch will become the root bridge instead. The macro will need to be reissued in order to influence the root bridge election process again. For this reason, it is recommended that you specify the root bridge manually using the manual bridge priority configuration method.
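The priority that the macro settles on can be sketched in Python as follows. This follows the description above and is not Cisco's actual implementation: the function name is hypothetical, a current root priority exactly equal to 24,576 is treated as needing a lower value (in practice the MAC address would break that tie), and raising an error when no lower priority exists is an assumption.

```python
MACRO_DEFAULT = 24576  # priority the macro tries first
STEP = 4096            # priorities move in increments of 4,096


def root_primary_priority(current_root_priority):
    """Priority the local switch would take to become root for one VLAN."""
    if current_root_priority > MACRO_DEFAULT:
        return MACRO_DEFAULT
    # Assumption: ties are not broken by MAC address here, so an equal or
    # lower current root priority always forces a step below it.
    candidate = current_root_priority - STEP
    if candidate < 0:
        raise ValueError("no valid priority is lower than the current root's")
    return candidate


against_default_root = root_primary_priority(32768)
against_tuned_root = root_primary_priority(20480)
```

Because the calculation runs only when the command is entered, a later change on another switch is not tracked, which is the reason the text recommends setting the priority manually.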
In addition to determining the location of the root bridge, it is also important to ensure that the traffic flows in the LAN are deterministic, meaning that they are not determined randomly. This entails knowing which ports should be placed into a Forwarding state and which ports will be in a Blocking state when the root bridge is active, and even when the secondary (backup) root bridge is active, if applicable. As stated in Chapter 1, traffic flows should be included in the network documentation. Understanding the path traffic should take during normal and backup scenarios simplifies the troubleshooting process.

Consideration should also be given to the number of redundant links included in the topology. While redundancy allows for high availability, too much redundancy can cause you more problems than it resolves. Remember, all it takes for a Spanning Tree loop to be created, which typically results in a network meltdown, is for a single blocking port to transition mistakenly into the Forwarding state. The greater the number of redundant network links, the greater the number of ports that STP will place into the Blocking state and, ultimately, the higher the chances of a Spanning Tree loop developing in the network. Avoid over-redundant solutions.

Layer 3 switching solutions, such as Cisco Express Forwarding (CEF), allow for the routing of traffic at switching speeds. The different switching mechanisms supported in Cisco IOS software are described in additional detail in the following chapter. Routing allows for inter-VLAN communication. In addition, routers break up Broadcast and collision domains.
Layer 3 switching provides the following advantages over Layer 3 routing:

• Hardware-based packet forwarding
• High-performance packet switching
• High-speed scalability
• Low latency
• Lower per-port cost
• Flow accounting
• Security
• Quality of Service (QoS)

When designing and implementing switched LAN solutions, consider a design that uses local VLANs over end-to-end VLANs. End-to-end VLANs are VLANs that span the entire switch fabric of a network. These VLANs are also commonly referred to as campus-wide VLANs, as they sometimes span the entire campus LAN so that network hosts and their servers remain in the same VLAN (logically), even though the devices may physically reside in different buildings, for example.

End-to-end VLAN implementation is based on the 80/20 rule and therefore requires that each VLAN exist at the Access layer in every switch block. The primary reason for end-to-end VLAN implementation is to support maximum flexibility and the mobility of end devices. These solutions have the following characteristics:

• They allow the grouping of users into a single VLAN independent of physical location
• They are difficult to implement and troubleshoot
• Each VLAN provides common security and resource requirements for members
• They become extremely complex to maintain as the campus network grows

Unlike end-to-end VLANs, local VLANs are based on geographic locations by demarcation at a hierarchical boundary. These VLANs are designed for modern-day networks that adhere to the 20/80 rule, where end users typically require greater access to resources outside of their local VLAN. With local VLANs, up to 80% of the traffic is destined to the Internet or other remote network locations, while no more than 20% of the traffic remains local.
Despite the name, local VLANs are not restricted to a single switch and can range in size from a single switch in a wiring closet to an entire building. This VLAN implementation method provides maximum availability by using multiple paths to destinations, maximum scalability by keeping the VLAN within a switch block, and maximum manageability.

Common Causes for Bridging Loops

While a solid design should minimize STP failures, it cannot guarantee that problems or issues will not arise within the network. When STP fails, a bridging loop often ensues. A loop originates when a port that should be blocking transitions to the Forwarding state. Given that there is no concept similar to the Layer 3 TTL at Layer 2, when this happens, frames can traverse the network endlessly, consuming switch resources and possibly leading to a complete network meltdown.

There are multiple reasons a bridging loop can occur. Common reasons include, but are not limited to, the following:

• Physical layer connectivity issues
• Switch misconfigurations
• Switch resource utilization issues
• Broadcast storms
• Hardware or software errors

A loss of connectivity between switches results in a loss of BPDUs. If a switch does not receive BPDUs on a port on which they should be received, it transitions other ports into the Forwarding state, which can result in a Spanning Tree loop if the root port link is not actually down. Such problems are typically caused by duplex mismatches for Ethernet trunk links and unidirectional links when using Fiber Optic trunk links.

Duplex mismatches are applicable only when using 10/100 links. As stated previously, Catalyst switches support only full duplex for 1Gbps links. Given that this is the standard implemented in most modern networks, duplex mismatches are typically a non-issue.
However, because 10/100 links can be used for trunk connectivity, it is still worth keeping this in mind in the event that you are using 10/100 Ethernet links for trunking between switches in your network. Figure 3-6 below illustrates how a simple duplex mismatch could cause a bridging loop:

[Figure: Root Bridge, Switch 2, Switch 3, and Switch 4, with each port marked F (Forwarding) or B (Blocking)]
Fig. 3-6. Bridging Loop Formation Due to Duplex Mismatches

Figure 3-6 illustrates a basic switched network. The relevant port states (i.e., Forwarding and Blocking) are illustrated based on the location of the root bridge. Assume, for example, that auto negotiation fails between the root bridge and Switch 4, resulting in the root bridge defaulting to half-duplex operation, while Switch 4 defaults to full-duplex operation for this link.

Half-duplex Ethernet uses a carrier sensing mechanism (CSMA/CD) on the physical medium to determine when gaps between frame transmissions occur. Stations may begin transmitting any time they detect that the network is quiet. When a collision occurs, a jam signal is sent out and the devices involved in the collision stop transmitting for a short period of time. A random back-off algorithm is then executed to ensure that the devices transmit at random times so that they avoid further collisions.

Given this, if Switch 4 is transmitting enough traffic, it is possible that the current root bridge (half duplex) might defer all packets, including BPDUs, until it senses that the medium is available. Should this happen, Switch 4 might not receive BPDUs on its root port within the 20-second Max Age timeframe and, from an STP point of view, assume that it has lost its connection to the root bridge and thus bring up the backup port currently in the Blocking state. Once this port is unblocked, a bridging loop is created.

Unlike Ethernet, Fiber Optic links do not use CSMA/CD.
However, transceiver issues can cause unidirectional communication issues between two switches, which can result in the same problem described with the half-duplex Ethernet connection. When using Fiber Optic for trunk links between switches, it is recommended that UniDirectional Link Detection (UDLD) be enabled. UDLD can detect improper cabling or unidirectional links at Layer 2 and can break resulting loops automatically by disabling some ports. By default, when UDLD is enabled on Cisco Catalyst switches, it will only be enabled for Fiber Optic links. However, UDLD can also be enabled manually for Ethernet links.

Another common configuration error that results in STP protocol failure or bridging loops is the configuration of aggressive timers. As stated earlier in this chapter, the default STP timers are based on an STP network diameter of seven bridges or switches. The default timer values are listed and described in Table 3-3 below:

Table 3-3. Default Spanning Tree Timers

    Timer          Default     Description
    Hello Time     2 seconds   Specifies the time between sending Spanning Tree BPDUs
    Max Age        20 seconds  Specifies the maximum time to save BPDU information
    Forward Delay  15 seconds  Specifies the time spent in the Listening and Learning states

Cisco IOS software provides two methods of changing the default STP timers, which may be used to decrease the convergence time of the switched network. The first method entails specifying the individual timers manually, on a per-VLAN basis, using the spanning-tree vlan [vlan] [forward-time | hello-time | max-age] <interval> global configuration command on the root bridge. The second option includes using a macro that can be enabled using the spanning-tree vlan [vlan] root [primary | secondary] diameter <2-7> global configuration command on either the root or backup root bridge.
When the macro is used, Cisco IOS software will automatically set the optimum STP timers for the specified network diameter.

While aggressive timers do allow for faster convergence, they can also make troubleshooting the network much more difficult, especially during periods of instability where the topology constantly fluctuates. For this reason, do not arbitrarily adjust STP timers without valid justification or a recommendation from the TAC. Instead, consider alternatives, such as RSTP, that do speed up convergence without having to manipulate timers.

Finally, another common configuration error that results in loops is PortFast configuration. PortFast is a Cisco STP enhancement that, when enabled, allows the specified port to skip the initial states of the STA (Listening and Learning) and transition directly to the Forwarding state. PortFast can be enabled on both trunk and access links. A common reason for configuring or enabling PortFast on an access port is a user complaining that he or she is unable to get an IP address from the DHCP server, which is most likely due to a timeout because of the amount of time it takes for an STP port to transition to the Forwarding state. In such cases, PortFast can be used to resolve this issue.

A common misconception regarding PortFast is that because the trunk keyword can be specified in conjunction with the spanning-tree portfast interface configuration command, enabling PortFast on a trunk link will also decrease convergence. This is not true. PortFast does not disable Spanning Tree on the selected port, and enabling it on a trunk link between switches can result in bridging loops. The spanning-tree portfast trunk interface configuration command should be used only on trunk links that are connected to devices, such as servers or firewalls, that do not send BPDUs but do support multiple VLANs.

Another common reason for bridging loops is high switch resource utilization.
This includes primarily CPU and memory consumption, but it could also be due to congestion. High CPU utilization may be due to a plethora of reasons. Additionally, the factors that contribute to this will vary depending on the switch platform. While Catalyst switches always service control packets (e.g., BPDUs) before normal packets (e.g., ICMP packets), overutilization of the CPU can result in the switch running out of resources to transmit control packets. Again, this can lead to a bridging loop. There are several Cisco IOS commands that can be used to check system health and troubleshoot high resource utilization issues, with the most common command being the show processes command. Additional performance troubleshooting commands are described in detail in the following chapter.

A Broadcast storm means that your network is overwhelmed with a constant flow of Broadcast or Multicast traffic. Broadcast storms can eventually lead to a complete loss of network connectivity as the packets proliferate. While uncommon in the high-bandwidth LANs of today, it is quite possible for one or more devices (e.g., a powerful server) to bring down the network because of Broadcasts. A high volume of Broadcast traffic consumes bandwidth and can result in packet loss, including the loss of BPDUs, leading to a bridging loop.

A more common cause for Broadcast storms in modern-day networks is Multicast. IP Multicast uses packets that include IP Options, such as Internet Group Management Protocol (IGMP) packets used in Multicast implementations. These exception packets are punted to the CPU of the switch for processing. A large number of these packets can increase CPU utilization, depleting the resources the switch has to process and transmit control packets, such as BPDUs. Broadcast storms can be mitigated by enabling the traffic suppression feature under applicable interfaces using the storm-control [broadcast | multicast | unicast] level <percent | pps> interface configuration command.
In addition, high-end switches, such as the Catalyst 6500 Series switch, also support the rate-limiting (throttling) of packets that are punted to the CPU, allowing for additional protection against such events. These options are described in additional detail in the following chapter.

Finally, both hardware and software errors can also result in bridging loops. For example, faulty hardware, such as a bad port, may result in data (packet) corruption or a high rate of errors on the link, which may cause BPDUs to be lost, ultimately resulting in ports that should be blocking transitioning to a Forwarding state and causing a loop. Although uncommon, software bugs can also cause STP protocol failures, which result in bridging loops. Because a bug is difficult to identify, you should contact the TAC if you suspect a software bug.

Troubleshooting Spanning Tree

In the previous section, we discussed some of the reasons that can cause STP to fail and result in the development of bridging loops in the network. Unlike routing loops, wherein packets are discarded after the TTL decrements to 0, bridging loops can result in frames traversing the network infinitely. As this happens, more and more switch resources are consumed, resulting in the switch finally ‘locking up.’ In such situations, in-band access, such as Telnet and SSH, to the switch is typically lost and only out-of-band management via the console is possible.

For this reason, as was stated earlier in this chapter, it is important to ensure that the network is well designed and documented. This allows you to identify the following:

• The overall topology of the network
• The location of the root bridge
• Redundant links and blocking ports

This information allows you to troubleshoot the bridging loop effectively.
Because a number of problems could cause a bridging loop, there is no single method that can be used to troubleshoot such issues. However, by understanding the common causes of bridging loops, you can use a process of elimination to isolate, and ultimately resolve, the issue.

As previously stated, a bridging loop will most likely result in you losing in-band access to the switch, meaning that only out-of-band access and management via the console will be possible. If you know the location of the root bridge, during periods of extended instability, the first thing you should do to reduce STP BPDU traffic is to increase STP timers (i.e., the Forward Delay and Max Age) to the maximum possible values. This allows you to stabilize the network somewhat and continue troubleshooting the bridging loop. After this, you can perform the following activities to identify and isolate the issue:

• Check port utilization statistics
• Check port BPDU statistics
• Check for duplex mismatches
• Check port errors
• Check resource utilization

The best way to identify a bridging loop is to capture the traffic on a saturated link and check whether you see the same packets multiple times. An interface with traffic overload can fail to transmit vital BPDUs. A link overload also indicates a possible bridging loop. You can use the show interfaces command to verify link utilization statistics per port.

In addition to verifying utilization statistics, you should also verify that the ports that should be receiving BPDUs are receiving them. BPDUs should be received on root ports and on blocked ports (those on blocked ports are inferior). You can use the show spanning-tree interface [name] detail command to verify BPDU statistics on a per-interface basis. You can also use the show spanning-tree vlan [number] detail command to view interface statistics on a per-VLAN basis.
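When many ports have to be checked, the BPDU counters printed by these commands can be compared with a short script. The Python sketch below is only an illustration: the function name, the regular expressions, and the rule "a root or alternate port that has not received more BPDUs than it has sent deserves a closer look" are assumptions, and the sample text is abridged to the two lines per port that the script reads.

```python
import re

# Abridged sample of 'show spanning-tree interface [name] detail' output.
SAMPLE = """\
Port 11 (FastEthernet0/11) of VLAN0001 is root forwarding
   BPDU: sent 6, received 69600
Port 12 (FastEthernet0/12) of VLAN0001 is alternate blocking
   BPDU: sent 3, received 42804
"""

PORT_RE = re.compile(r"Port \d+ \((\S+)\) of (VLAN\d+) is (\w+) (\w+)")
BPDU_RE = re.compile(r"BPDU: sent (\d+), received (\d+)")


def bpdu_report(text):
    """Return (interface, vlan, role, sent, received, suspicious) per port."""
    rows = []
    current = None
    for line in text.splitlines():
        port = PORT_RE.search(line)
        if port:
            current = port.groups()  # (interface, vlan, role, state)
            continue
        counters = BPDU_RE.search(line)
        if counters and current:
            sent, received = int(counters.group(1)), int(counters.group(2))
            interface, vlan, role, _state = current
            # Root and alternate ports listen to BPDUs from upstream, so
            # they should have received far more BPDUs than they sent.
            suspicious = role in ("root", "alternate") and received <= sent
            rows.append((interface, vlan, role, sent, received, suspicious))
            current = None
    return rows


if __name__ == "__main__":
    for row in bpdu_report(SAMPLE):
        print(row)
```

Running the check twice, a few seconds apart, also shows whether the received counter is still incrementing, which a single snapshot cannot tell you.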
When using these commands, you should see more received BPDUs on a root port that is in the Forwarding state than BPDUs sent because BPDUs are propagated downstream from the root bridge throughout the STP domain. Because all ports send BPDUs when they initialize, you should see a few BPDUs sent out of the root port. Following is the output of the show spanning-tree interface [name] detail command for a port elected as the root port:

Cat-3550-1#show spanning-tree interface FastEthernet0/11 detail
 Port 11 (FastEthernet0/11) of VLAN0001 is root forwarding
   Port path cost 19, Port priority 128, Port Identifier 128.11.
   Designated root has priority 32769, address 000b.fd67.6500
   Designated bridge has priority 32769, address 000b.fd67.6500
   Designated port id is 128.11, designated path cost 0
   Timers: message age 2, forward delay 0, hold 0
   Number of transitions to forwarding state: 1
   Link type is point-to-point by default
   BPDU: sent 6, received 69600

Notice the significant difference between the number of sent and received BPDUs. This is what you should expect to see during normal STP operation. Blocking ports should also see a large number of BPDUs received, which is normal. While you will see some sent BPDUs, which is also normal, this number should not be incrementing during normal STP operation. Additionally, you can look at the number of transitions to the Forwarding state to troubleshoot instability issues. Following is the output of the show spanning-tree interface [name] detail command for a port that has been placed into the Blocking state:

Cat-3550-1#show spanning-tree interface FastEthernet0/12 detail
 Port 12 (FastEthernet0/12) of VLAN0001 is alternate blocking
   Port path cost 19, Port priority 128, Port Identifier 128.12.
   Designated root has priority 32778, address 000b.fd67.6500
   Designated bridge has priority 32778, address 000b.fd67.6500
   Designated port id is 128.12, designated path cost 0
   Timers: message age 2, forward delay 0, hold 0
   Number of transitions to forwarding state: 0
   Link type is point-to-point by default
   BPDU: sent 3, received 42804

Finally, on the root bridge itself, you should expect to see more BPDUs transmitted than are received on all ports. This indicates normal STP operation. Following is the output of the show spanning-tree vlan [number] detail command on the root of the specified VLAN:

Cat-3550-2#show spanning-tree vlan 10 detail
 VLAN0010 is executing the ieee compatible Spanning Tree protocol
  Bridge Identifier has priority 32768, sysid 10, address 000b.fd67.6500
  Configured hello time 2, max age 20, forward delay 15
  We are the root of the spanning tree
  Topology change flag not set, detected flag not set
  Number of topology changes 4 last change occurred 23:27:07 ago
  Times:  hold 1, topology change 35, notification 2
          hello 2, max age 20, forward delay 15
  Timers: hello 1, topology change 0, notification 0, aging 300

 Port 11 (FastEthernet0/11) of VLAN0010 is designated forwarding
   Port path cost 19, Port priority 128, Port Identifier 128.11.
   Designated root has priority 32778, address 000b.fd67.6500
   Designated bridge has priority 32778, address 000b.fd67.6500
   Designated port id is 128.11, designated path cost 0
   Timers: message age 0, forward delay 0, hold 0
   Number of transitions to forwarding state: 1
   Link type is point-to-point by default
   BPDU: sent 43024, received 3

 Port 12 (FastEthernet0/12) of VLAN0010 is designated forwarding
   Port path cost 19, Port priority 128, Port Identifier 128.12.
   Designated root has priority 32778, address 000b.fd67.6500
   Designated bridge has priority 32778, address 000b.fd67.6500
   Designated port id is 128.12, designated path cost 0
   Timers: message age 0, forward delay 0, hold 0
   Number of transitions to forwarding state: 1
   Link type is point-to-point by default
   BPDU: sent 43025, received 3

In addition to using show commands, you can also enable BPDU debugging using the debug spanning-tree bpdu command to verify BPDU statistics. However, keep in mind that CPU utilization on the switch will most likely already be very high, and enabling debugging might just lock up the switch. The following illustrates a sample of the output that is printed by this command:

Cat-3550-1#debug spanning-tree bpdu
*Mar 2 15:12:27.770: STP: Data 00000000008028000BFD676500000000008028000BFD676500800B0000140002000F00
*Mar 2 15:12:27.774: STP: VLAN0040 Fa0/11:0000 00 00 00 8028000BFD676500 00000000 8028000BFD676500 800B 0000 1400 0200 0F00
*Mar 2 15:12:27.774: STP(40) port Fa0/11 supersedes 0
*Mar 2 15:12:27.774: STP: VLAN0040 rx BPDU: config protocol = ieee, packet from FastEthernet0/12 , linktype IEEE_SPANNING , enctype 2, encsize 17
*Mar 2 15:12:27.774: STP: enc 01 80 C2 00 00 00 00 0B FD 67 65 0C 00 26 42 42 03
*Mar 2 15:12:27.774: STP: Data 00000000008028000BFD676500000000008028000BFD676500800C0000140002000F00
*Mar 2 15:12:27.774: STP: VLAN0040 Fa0/12:0000 00 00 00 8028000BFD676500 00000000 8028000BFD676500 800C 0000 1400 0200 0F00
*Mar 2 15:12:27.778: STP(40) port Fa0/12 supersedes 0
*Mar 2 15:12:27.778: STP: VLAN0020 rx BPDU: config protocol = ieee, packet from FastEthernet0/11 , linktype IEEE_SPANNING , enctype 2, encsize 17
... [Truncated Output]

As previously stated, duplex mismatches can result in both VLAN connectivity issues and STP protocol failures.
When the switch notices duplex mismatches, it will print the following:

%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on FastEthernet0/1 (not full duplex), with R2 FastEthernet0/0 (full duplex)

Interface or port errors could also be used to identify the root cause of the bridging loop. Using the show interfaces command, check for incrementing errors, which could indicate packet corruption. Depending on the switch platform, you could also use additional commands, such as the show controllers ethernet-controller [interface] command or the show interfaces counters errors [interface] command, to view error statistics.

Finally, also include resource utilization checks in your troubleshooting. As was previously stated, high CPU utilization could be caused by a number of things. If possible, attempt to determine which process is causing the high CPU utilization using the show processes cpu command. Because this command prints a great deal of information, you should filter the output to increase the efficiency of your troubleshooting process. For example, you could filter the output of this command so that all processes using no CPU resources are omitted as follows:

Cat-3550-1#show processes cpu | exclude 0.00
CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%
 PID Runtime(ms)   Invoked   uSecs    5Sec    1Min    5Min TTY Process
  51      581000    423131    1373   0.39%   0.39%   0.39%   0 Vegas Statistics
  71         708       730     969   0.15%   0.04%   0.01%   0 Exec

USING THE ‘SHOW SPANNING-TREE’ COMMAND

In the final section of this chapter, we will explore the Cisco IOS show spanning-tree command, as well as the keywords that can be used in conjunction with this command. We will also discuss the relevant output that is printed by this command and how to interpret this information when validating or troubleshooting Spanning Tree Protocol.
The following output displays the keywords that can be used in conjunction with this command:

Cat-3550-1#show spanning-tree ?
  WORD               bridge group list, example 1,3-5,7,9
  active             Report on active interfaces only
  backbonefast       Show spanning tree backbonefast status
  blockedports       Show blocked ports
  bridge             Status and configuration of this bridge
  detail             Detailed information
  inconsistentports  Show inconsistent ports
  interface          Spanning Tree interface status and configuration
  mst                Multiple spanning trees
  pathcost           Show Spanning pathcost options
  root               Status and configuration of the root bridge
  summary            Summary of port states
  uplinkfast         Show spanning tree uplinkfast status
  vlan               VLAN Switch Spanning Trees
  |                  Output modifiers
  <cr>

NOTE: Only keywords that are applicable to the TSHOOT certification exam will be discussed.

The active keyword prints information on active VLANs in the STP domain. This includes the STP timers, priority of the root bridge, active interfaces or ports, and their states. The output of this command is illustrated below:

Cat-3550-1#show spanning-tree active

VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    1
             Address     000f.2303.2d80
             This bridge is the root
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    1      (priority 0 sys-id-ext 1)
             Address     000f.2303.2d80
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----------------------------
Fa0/11              Desg FWD 19        128.11   P2p
Fa0/12              Desg FWD 19        128.12   P2p

VLAN0010
  Spanning tree enabled protocol ieee
  Root ID    Priority    32778
             Address     000b.fd67.6500
             Cost        19
             Port        11 (FastEthernet0/11)
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32778  (priority 32768 sys-id-ext 10)
             Address     000f.2303.2d80
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ---------------------------
Fa0/11              Root FWD 19        128.11   P2p
Fa0/12              Altn BLK 19        128.12   P2p

The blockedports keyword prints blocked ports on a per-VLAN basis. If you know which ports should be in a Blocking state, then you can use this command to verify that they are indeed blocking. The following output illustrates the information that is printed by this command:

Cat-3550-1#show spanning-tree blockedports

Name                 Blocked Interfaces List
-------------------- ------------------------------------
VLAN0010             Fa0/12
VLAN0020             Fa0/12
VLAN0030             Fa0/12
VLAN0040             Fa0/12

Number of blocked ports (segments) in the system : 4

The detail keyword prints detailed STP information on a per-VLAN basis. This information includes information on STP timers and port states, among other information. Following is a sample output of the information that is printed by this command:

Cat-3550-1#show spanning-tree detail
 VLAN0001 is executing the ieee compatible Spanning Tree Protocol
  Bridge Identifier has priority 0, sysid 1, address 000f.2303.2d80
  Configured hello time 2, max age 20, forward delay 15
  We are the root of the spanning tree
  Topology change flag not set, detected flag not set
  Number of topology changes 5 last change occurred 01:03:37 ago
  Times:  hold 1, topology change 35, notification 2
          hello 2, max age 20, forward delay 15
  Timers: hello 1, topology change 0, notification 0, aging 300

 Port 11 (FastEthernet0/11) of VLAN0001 is designated forwarding
   Port path cost 19, Port priority 128, Port Identifier 128.11.
   Designated root has priority 1, address 000f.2303.2d80
   Designated bridge has priority 1, address 000f.2303.2d80
   Designated port id is 128.11, designated path cost 0
   Timers: message age 0, forward delay 0, hold 0
   Number of transitions to forwarding state: 1
   Link type is point-to-point by default
   BPDU: sent 1916, received 69809

 Port 12 (FastEthernet0/12) of VLAN0001 is designated forwarding
   Port path cost 19, Port priority 128, Port Identifier 128.12.
   Designated root has priority 1, address 000f.2303.2d80
   Designated bridge has priority 1, address 000f.2303.2d80
   Designated port id is 128.12, designated path cost 0
   Timers: message age 0, forward delay 0, hold 0
   Number of transitions to forwarding state: 1
   Link type is point-to-point by default
   BPDU: sent 1914, received 69809
... [Truncated Output]

The inconsistentports keyword prints information about ports that have an inconsistent STP state. Cisco IOS software places ports into an inconsistent state so that any misconfigurations do not impact STP and result in a protocol failure, which can ultimately lead to a network outage. Valid reasons for which a port will be placed into an inconsistent state include the following:

• Loop inconsistency
• Port VLAN ID (PVID) inconsistency
• Root inconsistency
• EtherChannel inconsistency
• Type inconsistency

The port is placed into the loop inconsistent state if Loop Guard detects that a non-designated port has stopped receiving BPDUs. This is used to prevent the port from transitioning from non-designated (Blocking) to designated (Forwarding) in the absence of BPDUs.

A switch port will be placed into a PVID inconsistent state if a PVST+ BPDU is received on a VLAN other than the one on which it originated. This happens when native VLANs are mismatched.

The Root Guard feature prevents a designated port from becoming a root port.
If a port on which the Root Guard feature is enabled receives a superior BPDU, the port is moved into a root inconsistent state, thus maintaining the current root bridge status quo.

The EtherChannel inconsistent state is used to prevent loops in the event of any EtherChannel misconfigurations. EtherChannel Guard is enabled on switches using the spanning-tree etherchannel guard misconfig global configuration command.

Finally, a port is placed into a type inconsistent state if a PVST+ BPDU is received on a non-802.1Q trunk. The output of the show spanning-tree inconsistentports command, following error messages indicating that an 802.1Q BPDU was received on a non-trunk port, is illustrated below:

*Mar 2 16:33:22.709: %SPANTREE-7-RECV_1Q_NON_TRUNK: Received 802.1Q BPDU on non trunk FastEthernet0/11 VLAN1.
*Mar 2 16:33:22.709: %SPANTREE-7-BLOCK_PORT_TYPE: Blocking FastEthernet0/11 on VLAN0001. Inconsistent port type.
Cat-3550-2#
Cat-3550-2#show spanning-tree inconsistentports

Name                 Interface                Inconsistency
-------------------- ------------------------ ------------------
VLAN0001             FastEthernet0/11         Port Type Inconsistent

Number of inconsistent ports (segments) in the system : 1

The interface keyword prints STP information on a per-interface basis, which includes the port role and state, port cost, and priority, as well as the port type. You can append the detail keyword to view additional information, such as the root bridge ID and BPDU statistics. The following output displays the information printed by this command:

Cat-3550-1#show spanning-tree interface FastEthernet0/11

Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ---------------------------
VLAN0001            Desg FWD 19        128.11   P2p
VLAN0010            Root FWD 19        128.11   P2p
VLAN0020            Root FWD 19        128.11   P2p
VLAN0030            Root FWD 19        128.11   P2p
VLAN0040            Root FWD 19        128.11   P2p

The root keyword prints information about the root bridge for all active VLANs.
This includes the root bridge ID, root priority, root port cost, timers, and the root port as illustrated below:

Cat-3550-1#show spanning-tree root

                                        Root    Hello Max Fwd
Vlan                   Root ID          Cost    Time  Age Dly  Root Port
---------------- -------------------- --------- ----- --- ---  ------------
VLAN0001             1 000f.2303.2d80         0    2   20  15
VLAN0010         32778 000b.fd67.6500        19    2   20  15  Fa0/11
VLAN0020         32788 000b.fd67.6500        19    2   20  15  Fa0/11
VLAN0030         32798 000b.fd67.6500        19    2   20  15  Fa0/11
VLAN0040         32808 000b.fd67.6500        19    2   20  15  Fa0/11

The summary keyword prints summary information for all VLANs, including STP mode, STP enhancements (e.g., PortFast and Loop Guard configuration), the VLANs for which the local switch is root bridge, and the state of different ports in different VLANs. Following is the output of the show spanning-tree summary command:

Cat-3550-2#show spanning-tree summary
Switch is in pvst mode
Root bridge for: VLAN0010, VLAN0020, VLAN0030, VLAN0040
Extended system ID is enabled
Portfast Default is disabled
PortFast BPDU Guard Default is disabled
Portfast BPDU Filter Default is disabled
Loopguard Default is disabled
EtherChannel misconfig guard is enabled
UplinkFast is disabled
BackboneFast is disabled
Configured Pathcost method used is short

Name                   Blocking Listening Learning Forwarding STP Active
---------------------- -------- --------- -------- ---------- ----------
VLAN0001                      1         0        0          1          2
VLAN0010                      0         0        0          2          2
VLAN0020                      0         0        0          2          2
VLAN0030                      0         0        0          2          2
VLAN0040                      0         0        0          2          2
---------------------- -------- --------- -------- ---------- ----------
5 vlans                       1         0        0          9         10

Finally, the vlan keyword prints detailed information on a per-VLAN basis. This is one of the most commonly used STP commands. The information printed by this command includes STP timers set on the root bridge, local STP timers, and information on the root bridge, among other things.
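The root priorities printed by the show spanning-tree root command in this section (1, 32778, 32788, and so on) follow from the extended system ID being enabled: the 16-bit priority field carries the configured priority in its upper 4 bits (a multiple of 4096) and the VLAN ID in its lower 12 bits. The Python sketch below splits a displayed value into those two parts; the function name is hypothetical and used only for illustration.

```python
def split_bridge_priority(value: int) -> tuple[int, int]:
    """Split a displayed bridge priority into (configured priority, sys-id-ext)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("bridge priority is a 16-bit value")
    configured = value & 0xF000   # upper 4 bits: a multiple of 4096
    sys_id_ext = value & 0x0FFF   # lower 12 bits: the VLAN ID
    return configured, sys_id_ext


if __name__ == "__main__":
    # Values taken from the 'show spanning-tree root' output in this section.
    for shown in (1, 32778, 32788, 32798, 32808):
        configured, vlan = split_bridge_priority(shown)
        print(f"{shown}: priority {configured} sys-id-ext {vlan}")
```

This matches the "(priority 32768 sys-id-ext 10)" annotation that the switch prints next to the Bridge ID for VLAN 10, and it explains why a default-priority switch shows a different priority in every VLAN.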
Following is the output of the show spanning-tree vlan command on a non-root bridge:

Cat-3550-1#show spanning-tree vlan 10

VLAN0010
  Spanning tree enabled protocol ieee
  Root ID    Priority    32778
             Address     000b.fd67.6500
             Cost        19
             Port        11 (FastEthernet0/11)
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32778  (priority 32768 sys-id-ext 10)
             Address     000f.2303.2d80
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ---------------------------
Fa0/11              Root FWD 19        128.11   P2p
Fa0/12              Altn BLK 19        128.12   P2p

Following is the output of the same command on the root bridge of the specified VLAN:

Cat-3550-2#show spanning-tree vlan 10

VLAN0010
  Spanning tree enabled protocol ieee
  Root ID    Priority    32778
             Address     000b.fd67.6500
             This bridge is the root
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32778  (priority 32768 sys-id-ext 10)
             Address     000b.fd67.6500
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ---------------------------
Fa0/11              Desg FWD 19        128.11   P2p
Fa0/12              Desg FWD 19        128.12   P2p

CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter.

Troubleshooting at the Physical Layer

• If you have physical access to the switch, the LEDs can be a very useful troubleshooting tool
• Cisco switches have front panel LEDs that can be used to determine link and system status
• A link or port LED color other than green typically indicates some kind of failure
• However, a green link light does not always mean that the network cable is fully functional
• The show interfaces command is a powerful troubleshooting tool that provides the following:
  1. The administrative status of a switching port
  2. The port operational state
  3. The media type (for select switches and ports)
  4. Port input and output packets
  5. Port buffer failures and port errors
  6. Port input and output errors
  7. Port input and output queue drops

VLAN, VTP, and Trunking Overview

• A VLAN is a logical grouping of hosts that appear to reside on the same physical LAN
• VLANs increase the number of Broadcast domains but reduce their size
• A VLAN can span a single or multiple switches, depending on the implementation
• Catalyst switches support two types of switch VLAN ports: access and trunk ports
• Access ports are switch ports that are assigned to a single VLAN
• Frames sent across access ports are untagged
• Trunk links or ports are used to carry multiple VLANs
• Trunk links can use ISL or 802.1Q encapsulation
• Frames sent across trunk links are tagged or colored to identify the VLAN they belong to
• VTP manages the addition, deletion, and renaming of VLANs
• VTP allows VLAN information to propagate through the switched network
• A switch can only belong to one VTP domain at any one time
• There are three VTP modes for switches: server (default), client, and transparent

Troubleshooting VLANs

• Some of the more common causes of intra-VLAN connectivity issues include the following:
  1. Duplex Mismatches
  2. Bad NIC or Cable
  3. Congestion
  4. Hardware Issues
  5. Software Issues
  6. Resource Oversubscription
  7. Configuration Issues
• Some common causes for switches not receiving VLAN information include the following:
  1. Layer 2 Trunking Misconfigurations
  2. Incorrect VTP Configuration
  3. Configuration Revision Number
  4. Physical Layer Issues
  5. Software or Hardware Issues or Bugs
  6. Switch Performance Issues
• Possible reasons for a loss of end-to-end connectivity within a VLAN include the following:
  1. Physical Layer Issues
  2. VTP Pruning
  3. VLAN Trunk Filtering
  4. New Switches
  5. Switch Performance Issues
  6. Network Congestion
  7. Software or Hardware

Spanning Tree Protocol Overview

• The Spanning Tree Protocol (STP) is defined in the IEEE 802.1D standard
• STP attempts to provide a loop-free topology in redundant networks
• STP operates by making the following assumptions about the network:
  1. All links are bidirectional and can both send and receive BPDUs
  2. The switch is able to regularly receive, process, and send BPDUs
• The topology of an active switched network is determined by the following three variables:
  1. The unique MAC address (switch identifier) that is associated with each switch
  2. The path cost to the root bridge associated with each switch port
  3. The port identifier (port priority and port number) associated with each switch port
• Switches determine the best Configuration BPDU based on the following four factors:
  1. Lowest root bridge ID
  2. Lowest root path cost to root bridge
  3. Lowest sender bridge ID
  4. Lowest sender port ID
• The completion of the Configuration BPDU exchange results in the following actions:
  1. A Root Switch is elected for the entire Spanning Tree domain
  2. A root port is elected on every Non-Root Switch in the Spanning Tree domain
  3. A Designated Switch is elected for every LAN segment
  4. A Designated Port is elected on the Designated Switch for every segment
  5. Loops in the network are eliminated by blocking redundant paths
• STP supports Configuration BPDUs, TCN BPDUs, and TCA BPDUs
• Configuration BPDUs are sent by the root bridge
• TCN BPDUs are sent by any switch
• TCA BPDUs are used by the root bridge to acknowledge TCN BPDUs
• STP uses port cost and port priority to determine the root port
• The port cost affects the entire STP topology
• The port priority is a locally significant parameter
• The three configurable Spanning Tree timer values are as follows:
  1. The Hello Time
  2. The Forward Delay
  3. The Max Age

Troubleshooting Spanning Tree Protocol

• Elements that should be taken into consideration when designing a solid STP network are as follows:
  1. Determine the Location of the root bridge
  2. Design a Deterministic Topology
  3. Integrate Layer 3 Switching
  4. Avoid End-to-End VLAN Solutions
• Common reasons for STP failures and bridging loops include, but are not limited to, the following:
  1. Physical layer connectivity issues
  2. Switch misconfigurations
  3. Switch resource utilization issues
  4. Broadcast storms
  5. Hardware or software errors
• During periods of extended instability, increase the Max Age and Forward Delay timers
• After this, you can perform the following activities to identify and isolate the issue:
  1. Check Port Utilization Statistics
  2. Check Port BPDU Statistics
  3. Check for Duplex Mismatches
  4. Check Port Errors
  5. Check Resource Utilization

CHAPTER 4

Troubleshooting Catalyst Switch Layer 3 Protocols, Supervisor Redundancy, and Performance Issues

In the previous chapter, we discussed troubleshooting Layer 1 and Layer 2 issues on Cisco IOS Catalyst switches. This chapter discusses Catalyst switch Layer 3 troubleshooting, including basic routing operation and functionality on Catalyst switches and First Hop Redundancy Protocols. In addition, this chapter describes Catalyst Switch Supervisor Redundancy and performance troubleshooting.
The TSHOOT certification exam objectives that are covered in this chapter include the following:

• Troubleshoot First Hop Redundancy Protocols
• Troubleshoot Switch Virtual Interfaces
• Troubleshoot Switch Supervisor Redundancy

This chapter will be divided into the following sections:

• Catalyst Switch VLAN Interfaces Overview
• Catalyst Switch MLS Overview
• Troubleshooting Multilayer Switching
• Understanding and Troubleshooting HSRP
• Understanding and Troubleshooting VRRP
• Understanding and Troubleshooting GLBP
• Troubleshooting Switch Supervisor Redundancy
• Troubleshooting Switch Performance Issues

CATALYST SWITCH VLAN INTERFACES OVERVIEW

In a switched network, virtual LANs (VLANs) separate devices into different Broadcast domains. Additionally, VLANs are also used to separate devices into different subnets. Devices within a VLAN can communicate with each other without the need for routing, assuming that these devices reside within the same subnet. However, devices in separate VLANs require a routing device in order to communicate with one another.

Traditionally, IP routing and any Layer 3 functions, such as LAN default gateway functionality, were implemented primarily on routers. This entailed using multiple physical router interfaces (in different VLANs) to provide gateway and routing functionality between different VLANs, or using a single physical router interface and then creating multiple subinterfaces, each serving as the default gateway for the specified VLAN, thus allowing devices in different VLANs to communicate.

However, in modern networks, these functions are performed by Multilayer Switching (MLS), which is described in additional detail later in this chapter. MLS supports the configuration of Switch Virtual Interfaces (SVIs), which represent VLANs and allow the switch to serve as the default gateway for the VLAN.
Although the SVI represents a VLAN, it is not automatically configured when a Layer 2 VLAN is configured on the switch. Likewise, the switch will not automatically create a VLAN if you configure an SVI. The only exception to this rule is the SVI for VLAN 1, which is the default VLAN. This SVI is automatically created by the software to allow for remote administration of the switch; however, it defaults to an administratively shutdown state and must be brought up manually and configured with Layer 3 addressing. Additionally, the switch should be configured with the correct default gateway, or on Multilayer switches, IP routing should be enabled.

A Switch Virtual Interface is a very resilient interface. In order for an SVI to be placed into the up/up state, the following conditions must be met:

• The VLAN exists and is active in the VLAN database of the switch
• The VLAN interface is not administratively down
• At least one Layer 2 port (access or trunk) exists and has a link up on this VLAN
• At least one Layer 2 port (access or trunk) is in the STP Forwarding state

In addition to SVIs, which provide default gateway functions for VLANs, Multilayer switches also support IP addressing configuration on physical interfaces. It is important to remember that switches do not route non-IP packets between VLANs and routed ports. However, you can forward these non-IP packets using fallback bridging, which is described in detail in the current SWITCH guide that is available online.

CATALYST SWITCH MLS OVERVIEW

Multilayer Switching (MLS) combines Layer 2, Layer 3, and Layer 4 switching technologies to forward packets at wire speed using hardware. Cisco supports MLS for both Unicast and Multicast traffic flows. In Unicast transmission, a flow is a unidirectional sequence of packets between a source and destination pair that shares the same protocol and Transport Layer information. These flows are based only on Layer 3 address information.
In Multicast transmission, a flow is a unidirectional sequence of packets between a Multicast source and the members of a destination Multicast group. Multicast flows are based on the IP address of the source device and the destination IP Multicast group address.

In MLS, a Layer 3 switching table, referred to as an MLS cache, is maintained for the Layer 3-switched flows. The MLS cache maintains flow information for all active flows and includes entries for traffic statistics that are updated in tandem with the switching of packets. After the MLS cache is created, any packets identified as belonging to an existing flow can be Layer 3-switched based on the cached information.

MLS integrates both data plane and control plane functions. These two planes are responsible for the building of routing tables and the actual forwarding of packets. The control plane is where routing information, routing protocol updates, and other control information are stored and exchanged. Using routing protocols, the control plane is responsible for updating the routing table as changes in the network topology occur.

The data plane is responsible for the actual forwarding of data. The data plane is typically populated using information derived from the control plane. This plane is used to determine the physical next-hop egress interface for received packets or frames and then forwards the packets or frames using the correct egress interface.

MLS is enabled by configuring Cisco Express Forwarding (CEF) on the switch. CEF operates at the data plane and is a topology-driven proprietary switching mechanism that creates a forwarding table that is tied to the routing table (i.e., the control plane). CEF was developed to eliminate the performance penalty experienced due to the first-packet process-switched lookup method used by flow-based switching.
CEF eliminates this by allowing the route cache used by the hardware-based Layer 3 routing engine to contain all the information needed to Layer 3 switch in hardware before any packets associated with a flow are even received. Information that is conventionally stored in a route cache is stored in two data structures for CEF switching. These data structures provide optimized lookup for efficient packet forwarding and are referred to as the Forwarding Information Base (FIB) and the adjacency table, both of which are described below.

CEF uses a FIB to make IP destination prefix-based switching decisions. The FIB is conceptually similar to a routing table or an information base. It maintains a mirror image of the forwarding information contained in the IP routing table. In other words, the FIB contains all IP prefixes from the routing table.

When routing or topology changes occur in the network, the IP routing table is updated, and those changes are also reflected in the FIB. The FIB maintains next-hop address information based on the information in the IP routing table. Because there is a one-to-one correlation between FIB entries and routing table entries, the FIB contains all known routes and eliminates the need for the route cache maintenance that is associated with demand-based switching paths, such as fast switching and optimum switching. It also removes the need to process switch the first packet of a flow in order to populate a cache. This allows CEF to switch traffic more efficiently than typical demand caching schemes.

The adjacency table is created in order to contain all connected next-hops. An adjacent node is a node that is one hop away (i.e., directly connected). The adjacency table is populated as adjacencies are discovered.
As soon as a neighbor becomes adjacent, a Data Link Layer header, called a MAC string or a MAC rewrite, which will be used to reach that neighbor, is created and stored in the table. On Ethernet segments, the header information is the destination MAC address, the source MAC address, and the EtherType, in that specific order.

As soon as a route is resolved, it points to an adjacent next-hop. If an adjacency is found in the adjacency table, a pointer to the appropriate adjacency is cached in the FIB element. If multiple paths exist for the same destination, then a pointer to each adjacency is added to the load-sharing structure, which allows for load-balancing. When prefixes are added to the FIB, prefixes that require exception handling are cached with special adjacencies. These components, and their interaction, are illustrated in Figure 4-1 below.

MSFC - Control Plane
IP Routing Table: Protocol Static | Metric 1 | Network/Mask 172.16.0.0/12 | Next-Hop 192.168.1.1
IP ARP Table: IP Address 192.168.1.1 | MAC Address 0001.1a2b.cdef

PFC - Data Plane
Forwarding Information Base: Network/Mask 172.16.0.0/12 | Adjacency 192.168.1.1
Adjacency Table: IP Address 192.168.1.1 | MAC Address 0001.1a2b.cdef

Fig. 4-1. Cisco Express Forwarding Operation

Enabling CEF requires the use of a single command, which is the ip cef [distributed] global configuration command. The [distributed] keyword is applicable only to high-end switches, such as the Catalyst 6500 Series switch, that support distributed CEF. MSFC and PFC are explained later in this chapter.

TROUBLESHOOTING MULTILAYER SWITCHING

MLS troubleshooting requires troubleshooting at both the control plane and the data plane. The control plane troubleshooting is the same as that performed on routers. MLS troubleshooting should follow a systematic approach, which involves checking the control plane (i.e., routing information), and then verifying the data or forwarding plane information.
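To make the relationship between the FIB and the adjacency table concrete, the following is a minimal sketch that models both structures as Python dictionaries, using the prefix, next-hop, and MAC address shown in Figure 4-1. It is a conceptual model only and not how the hardware implements CEF; the names fib, adjacency_table, and cef_lookup are hypothetical.

```python
import ipaddress

# Adjacency table: next-hop IP -> Layer 2 rewrite information (here, only the MAC).
adjacency_table = {
    "192.168.1.1": "0001.1a2b.cdef",
}

# FIB: prefix -> next-hop IP, which serves as the pointer into the adjacency table.
fib = {
    ipaddress.ip_network("172.16.0.0/12"): "192.168.1.1",
}

def cef_lookup(destination: str):
    """Longest-prefix match in the FIB, then resolve the adjacency.

    Returns (prefix, next_hop, mac) or None when no FIB entry matches.
    """
    dest = ipaddress.ip_address(destination)
    matches = [prefix for prefix in fib if dest in prefix]
    if not matches:
        return None
    best = max(matches, key=lambda prefix: prefix.prefixlen)
    next_hop = fib[best]
    # A missing adjacency (None) corresponds to an unresolved next-hop.
    return best, next_hop, adjacency_table.get(next_hop)

print(cef_lookup("172.20.1.1"))
print(cef_lookup("10.0.0.1"))
```

The sketch mirrors the troubleshooting order used in this section: a destination must first match a prefix (routing information), and the matching entry must then point to a next-hop with valid rewrite information (adjacency information).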
The following basic steps should be taken when troubleshooting Unicast MLS issues:
1. Verify that IP routing information for the address is correct
2. Verify that the next-hop has a valid MAC address
3. Verify that the FIB next-hop is the same as the RIB next-hop
4. Verify the CEF adjacency table rewrite information
5. Verify FIB and adjacency table population in TCAM

The first step is to verify that the destination address is present in the routing table. This step is performed using the show ip route command as follows:

Cat-6500-1#show ip route 0.0.0.0 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "static", distance 1, metric 0, candidate default path
  Routing Descriptor Blocks:
  * 10.10.10.1
      Route metric is 0, traffic share count is 1

This first step is used to ensure that there is a route to the intended destination network and the route has a valid next-hop address. If the route does not exist or the next-hop address is incorrect, troubleshooting of the routing protocol, next-hop interfaces, or route configuration will be required.

The second step is to verify that the next-hop address has a valid next-hop MAC address using the show ip arp command. If the ARP entry for the next-hop address is incomplete, then you will need to troubleshoot ARP. Following is the output of the show ip arp command for the next-hop address shown in the previous output:

Cat-6500-1#show ip arp | include 10.10.10.1
Internet 10.10.10.1 2 000f.20da.833d ARPA Vlan4

If the MAC address is incorrect, then you will need to verify whether another device owns that IP address. You can determine the MAC address of the next-hop device using the show interfaces command on that device if it is a Cisco IOS device. In Cisco Catalyst 6500 Series switches, you can use the Layer 2 traceroute utility, which is invoked using the traceroute mac <source mac> <destination mac> privileged EXEC command.
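When the ARP check in step 2 has to be repeated across many next-hops, the three outcomes of interest (valid, incomplete, or an unexpected MAC address) can be classified with a short script. The sketch below assumes the column layout of the show ip arp line shown above and assumes that an unresolved entry carries the word Incomplete in the hardware address column; the function name and return strings are hypothetical.

```python
def check_arp_entry(arp_line: str, expected_mac=None) -> str:
    """Classify one line of 'show ip arp' output.

    Assumed columns: Protocol, Address, Age, Hardware Addr, Type[, Interface].
    """
    fields = arp_line.split()
    if len(fields) < 5:
        return "unparseable"
    hardware = fields[3].lower()
    if hardware == "incomplete":
        # No ARP reply was received for this address.
        return "incomplete"
    if expected_mac is not None and hardware != expected_mac.lower():
        # Another device may be answering for this IP address.
        return "mac-mismatch"
    return "ok"

line = "Internet 10.10.10.1 2 000f.20da.833d ARPA Vlan4"
print(check_arp_entry(line, expected_mac="000f.20da.833d"))
print(check_arp_entry("Internet 10.10.10.1 0 Incomplete ARPA"))
```

A "mac-mismatch" result is only a prompt to investigate which device owns the address; it does not by itself identify the offending device.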
If the ARP entry is incomplete, it means that the switch did not receive an ARP reply from that host. In that case, you need to verify that the host is up and running. Following this, continue troubleshooting at the data plane to ensure that the same information is also present there.

Validate that the route entry in the FIB contains the same next-hop address as in the first step by using the show ip cef command. If there is a discrepancy between the two, a routing loop can be created. The output of this command for the same prefix displays the following entries:

Cat-6500-1#show ip cef 0.0.0.0 0.0.0.0 detail
0.0.0.0/0, version 38, epoch 0, cached adjacency 10.10.10.1
0 packets, 0 bytes
  Flow: AS 0, mask 0
  via 10.10.10.1, 0 dependencies, recursive
    next hop 10.10.10.1, Vlan4 via 10.10.10.1/32 (Default)
    valid cached adjacency

Next, verify that the CEF adjacency table contains the same rewrite information as the ARP table from Step 2 by using the show adjacency detail command as follows:

Cat-6500-1#show adjacency detail | begin 10.10.10.1
IP Vlan4 10.10.10.1(7)
  7834810780 packets, 1413540065564 bytes
  000F20DA833D
  000E39E29C000800
  ARP 00:54:07
  Epoch: 0

After performing these checks and correcting any identified issues, if you are still experiencing routing issues, you will need to verify the population of the FIB and adjacency table in Ternary Content Addressable Memory (TCAM) using the show mls cef commands. TCAM troubleshooting should be performed under the supervision of a Technical Assistance Center (TAC) engineer. TCAM is similar to CAM but adds a third "don't care" match state, which allows lookups such as longest-prefix matches to be performed in hardware. TCAM is described in the SWITCH guide, which is available online at www.howtonetwork.net.

UNDERSTANDING AND TROUBLESHOOTING HSRP

As is the case with all protocols and technologies, in order to troubleshoot something, you must have a solid understanding of how it works.
In this section, we will revisit Hot Standby Router Protocol (HSRP) fundamentals, reinforcing the material covered in the SWITCH exam, and then conclude the section by discussing some common HSRP problems and ways to troubleshoot them.

Hot Standby Router Protocol Overview

Hot Standby Router Protocol is a Cisco-proprietary First Hop Redundancy Protocol that allows physical gateways that are configured as part of the same HSRP group to share the same virtual gateway address. Network hosts residing on the same subnet as the gateways are configured with the virtual gateway IP address as their default gateway. Multiple HSRP groups can be configured under the same interface, allowing for load-sharing of traffic between the gateways.

While operational, the primary gateway forwards packets destined to the virtual gateway IP address of the HSRP group. In the event that the primary gateway fails, the secondary gateway assumes the role of primary and forwards all packets sent to the virtual gateway IP address. Preemption is not required for this failover. Preemption matters only when a gateway with a higher priority must take over from a gateway that is still active, for example, when the original primary gateway comes back online or when interface tracking lowers the priority of the active gateway. If preemption is not configured in these cases, the higher-priority gateway will not assume the role of the active gateway.

Cisco IOS software supports two versions of HSRP: version 1 and version 2. By default, when Hot Standby Router Protocol is enabled in Cisco IOS software, version 1 is enabled. The following list describes the operational differences between these two versions:
• HSRPv1 restricts HSRP group numbers to the range 0 to 255, whereas in version 2 the group number range has been extended to 0 to 4095.
• HSRPv1 routers communicate by sending messages to Multicast group address 224.0.0.2 using UDP port 1985. HSRPv2 routers communicate by sending messages to Multicast group address 224.0.0.102 using UDP port 1985.
• The version 2 packet format uses a Type/Length/Value (TLV) format.
HSRP version 2 packets received by an HSRPv1 router will have the Type field mapped to the Version field by HSRPv1 and will subsequently be ignored.
• Although HSRPv1 advertises timer values, these values are always to the whole second, as it is not capable of advertising or learning millisecond timer values. HSRPv2 is capable of both advertising and learning millisecond timer values.
• Version 2 provides improved management and troubleshooting by including a 6-byte Identifier field, which is populated with the physical router interface MAC address and is used to identify uniquely the source of HSRP active Hello messages. In version 1, these messages contain the virtual MAC address as the source MAC, which means it is not possible to determine which HSRP router actually sent the HSRP Hello message.
• In HSRPv1, the Layer 2 address that is used by the virtual IP address will be a virtual MAC address composed of 0000.0C07.ACxx, where xx is the HSRP group number in Hexadecimal value and is based on the respective interface. HSRPv2, however, uses a new MAC address range of 0000.0C9F.F000 to 0000.0C9F.FFFF for the virtual gateway IP address.

Troubleshooting Hot Standby Router Protocol

The majority of HSRP issues are due to router and switch misconfigurations. For this reason, it is important to have an intimate understanding of the operation of this protocol to be able to identify any HSRP issues quickly. In addition to misconfigurations, device resource utilization and Physical layer and Data Link layer issues can also affect the operation of HSRP.
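Knowing which virtual MAC address to expect for a given group helps when reading ARP tables, MAC address tables, and port security violations. The following sketch derives the expected address from the group number and HSRP version, using the 0000.0C07.ACxx and 0000.0C9F.Fxxx formats listed in the overview. The function name is hypothetical, and the group ranges enforced (0 to 255 for version 1, 0 to 4095 for version 2) are the ranges assumed by this sketch.

```python
def hsrp_virtual_mac(group: int, version: int = 1) -> str:
    """Return the expected HSRP virtual MAC in Cisco dotted notation."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 group must be in the range 0-255")
        raw = f"00000c07ac{group:02x}"   # 0000.0c07.acXX
    elif version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 group must be in the range 0-4095")
        raw = f"00000c9ff{group:03x}"    # 0000.0c9f.fXXX
    else:
        raise ValueError("HSRP version must be 1 or 2")
    # Split the 12 hex digits into three dotted groups of four.
    return ".".join(raw[i:i + 4] for i in range(0, 12, 4))

print(hsrp_virtual_mac(1))
print(hsrp_virtual_mac(10, version=2))
```

For group 1 with version 1, the result can be compared directly with the "Virtual mac address" line in the output of the show standby command.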
While it is not possible to delve into specifics on all possible HSRP issues, the common HSRP problem scenarios include the following:
• Gateway logging continuous HSRP state changes
• HSRP gateways not reflecting the correct state
• HSRP does not detect peer router
• HSRP causes a MAC violation on a secure switch port

The following section describes the common HSRP problem scenarios and provides recommended solutions for them.

One of the most common problems experienced in networks running HSRP is continuous state changes. If you recall, from the SWITCH guide, we learned that when HSRP is enabled on an interface, the gateway interface goes through a series of states as follows:
1. Disabled
2. Init
3. Listen
4. Speak
5. Standby
6. Active

The most common messages seen after HSRP has been configured are transitions between the active and the standby states. During the speak phase, the standby gateway exchanges messages with the active gateway. Upon completion of this phase, the primary gateway transitions to the active state and the backup gateway transitions to the standby state. The standby state indicates that the gateway is ready to assume the role of active gateway if the primary gateway fails, and the active state indicates that the gateway is ready to forward packets actively.
Continuous state transitions between active and standby result in the following log messages:

%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby

Such error messages indicate a situation in which a standby HSRP router did not receive three successive HSRP Hello packets from its HSRP peer and therefore assumes the role of active router. There are several possible causes for the loss of HSRP packets between the peers. The most common reasons for these errors include the following:
• Physical layer problems
• Data Link layer problems
• Excessive traffic
• Gateway resource issues

Physical layer issues (e.g., flapping interfaces, errors, etc.) may prevent HSRP packets from being sent successfully between the peers. You can look for link transitions by filtering log output to include only link up/down messages and then troubleshooting the identified links using the show interfaces command, as was described earlier in this guide. For example, you can check the logs on the local gateway and if there is nothing there to indicate link failure, then you can proceed and check the logs of the peer. Keep in mind that the devices might be connected to different switches, so it is important to verify all links in the path between the gateways.

Layer 2 problems, such as Spanning Tree issues, can cause a loss of HSRP messages between the peers. STP loops can cause Broadcast storms, duplicated frames, and MAC table inconsistency. All of these problems affect the entire network, especially HSRP. HSRP error messages can be the first indication of an STP issue.
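Because repeated state changes are often the earliest visible symptom, it can be useful to count them per interface and group in collected syslog text. The following is a minimal sketch that assumes the %HSRP-5-STATECHANGE message format shown above; the function name and the sample text are hypothetical, and what count constitutes "flapping" is left to the reader.

```python
import re
from collections import Counter

# Matches lines such as:
# %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
STATE_CHANGE = re.compile(
    r"%HSRP-5-STATECHANGE: (?P<intf>\S+) Grp (?P<grp>\d+) "
    r"state (?P<old>\w+) -> (?P<new>\w+)"
)

def count_state_changes(log_text: str) -> Counter:
    """Count HSRP state changes per (interface, group) pair."""
    counts = Counter()
    for match in STATE_CHANGE.finditer(log_text):
        counts[(match.group("intf"), int(match.group("grp")))] += 1
    return counts

sample = """\
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
"""
print(count_state_changes(sample))
```

A group with a high count over a short period is a candidate for the Physical layer, Data Link layer, and CPU checks described in this section.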
As stated in the previous chapter, when troubleshooting STP, it is important to understand the network topology, which includes the location of the root bridge, the backup root bridge (if applicable), blocking or redundant ports, and forwarding ports.

Finally, HSRP state changes are often due to high CPU utilization on the gateway. You can use the show processes cpu command to troubleshoot and identify CPU utilization issues. Filter the output of this command to show processes that are utilizing the CPU. If the issue is a result of high CPU utilization, troubleshoot the problem using the appropriate tools, which may include putting a sniffer on the network and then tracing the system that is causing this utilization.

A commonly encountered issue following HSRP implementation is that the standby gateway never becomes the active gateway when it is expected to, or that the primary gateway never re-assumes its role after it is back online. These issues are almost always attributed to device misconfigurations. By default, HSRP does not preempt. This means that even if the standby HSRP priority is higher than that of the primary, if preemption is not configured, the standby gateway will not take over while the primary gateway remains active. Similarly, if the active gateway fails and is restored (and preemption is not configured), it will not reassume its previous role, even though it has a higher priority than the standby gateway.
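The preemption behavior described above can be summarized in a small model. The sketch below is a deliberately simplified view of a two-gateway HSRP group: it ignores timers, the intermediate Speak state, and IP address tie-breaks, and the Gateway class and next_active function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Gateway:
    name: str
    priority: int
    preempt: bool = False
    up: bool = True

def next_active(current_active, gateways):
    """Return the gateway that should be active in this simplified model."""
    candidates = [g for g in gateways if g.up]
    if not candidates:
        return None
    if current_active is None or not current_active.up:
        # Outright failure of the active gateway: no preemption is needed.
        return max(candidates, key=lambda g: g.priority)
    # The active gateway is still up: only a preempting gateway with a
    # strictly higher priority can take over.
    challengers = [
        g for g in candidates
        if g.preempt and g.priority > current_active.priority
    ]
    if challengers:
        return max(challengers, key=lambda g: g.priority)
    return current_active

primary = Gateway("Cat-6500-1", priority=105, preempt=True)
standby = Gateway("Cat-6500-2", priority=100)

primary.priority = 95  # e.g., lowered while the gateway stays up
print(next_active(primary, [primary, standby]).name)
```

With the values above, the model keeps Cat-6500-1 active because Cat-6500-2 is not configured to preempt; setting standby.preempt to True changes the result.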
Consider the following output, which illustrates the output of the show standby command:

Cat-6500-1#show standby vlan 1
Vlan1 - Group 1
  Local state is Active, priority 105, may preempt
  Hellotime 3 sec, holdtime 10 sec
  Next hello sent in 1.541
  Virtual IP address is 10.0.0.254 configured
  Active router is local
  Standby router is 10.0.0.2 expires in 8.772
  Virtual mac address is 0000.0c07.ac01
  1 state changes, last state change 00:58:15
  IP redundancy name is "hsrp-Vl1-1" (default)

From the output above, we can determine that Cat-6500-1 is the active gateway for HSRP Group 1. This switch has been configured with a priority of 105 for the group, and it is also configured to preempt. In the event that the switch or SVI fails and is restored, it will become the active gateway again because it is configured to preempt. Continuing with the example, the following displays the output of the show standby command on the current standby gateway:

Cat-6500-2#show standby vlan 1
Vlan1 - Group 1
  Local state is Standby, priority 100
  Hellotime 3 sec, holdtime 10 sec
  Next hello sent in 0.585
  Virtual IP address is 10.0.0.254 configured
  Active router is 10.0.0.1, priority 105 expires in 9.948
  Standby router is local
  1 state changes, last state change 00:58:28
  IP redundancy name is "hsrp-Vl1-1" (default)

From an outright failure standpoint (e.g., if the entire switch fails or the switch SVI goes down), the configuration will work because if the active gateway is no longer present, then the standby gateway simply becomes the active gateway. No preemption is required in such situations. The issue arises when the active gateway does not fail but instead decrements its priority based on additional configuration, such as interface tracking.
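The effect of interface tracking on priority is simple arithmetic: the configured priority is reduced by the decrement of every tracked interface that is down. The following sketch illustrates the calculation with illustrative values (a configured priority of 105 and one tracked interface with a decrement of 10). The function names are hypothetical, and the sketch does not model any lower bound that the software may apply to the resulting priority.

```python
def effective_priority(configured: int, tracked: dict) -> int:
    """Compute the priority after tracking decrements are applied.

    tracked maps an interface name to a (decrement, is_up) tuple.
    """
    lost = sum(decrement for decrement, is_up in tracked.values() if not is_up)
    return configured - lost

def standby_takes_over(active_priority: int, standby_priority: int,
                       standby_preempt: bool) -> bool:
    """While the active gateway is still up, a takeover requires both a
    higher priority and preemption on the standby gateway."""
    return standby_preempt and standby_priority > active_priority

tracked = {"GigabitEthernet0/1": (10, False)}  # tracked interface is down
active = effective_priority(105, tracked)
print(active)
print(standby_takes_over(active, 100, standby_preempt=False))
print(standby_takes_over(active, 100, standby_preempt=True))
```

The two calls to standby_takes_over show why tracking alone is not sufficient: the decrement lowers the active gateway's priority, but the role changes only if the standby gateway is also allowed to preempt.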
Assume, for example, that the configuration on the gateway is modified to include tracking as follows:

Cat-6500-1#show standby vlan 1
Vlan1 - Group 1
  Local state is Active, priority 105, may preempt
  Hellotime 3 sec, holdtime 10 sec
  Next hello sent in 1.541
  Virtual IP address is 10.0.0.254 configured
  Active router is local
  Standby router is 10.0.0.2 expires in 8.772
  Virtual mac address is 0000.0c07.ac01
  1 state changes, last state change 00:05:15
  IP redundancy name is "hsrp-Vl1-1" (default)
  Priority tracking 1 interface or object, 1 up:
    Interface or object    Decrement    State
    GigabitEthernet0/1     10           Up

After this configuration, the GigabitEthernet0/1 interface on the active gateway is disabled. The show standby command on this device now displays the following:

Cat-6500-1#show standby vlan 1
Vlan1 - Group 1
  Local state is Active, priority 95 (confgd 105), may preempt
  Hellotime 3 sec, holdtime 10 sec
  Next hello sent in 1.541
  Virtual IP address is 10.0.0.254 configured
  Active router is local
  Standby router is 10.0.0.2 expires in 7.704 sec
  Virtual mac address is 0000.0c07.ac01
  1 state changes, last state change 00:08:35
  IP redundancy name is "hsrp-Vl1-1" (default)
  Priority tracking 1 interface or object, 0 up:
    Interface or object    Decrement    State
    GigabitEthernet0/1     10           Down (administratively down)

After this, the show standby command on the standby router now displays the following:

Cat-6500-2#show standby vlan 1
Vlan1 - Group 1
  Local state is Standby, priority 100
  Hellotime 3 sec, holdtime 10 sec
  Next hello sent in 0.585
  Virtual IP address is 10.0.0.254 configured
  Active router is 10.0.0.1, priority 95 expires in 8.476 sec
  Standby router is local
  1 state changes, last state change 00:01:28
  IP redundancy name is "hsrp-Vl1-1" (default)

Notice that even with the higher priority of 100 versus 95 on the active router, after the tracking configuration decremented the configured priority by 10 (the default), the
second gateway does not become the active gateway for the group because it has not been configured to preempt. This is a common mistake that is made when configuring HSRP.

NOTE: When configuring HSRP, ensure that preemption is configured on both the primary and the standby gateways.

You can verify whether preemption is configured by looking for coup messages in the output of the debug standby command as illustrated in the following output:

Cat-3550-2#debug standby
HSRP debugging is on
Nov 4 02:29:14.234: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Active pri 100 vIP 10.0.0.254
Nov 4 02:29:17.234: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Active pri 100 vIP 10.0.0.254
Nov 4 02:29:20.234: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Active pri 100 vIP 10.0.0.254
Nov 4 02:29:20.238: HSRP: Vl1 API active virtual MAC 0000.0c07.ac01 found
Nov 4 02:29:20.238: HSRP: Vl1 API active virtual MAC 0000.0c07.ac01 found
Nov 4 02:29:20.238: HSRP: Vl1 REDIRECT adv in, Passive, active 0, passive 1, from 10.0.0.1
Nov 4 02:29:20.238: HSRP: Vl1 REDIRECT adv in, Active, active 1, passive 2, from 10.0.0.1
Nov 4 02:29:20.238: HSRP: Vl1 Grp 1 Coup in 10.0.0.1 Listen pri 105 vIP 10.0.0.254
Nov 4 02:29:20.238: HSRP: Vl1 Grp 1 Active: j/Coup rcvd from higher pri router (105/10.0.0.1)
Nov 4 02:29:20.238: HSRP: Vl1 Grp 1 Active router is 10.0.0.1, was local
Nov 4 02:29:20.238: HSRP: Vl1 Grp 1 Active -> Speak
Nov 4 02:29:20.238: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
Nov 4 02:29:20.238: HSRP: Vl1 Redirect adv out, Passive, active 0 passive 1
Nov 4 02:29:20.238: HSRP: Vl1 Grp 1 Redundancy "hsrp-Vl1-1" state Active -> Speak
Nov 4 02:29:20.242: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Speak pri 100 vIP 10.0.0.254
Nov 4 02:29:20.242: HSRP: Vl1 REDIRECT adv in, Active, active 1, passive 1, from 10.0.0.1
Nov 4 02:29:20.242: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Active pri 105 vIP 10.0.0.254
Nov 4 02:29:23.234: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Active pri 105 vIP 10.0.0.254
Nov 4 02:29:23.242: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Speak pri 100 vIP 10.0.0.254
Nov 4 02:29:26.238: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Active pri 105 vIP 10.0.0.254
Nov 4 02:29:26.242: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Speak pri 100 vIP 10.0.0.254
Nov 4 02:29:29.234: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Active pri 105 vIP 10.0.0.254
Nov 4 02:29:29.242: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Speak pri 100 vIP 10.0.0.254
Nov 4 02:29:30.238: HSRP: Vl1 Grp 1 Speak: d/Standby timer expired (unknown)
Nov 4 02:29:30.238: HSRP: Vl1 Grp 1 Standby router is local
Nov 4 02:29:30.238: HSRP: Vl1 Grp 1 Speak -> Standby
Nov 4 02:29:30.238: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
Nov 4 02:29:30.238: HSRP: Vl1 Grp 1 Redundancy "hsrp-Vl1-1" state Speak -> Standby
Nov 4 02:29:30.238: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Standby pri 100 vIP 10.0.0.254
Nov 4 02:29:32.234: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Active pri 105 vIP 10.0.0.254
... [Truncated Output]

If preemption is not configured, then the output of the debug standby command would display the following:

Cat-3550-2#debug standby
HSRP debugging is on
Cat-3550-2#
Nov 4 22:40:32.905: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Speak pri 95 vIP 10.0.0.254
Nov 4 22:40:35.905: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Active pri 100 vIP 10.0.0.254
Nov 4 22:40:35.905: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Speak pri 95 vIP 10.0.0.254
Nov 4 22:40:36.901: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Standby pri 105 vIP 10.0.0.254
Nov 4 22:40:36.901: HSRP: Vl1 Grp 1 Standby router is 10.0.0.1
Nov 4 22:40:38.905: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Active pri 100 vIP 10.0.0.254
Nov 4 22:40:39.901: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Standby pri 105 vIP 10.0.0.254
... [Truncated Output]

In the debug output above, the local HSRP gateway first receives a Hello from the remote HSRP gateway that indicates a priority value of 95. The local HSRP gateway advertises its priority value of 100.
Next, the priority on the remote gateway is changed to 105 as illustrated in the received Hello. However, because preemption is not enabled, there is no state change because the remote HSRP gateway does not send out a coup message. Given this, the local gateway remains the active gateway, even though it has a lower HSRP priority value than that of the remote or peer device.

There are two primary reasons why an HSRP gateway does not recognize its peer. The first is due to a lack of connectivity between the two devices and the second is due to device misconfigurations. When an HSRP gateway does not recognize its peer, the output of the show standby command displays a message similar to the following:

CORE1#show standby vlan 1
Vlan1 - Group 1
  Local state is Active, priority 105, may preempt
  Hellotime 3 sec, holdtime 10 sec
  Next hello sent in 0.057
  Virtual IP address is 10.0.0.254 configured
  Active router is local
  Standby router is unknown
  Virtual mac address is 0000.0c07.ac01
  2 state changes, last state change 00:00:45
  IP redundancy name is "hsrp-Vl1-1" (default)
Vlan1 - Group 2
  Local state is Active, priority 100, may preempt
  Hellotime 3 sec, holdtime 10 sec
  Next hello sent in 2.199
  Virtual IP address is 10.1.1.254 configured
  Active router is local
  Standby router is unknown
  Virtual mac address is 0000.0c07.ac02
  5 state changes, last state change 00:00:18
  IP redundancy name is "hsrp-Vl1-2" (default)

The output above indicates that the gateway is configured for HSRP but does not recognize its HSRP peers. In order for this to occur, the router must fail to receive HSRP Hellos from the neighbor router.
This can be viewed using the debug standby command as follows:

Cat-3550-1#debug standby
HSRP debugging is on
Nov 4 02:26:52.094: HSRP: Vl1 Grp 1 Listen: c/Active timer expired (unknown)
Nov 4 02:26:52.094: HSRP: Vl1 Grp 1 Listen -> Speak
Nov 4 02:26:52.094: HSRP: Vl1 Grp 1 Redundancy "hsrp-Vl1-1" state Backup -> Speak
Nov 4 02:26:52.094: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Speak pri 105 vIP 10.0.0.254
Nov 4 02:26:55.094: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Speak pri 105 vIP 10.0.0.254
Nov 4 02:26:58.094: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Speak pri 105 vIP 10.0.0.254
Nov 4 02:27:01.094: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Speak pri 105 vIP 10.0.0.254
Nov 4 02:27:02.094: HSRP: Vl1 Grp 1 Speak: d/Standby timer expired (unknown)
Nov 4 02:27:02.094: HSRP: Vl1 Grp 1 Standby router is local
Nov 4 02:27:02.094: HSRP: Vl1 Grp 1 Speak -> Standby
Nov 4 02:27:02.094: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
Nov 4 02:27:02.094: HSRP: Vl1 Grp 1 Redundancy "hsrp-Vl1-1" state Speak -> Standby
Nov 4 02:27:02.094: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Standby pri 105 vIP 10.0.0.254
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Standby: c/Active timer expired (unknown)
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Active router is local
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Standby router is unknown, was local
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Standby -> Active
Nov 4 02:27:02.594: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
Nov 4 02:27:02.594: HSRP: Vl1 Redirect adv out, Active, active 1 passive 0
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Redundancy "hsrp-Vl1-1" state Standby -> Active
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Active pri 105 vIP 10.0.0.254
Nov 4 02:27:05.594: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Active pri 105 vIP 10.0.0.254
Nov 4 02:27:05.594: HSRP: Vl1 Grp 1 Redundancy group hsrp-Vl1-1 state Active -> Active
... [Truncated Output]

In the debug output above, the local gateway sends out a Hello sourced from its local IP address of 10.0.0.1, advertising a priority value of 105 and reflecting the VIP address of 10.0.0.254. The local gateway does not hear anything from its peer and transitions to the standby state. Again, the local gateway hears nothing from its peer and transitions to the active state. Also notice that there are no received HSRP packets from any other devices.

When troubleshooting this issue, first verify Physical layer connectivity between the gateways. Use the show interfaces command on the local router to verify Physical layer status. Also keep in mind that Data Link layer issues (e.g., Spanning Tree and VLAN issues) can cause connectivity issues between the gateways. If there is connectivity between the two gateways, then check for HSRP misconfigurations on the devices.

In networks using port security, it is not uncommon for HSRP to cause MAC violations due to misconfigurations on the switches. By default, when port security is enabled on a switch port, only a single secure MAC address is permitted. When port security is configured on the switch ports that are connected to the HSRP-enabled routers, it can cause a MAC violation, because the same secure MAC address (in this case, the HSRP virtual MAC address) cannot appear on more than one secure interface. A security violation will occur on a secure port in one of the following situations:
• If the maximum number of secure MAC addresses is added to the address table, and a station whose MAC address is not in the table attempts to access the interface
• If an address that is learned or configured on one secure interface is seen on another secure interface in the same VLAN

By default, a port security violation causes the switch interface to become error-disabled and to shut down immediately, which blocks the HSRP status messages between the routers. Two solutions can be applied to resolve this issue.
The first is to configure HSRP using the standby use-bia interface configuration command, which forces the gateways to use the interface MAC (burned-in address) instead of the virtual MAC for HSRP. The second alternative is simply to disable port security on the ports connected to devices running HSRP.

UNDERSTANDING AND TROUBLESHOOTING VRRP

As is the case with HSRP, it is important to have a fundamental understanding of Virtual Router Redundancy Protocol (VRRP) to be able to troubleshoot and identify protocol failures. Because VRRP is very similar to HSRP, troubleshooting VRRP follows the same basic concepts described in the HSRP troubleshooting portion of the previous section.

Understanding Virtual Router Redundancy Protocol

Virtual Router Redundancy Protocol operates in a similar manner to HSRP; however, unlike HSRP, VRRP is an open standard that was originally defined in RFC 2338 (later replaced by RFC 3768). VRRP sends advertisements to the Multicast destination address 224.0.0.18 (VRRP), using IP protocol number 112. At the Data Link layer, advertisements are sent by the virtual router master using the virtual router MAC address 00-00-5E-00-01-xx as the source address, where xx represents the two-digit VRRP group number in Hexadecimal value. This allows you to configure up to 255 virtual routers on an interface. However, the number of virtual routers that a gateway interface can actually support depends on the following factors:
• Gateway processing capability
• Gateway memory capability
• Gateway interface support of multiple MAC addresses

VRRP and HSRP are similar in many ways. For example, both protocols elect the primary router, which is the active router in HSRP and the virtual router master in VRRP, based on the priority values configured (both use a default of 100). In the event that priority values are the same, the gateway with the highest IP address is elected. Another similarity is that both HSRP and VRRP can be configured between more than two LAN gateways.
This allows multiple gateways to provide redundancy for LAN hosts. Note, however, that although both protocols support this capability, the show vrrp command indicates only which router is the virtual router master, whereas the show standby command shows both the active and the standby gateways (if issued on a gateway that is neither), as illustrated below:

Cat-6500-1#show standby vlan 1
Vlan1 - Group 1
  Local state is Listen, priority 100
  Hello time 3 sec, hold time 10 sec
  Next hello sent in 0.585
  Virtual IP address is 10.0.0.254 configured
  Active router is 10.0.0.3, priority 100 (expires in 7.488 sec)
  Standby router is 10.0.0.2, priority 100 (expires in 8.992 sec)
  Virtual MAC address is 0000.0c07.ac01
  IP redundancy name is "hsrp-Vl1-1" (default)

NOTE: A non-active or non-standby HSRP gateway will remain in the Listen state as illustrated in the output above. All non-virtual router master VRRP gateways will be in a backup state, regardless of the number of gateways in the group.

Another similarity between HSRP and VRRP is that both protocols support MD5 and plain text authentication for securing protocol message exchanges.

Despite their similarities, there are some significant differences between HSRP and VRRP. Understanding these differences is important not only from a protocol configuration and implementation perspective but also from a troubleshooting perspective. The following section describes some of the differences between HSRP and VRRP with which you should be intimately familiar. These include version, priority values, VIP configuration, Hello packet sending, preemption, and timers.

By default, VRRP version 2 is enabled when VRRP is configured on a gateway in IOS software. Version 2 is the default and current VRRP version. Unlike HSRP, it is not possible to change the version. There is no VRRP version 1 standard.
On the other hand, when HSRP is enabled on a gateway, by default, HSRP version 1 is enabled. However, Cisco IOS software allows administrators to change this to HSRP version 2 using the standby version <1-2> interface configuration command.

Like HSRP, VRRP uses priority values to determine which gateway will be elected the virtual router master. The default VRRP priority value is 100; however, this value can be adjusted manually to a value between 1 and 254. While the default HSRP priority value is also 100, the configurable priority values for HSRP are between 1 and 255.

NOTE: The priority value of 255 is reserved for a special purpose in VRRP. This is described in the following section on VRRP virtual IP configuration.

When configuring HSRP, the virtual IP (VIP) address assigned to the HSRP group cannot be the real IP address assigned to an interface. For example, if an interface has the IP address 10.0.0.1/24, then the VIP address cannot be that IP address or the IP address of any other device that will be part of the same HSRP group. If you attempt to configure the real IP address as the VIP address, the software will print the following error message on the console:

Cat-6500-1(config)#interface vlan 1
Cat-6500-1(config-if)#standby 1 ip 10.0.0.1
% address cannot equal interface IP address

With VRRP, however, the VIP can be configured simply as a logical IP address that is shared between the routers in the group or as the real IP address of one of the devices in the group. When the VIP is configured as the real IP address of a gateway, that device is referred to as the IP Address Owner or sometimes as the Real IP Address Owner. When the IP Address Owner is up, this gateway will respond to all packets that are sent to the IP address. This gateway is also the virtual router master for the group and is assigned a priority value of 255 in Cisco IOS software.
As previously stated, only values 1 to 254 are configurable in Cisco IOS software; therefore, this value is allocated by the software only when the virtual IP address for the group is equal to that of the real router interface. Consider the topology that is illustrated in Figure 4-2 below, for example:

[Figure 4-2: R1 (Fa0/0: 10.0.0.1/24), R2 (Fa0/0: 10.0.0.2/24), and R3 (Fa0/0: 10.0.0.3/24) in VRRP Group 1 with VIP 10.0.0.1]

Fig. 4-2. Specifying Real Interface Addresses as the VRRP VIP

NOTE: While Figure 4-2 depicts routers, the same concept is applicable on Multilayer switches.

Prior to the configuration of VRRP on the gateways, the interface configurations for all three are provided in the following section. The interface configuration for R1 is as follows:

R1#show running-config interface FastEthernet0/0
Building configuration...

Current configuration : 222 bytes
!
interface FastEthernet0/0
 ip address 10.0.0.1 255.255.255.0
 duplex auto
 speed auto
end

The interface configuration for R2 is as follows:

R2#show running-config interface FastEthernet0/0
Building configuration...

Current configuration : 222 bytes
!
interface FastEthernet0/0
 ip address 10.0.0.2 255.255.255.0
 duplex auto
 speed auto
end

Finally, the interface configuration for R3 is as follows:

R3#show running-config interface FastEthernet0/0
Building configuration...

Current configuration : 222 bytes
!
interface FastEthernet0/0
 ip address 10.0.0.3 255.255.255.0
 duplex auto
 speed auto
end

Following this initial verification, VRRP Group 1 is configured under the FastEthernet interface of all gateways using the vrrp 1 ip 10.0.0.1 interface configuration command, which specifies the real IP address of R1 as the VIP. No additional VRRP configuration is performed. The running configuration on all gateways is updated as follows, beginning with R1:

R1#show running-config interface fastEthernet0/0
Building configuration...

Current configuration : 222 bytes
!
interface FastEthernet0/0
 ip address 10.0.0.1 255.255.255.0
 duplex auto
 speed auto
 vrrp 1 ip 10.0.0.1
end

The interface configuration for R2 is as follows:

R2#show running-config interface FastEthernet0/0
Building configuration...

Current configuration : 222 bytes
!
interface FastEthernet0/0
 ip address 10.0.0.2 255.255.255.0
 duplex auto
 speed auto
 vrrp 1 ip 10.0.0.1
end

Finally, the interface configuration for R3 is as follows:

R3#show running-config interface FastEthernet0/0
Building configuration...

Current configuration : 222 bytes
!
interface FastEthernet0/0
 ip address 10.0.0.3 255.255.255.0
 duplex auto
 speed auto
 vrrp 1 ip 10.0.0.1
end

Following this configuration, because the real IP address of R1 is used as the VRRP group VIP, this gateway is designated IP Address Owner and an automatic priority value is assigned to this gateway, allowing it to respond to all packets sent to that IP address. This is confirmed using the show vrrp [group] command. Following is the output of this command on R1:

R1#show vrrp
FastEthernet0/0 - Group 1
  State is Master
  Virtual IP address is 10.0.0.1
  Virtual MAC address is 0000.5e00.0101
  Advertisement interval is 1.000 sec
  Preemption enabled
  Priority is 255
  Master Router is 10.0.0.1 (local), priority is 255
  Master Advertisement interval is 1.000 sec
  Master Down interval is 3.003 sec

Gateways R2 and R3 default to the backup state and show R1 as the virtual router master.
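The election outcome described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the names Gateway, effective_priority, and elect_master are hypothetical, and the sketch ignores timers, preemption, and authentication. It applies the two rules given in this section: the IP Address Owner is treated as priority 255, and the highest IP address breaks a priority tie.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address

OWNER_PRIORITY = 255  # value the software assigns to the IP Address Owner


@dataclass(frozen=True)
class Gateway:
    """Hypothetical stand-in for one VRRP gateway in the group."""
    name: str
    interface_ip: IPv4Address
    configured_priority: int = 100  # VRRP default priority


def effective_priority(gw: Gateway, vip: IPv4Address) -> int:
    """Return 255 when the VIP is the gateway's own address, else the configured value."""
    return OWNER_PRIORITY if gw.interface_ip == vip else gw.configured_priority


def elect_master(gateways, vip: IPv4Address) -> Gateway:
    """Highest effective priority wins; the highest interface IP breaks a tie."""
    return max(gateways, key=lambda gw: (effective_priority(gw, vip), gw.interface_ip))


# Addresses taken from the Figure 4-2 example.
routers = [
    Gateway("R1", IPv4Address("10.0.0.1")),
    Gateway("R2", IPv4Address("10.0.0.2")),
    Gateway("R3", IPv4Address("10.0.0.3")),
]
vip = IPv4Address("10.0.0.1")
master = elect_master(routers, vip)
print(master.name, effective_priority(master, vip))  # R1 255
```

If the VIP were instead a shared logical address that no gateway owns, all three routers would tie at the default priority of 100 and the model would pick R3, the gateway with the highest IP address.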
Because R2 and R3 show the same information, only the output of the show vrrp command on R2 is provided below:

R2#show vrrp
FastEthernet0/0 - Group 1
  State is Backup
  Virtual IP address is 10.0.0.1
  Virtual MAC address is 0000.5e00.0101
  Advertisement interval is 1.000 sec
  Preemption enabled
  Priority is 100
  Master Router is 10.0.0.1, priority is 255
  Master Advertisement interval is 1.000 sec
  Master Down interval is 3.609 sec (expires in 3.581 sec)

NOTE: When using the interface address of a gateway as the VIP, you may see the following error message if the gateway goes down and then comes back up:

*Mar 11 17:40:54.159: %IP-4-DUPADDR: Duplicate address 10.0.0.1 on FastEthernet0/0, sourced by 000d.289e.f940

This error message should clear immediately; however, if you see it repeatedly citing multiple MAC addresses, there may be another device on the network using the same address.

Another notable difference between VRRP and HSRP is the sending of Hello packets. When you enable HSRP, the standby and active gateways exchange Hello packets, which are sent out every three seconds by default.
If you enabled the debug standby command on either gateway, then you would see an output similar to the following:

HSRP: Fa0/0 Grp 1 Hello in  10.0.0.2 Standby pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello out 10.0.0.3 Active  pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello in  10.0.0.2 Standby pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello out 10.0.0.3 Active  pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello in  10.0.0.2 Standby pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello out 10.0.0.3 Active  pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello in  10.0.0.2 Standby pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello out 10.0.0.3 Active  pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello in  10.0.0.2 Standby pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello out 10.0.0.3 Active  pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello in  10.0.0.2 Standby pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello out 10.0.0.3 Active  pri 100 vIP 10.0.0.254
HSRP: Fa0/0 Grp 1 Hello in  10.0.0.2 Standby pri 100 vIP 10.0.0.254

With VRRP, the virtual router master sends advertisements to other VRRP routers in the same group every one second by default. However, the virtual router backup(s) will not send any packets to the VRRP group. This is often a point of confusion when troubleshooting.
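Because the backups stay silent, each one relies on a Master Down interval to decide when the master has been lost. The sketch below reproduces the intervals in the show vrrp outputs earlier in this section (3.003 sec on R1 and 3.609 sec on R2). The formula is an assumption brought in from the VRRP specification (RFC 3768) and is not stated in this guide: the interval is three advertisement intervals plus a skew time of (256 - priority) / 256 seconds. The function names are hypothetical.

```python
def skew_time(priority: int) -> float:
    """Skew time in seconds, per the RFC 3768 formula (256 - priority) / 256."""
    return (256 - priority) / 256


def master_down_interval(priority: int, advertisement_interval: float = 1.0) -> float:
    """Seconds a backup waits without hearing an advertisement before taking over."""
    return 3 * advertisement_interval + skew_time(priority)


# Priorities from the example: R1 is the IP Address Owner (255), R2 uses the default (100).
for name, priority in (("R1", 255), ("R2", 100)):
    print(f"{name}: {master_down_interval(priority):.6f} sec")
# R1: 3.003906 sec
# R2: 3.609375 sec
```

The skew term means that a higher-priority backup times out slightly sooner than a lower-priority one, so it tends to claim the master role first when advertisements stop.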
If the debug vrrp command was issued on the virtual router master, then you would see advertisements being sent out similar to the following:

*Mar 10 21:23:03.255: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:04.183: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:05.067: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:05.939: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:06.775: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:07.755: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:08.723: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:09.703: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:10.639: VRRP: Grp 1 sending Advertisement checksum D5FA

On the virtual router backup, the debug vrrp command would display the following output:

*Mar 11 14:14:29.643: VRRP: Grp 1 Advertisement priority 255, ipaddr 10.0.0.1
*Mar 11 14:14:29.643: VRRP: Grp 1 Event - Advert higher or equal priority
*Mar 11 14:14:30.611: VRRP: Grp 1 Advertisement priority 255, ipaddr 10.0.0.1
*Mar 11 14:14:30.611: VRRP: Grp 1 Event - Advert higher or equal priority
*Mar 11 14:14:31.591: VRRP: Grp 1 Advertisement priority 255, ipaddr 10.0.0.1
*Mar 11 14:14:31.591: VRRP: Grp 1 Event - Advert higher or equal priority
*Mar 11 14:14:32.527: VRRP: Grp 1 Advertisement priority 255, ipaddr 10.0.0.1
*Mar 11 14:14:32.527: VRRP: Grp 1 Event - Advert higher or equal priority
*Mar 11 14:14:33.347: VRRP: Grp 1 Advertisement priority 255, ipaddr 10.0.0.1
*Mar 11 14:14:33.347: VRRP: Grp 1 Event - Advert higher or equal priority

In the debug output on the virtual router backup, the event being observed is that the gateway received an advertisement from another gateway for VRRP Group 1 that has a higher or equal priority to itself. This message does not indicate a message sent by the local gateway itself.
The advertisement messages include the priority and address of the current virtual router master.

By default, unlike HSRP, preemption is enabled for VRRP and no explicit configuration is required by the administrator to enable this functionality.

Finally, there is a difference in the range of MAC addresses used by VRRP and HSRP. When using VRRP, at the Data Link layer, the virtual router master sends advertisements from the virtual router MAC address 00-00-5e-00-01-xx (displayed as 0000.5e00.01xx in Cisco IOS output), where xx represents the two-digit Hexadecimal group number. With HSRPv1, the Layer 2 address that is used by the virtual IP address will be the virtual MAC address 0000.0C07.ACxx, where xx is the HSRP group number in Hexadecimal and is based on the respective interface. HSRPv2 uses MAC addresses in the range of 0000.0C9F.F000 to 0000.0C9F.FFFF for the virtual gateway IP address.

Troubleshooting Virtual Router Redundancy Protocol

For the most part, VRRP troubleshooting follows the same basic logic as HSRP troubleshooting. For example, assume that one or more gateways were logging the following messages:

*Mar 11 17:39:38.699: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master
*Mar 11 17:39:42.059: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Master -> Backup
*Mar 11 17:40:12.531: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master
*Mar 11 17:40:19.167: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Master -> Backup
*Mar 11 17:40:27.835: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master
*Mar 11 17:40:36.851: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Master -> Backup
*Mar 11 17:40:44.775: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master
*Mar 11 17:40:54.171: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Master -> Backup

Given the state transitions being logged, you would perform the same troubleshooting for VRRP as you would if this were HSRP.
In other words, you would check Physical layer connectivity, verify whether there are any Data Link layer problems, check for excessive network traffic or congestion, and verify gateway resource utilization using the show processes command.

UNDERSTANDING AND TROUBLESHOOTING GLBP

As is the case with the two previously described First Hop Redundancy Protocols (FHRPs), it is important to have a fundamental understanding of Gateway Load Balancing Protocol (GLBP) in order to troubleshoot any protocol failures or abnormal behavior effectively. Following the same logic used in the previous two sections, this section will first delve into GLBP operation and then conclude with a brief section on how to troubleshoot GLBP.

Understanding Gateway Load Balancing Protocol

Gateway Load Balancing Protocol is a Cisco-proprietary FHRP, like HSRP. However, unlike HSRP and VRRP, which allow for multiple active gateways only through the configuration of multiple groups, GLBP allows multiple gateways to forward packets actively using a single GLBP group. GLBP gateways communicate through Hello messages that are sent every three seconds to the Multicast address 224.0.0.102, using UDP port 3222.

When Gateway Load Balancing Protocol is enabled, the GLBP group members elect one gateway to be the active virtual gateway (AVG) for that group. The AVG is the gateway that has the highest priority value. In the event that the priority values are equal, the gateway with the highest IP address in the group is elected AVG. The other gateways in the GLBP group provide backup for the AVG in the event that the AVG becomes unavailable.

The AVG answers all ARP requests for the virtual router address. In addition, the AVG assigns a virtual MAC address to each member of the GLBP group. Each gateway is therefore responsible for forwarding packets that are sent to the virtual MAC address it has been assigned by the AVG.
These gateways are referred to as active virtual forwarders (AVFs) for their assigned MAC addresses. GLBP operation is illustrated in Figure 4-3 below:

Fig. 4-3. Gateway Load Balancing Protocol Operation

Referencing Figure 4-3, gateway GLBP-1 is configured with a priority of 110, gateway GLBP-2 is configured with a priority of 105, and gateway GLBP-3 is using the default priority of 100. GLBP-1 is elected AVG, and GLBP-2 and GLBP-3 are assigned virtual MAC addresses bbbb.bbbb.bbbb and cccc.cccc.cccc, respectively, and become AVFs for those virtual MAC addresses. GLBP-1 is also the AVF for its own virtual MAC address, which is aaaa.aaaa.aaaa.

Hosts 1, 2, and 3 are all configured with the default gateway address 192.168.1.254, which is the virtual IP address assigned to the GLBP group. Host 1 sends out an ARP Broadcast for its gateway IP address. This is received by the AVG (GLBP-1), which responds with its own virtual MAC address aaaa.aaaa.aaaa. Host 1 forwards traffic to 192.168.1.254 to this MAC address.

Host 2 sends out an ARP Broadcast for its gateway IP address. This is received by the AVG (GLBP-1), which responds with the virtual MAC address of bbbb.bbbb.bbbb (GLBP-2). Host 2 forwards traffic to 192.168.1.254 to this MAC address and GLBP-2 forwards this traffic.

Host 3 sends out an ARP Broadcast for its gateway IP address. This is received by the AVG (GLBP-1), which responds with the virtual MAC address of cccc.cccc.cccc (GLBP-3). Host 3 forwards traffic to 192.168.1.254 to this MAC address and GLBP-3 forwards this traffic.

A GLBP group allows up to four virtual MAC addresses per group. The AVG is responsible for assigning the virtual MAC addresses to each member of the group. Other group members request a virtual MAC address after they discover the AVG through Hello messages. Gateways are assigned the next virtual MAC address in sequence.
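The virtual MAC addresses used by the three FHRPs in this chapter all follow a fixed prefix plus a group-derived suffix, which can be sketched as follows. The HSRP and VRRP formats come from this chapter. The GLBP layout (0007.b4, then the group number, then a one-byte forwarder number) is an assumption inferred from the show glbp output in this section, where Group 1 forwarders 1 to 3 use 0007.b400.0101 to 0007.b400.0103, and the GLBP group range of 0 to 1023 is likewise an assumption. All function names are hypothetical.

```python
def _dotted(mac: int) -> str:
    """Format a 48-bit integer in Cisco dotted notation (xxxx.xxxx.xxxx)."""
    digits = f"{mac:012x}"
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def hsrp_v1_mac(group: int) -> str:
    """HSRPv1: 0000.0c07.acxx, where xx is the group number in hex."""
    if not 0 <= group <= 255:
        raise ValueError("HSRPv1 group must be 0-255")
    return _dotted(0x00000C07AC00 | group)


def hsrp_v2_mac(group: int) -> str:
    """HSRPv2: 0000.0c9f.f000 through 0000.0c9f.ffff."""
    if not 0 <= group <= 4095:
        raise ValueError("HSRPv2 group must be 0-4095")
    return _dotted(0x00000C9FF000 | group)


def vrrp_mac(group: int) -> str:
    """VRRP: 0000.5e00.01xx, where xx is the group number in hex."""
    if not 1 <= group <= 255:
        raise ValueError("VRRP group must be 1-255")
    return _dotted(0x00005E000100 | group)


def glbp_mac(group: int, forwarder: int) -> str:
    """GLBP (assumed layout): 0007.b4, group number, then forwarder number 1-4."""
    if not 0 <= group <= 1023 or not 1 <= forwarder <= 4:
        raise ValueError("group must be 0-1023 and forwarder 1-4")
    return _dotted(0x0007B4000000 | (group << 8) | forwarder)


# The next-in-sequence assignment for GLBP Group 1, up to four forwarders.
print([glbp_mac(1, n) for n in range(1, 5)])
print(hsrp_v1_mac(1), vrrp_mac(1))  # 0000.0c07.ac01 0000.5e00.0101
```

Recognizing these prefixes is useful when reading a MAC address table or an ARP cache, because the prefix identifies which protocol (and which group) a host is actually using as its gateway.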
A gateway that is assigned a virtual MAC address by the AVG is known as a primary virtual forwarder while a gateway that has learned the virtual MAC address is referred to as a secondary virtual forwarder. The allocated virtual MAC addresses for the AVFs can be seen using the show glbp command illustrated in the following output:

R3#show glbp FastEthernet0/0
FastEthernet0/0 - Group 1
  State is Active
    5 state changes, last state change 00:01:21
  Virtual IP address is 10.0.0.254
  Hello time 3 sec, hold time 10 sec
    Next hello sent in 0.000 secs
  Redirect time 600 sec, forwarder time-out 14400 sec
  Preemption disabled
  Active is local
  Standby is 10.0.0.2, priority 105 (expires in 8.996 sec)
  Priority 110 (configured)
  Weighting 100 (default 100), thresholds: lower 1, upper 100
  Load balancing: round-robin
  Group members:
    000d.289e.f940 (10.0.0.1)
    000f.235e.f120 (10.0.0.3) local
    0013.7faf.3e00 (10.0.0.2)
  There are 3 forwarders (1 active)
  Forwarder 1
    State is Listen
      4 state changes, last state change 00:01:00
    MAC address is 0007.b400.0101 (learnt)
    Owner ID is 000d.289e.f940
    Redirection enabled, 599.188 sec remaining (maximum 600 sec)
    Time to live: 14399.188 sec (maximum 14400 sec)
    Preemption enabled, min delay 30 sec
    Active is 10.0.0.1 (primary), weighting 100 (expires in 9.188 sec)
  Forwarder 2
    State is Listen
      2 state changes, last state change 00:01:01
    MAC address is 0007.b400.0102 (learnt)
    Owner ID is 0013.7faf.3e00
    Redirection enabled, 597.144 sec remaining (maximum 600 sec)
    Time to live: 14397.144 sec (maximum 14400 sec)
    Preemption enabled, min delay 30 sec
    Active is 10.0.0.2 (primary), weighting 100 (expires in 7.144 sec)
  Forwarder 3
    State is Active
      3 state changes, last state change 00:01:32
    MAC address is 0007.b400.0103 (default)
    Owner ID is 000f.235e.f120
    Redirection enabled
    Preemption enabled, min delay 30 sec
    Active is local, weighting 100

Within the GLBP group, a single gateway is elected as the AVG, and
another gateway is elected as the standby virtual gateway. All other remaining gateways in the group are placed in a Listen state. If an AVG fails, then the standby virtual gateway will assume responsibility for the virtual IP address. At the same time, an election is held and a new standby virtual gateway is then elected from the gateways currently in the Listen state.

In the event the AVF fails, one of the secondary virtual forwarders in the Listen state assumes responsibility for the virtual MAC address. However, because the new AVF is already a forwarder using another virtual MAC address, GLBP needs to ensure that the old forwarder MAC address ceases being used and hosts are migrated away from this address. This is achieved using the following two timers:

1. The redirect timer
2. The timeout timer

The redirect time is the interval during which the AVG continues to redirect hosts to the old virtual forwarder MAC address. When this timer expires, the AVG stops using the old virtual forwarder MAC address in ARP replies, although the virtual forwarder will continue to forward packets that were sent to the old virtual forwarder MAC address.

When the timeout timer expires, the virtual forwarder is removed from all gateways in the GLBP group. Any clients still using the old MAC address in their ARP caches must refresh the entry to obtain the new virtual MAC address. GLBP uses Hello messages to communicate the current state of these two timers.

By default, the GLBP gateway preemptive scheme is disabled. A backup virtual gateway can become the AVG only if the current AVG fails, regardless of the priorities assigned to the virtual gateways. This default behavior is the same as that which is employed by HSRP.

GLBP uses a weighting scheme to determine the forwarding capacity of each gateway that is in the GLBP group.
The weighting assigned to a gateway in the GLBP group can be used to determine whether it will forward packets and, if so, the proportion of hosts residing on the LAN for which it will forward packets. By default, each gateway is assigned a weight of 100. Administrators can additionally configure the gateways to make dynamic weighting adjustments by configuring object tracking, such as for interfaces and IP prefixes, in conjunction with GLBP. If a tracked interface fails, then the weighting is dynamically decreased by the specified value, allowing gateways with higher weighting values to be used to forward more traffic than those with lower weighting values.

Troubleshooting Gateway Load Balancing Protocol

GLBP troubleshooting follows the same logic as the other FHRPs described in previous sections. In order to troubleshoot effectively, you need to understand intimately how the protocol is configured and operates. Real-time troubleshooting for GLBP can be performed by issuing the debug glbp command. However, as is the case with any debug command, keep in mind that this action increases processor utilization on the gateway, which may degrade performance.

TROUBLESHOOTING SWITCH SUPERVISOR REDUNDANCY

No explicit means or methods to troubleshoot Switch Supervisor Redundancy actually exist. Instead, the troubleshooting process entails understanding the redundancy method implemented, and how it is supposed to function (i.e., what the norm is). From there, you can identify any anomalies with the switchover process. For the most part, if a supervisor redundancy mode is not functioning the way it should be, you should seek the assistance of the TAC. The following section describes the supported supervisor redundancy methods.

Understanding Switch Supervisor Redundancy

Cisco Catalyst 4500 and 6500 Series switches support two supervisor modules or engines within the switch chassis to allow for high availability.
When the switch boots up, the first supervisor that boots up is referred to as the primary or active supervisor engine and the second supervisor module is referred to as the standby or redundant supervisor engine.

When the primary or active supervisor is up, it controls all switch functions, including management operations, data plane operations, and control plane operations. In the event that the active supervisor engine fails or is removed, the standby supervisor assumes this responsibility. The standby supervisor engine assumes primary supervisor engine status when one of the following events occurs:

• The primary supervisor engine fails or crashes
• The primary supervisor engine is rebooted
• The administrator forces a manual failover from active to standby
• The primary supervisor engine is physically removed
• Clock synchronization between the supervisor engines fails (SSO)

Cisco IOS software supports the following three modes for redundant supervisor engine implementations:

1. Route Processor Redundancy
2. Route Processor Redundancy Plus
3. Stateful Switchover

With Route Processor Redundancy (RPR), when the switch boots up, the RPR process runs between the two supervisor engines, and the first supervisor to complete the boot process becomes the active supervisor engine and the second supervisor becomes the standby supervisor engine.

When using RPR, the standby supervisor engine is only partially booted and initialized, meaning that not all switch subsystems become operational. For example, on the standby, the Multilayer Switch Feature Card (MSFC) and the Policy Feature Card (PFC) are not active. Clock synchronization occurs between primary and backup every 60 seconds, and the startup configuration and configuration registers are synchronized between supervisors.
When the active supervisor engine fails, the standby supervisor engine becomes operational and the following occurs within the switch:

• All switching modules are reloaded and powered up again
• Remaining subsystems on the MSFC are brought up
• ACLs are reprogrammed into supervisor engine hardware

Because the standby supervisor engine is not fully initialized, any failover from the active to the standby supervisor engine results in a disruption of network traffic, as the standby supervisor engine goes through the steps listed above and assumes the active supervisor role. This entire process generally takes two to four minutes to accomplish.

Route Processor Redundancy Plus (RPR+) improves on RPR and provides failover generally within 30 to 60 seconds. When RPR+ mode is used, the redundant supervisor engine is fully initialized and configured but is not fully operational. When the redundant supervisor engine first initializes, the startup configuration file is copied from the active supervisor engine to the redundant supervisor engine, which overrides any existing startup configuration file on the redundant supervisor engine, allowing the supervisor engines to become synchronized.

When switch configuration changes occur during normal operation, redundancy performs an incremental synchronization from the active supervisor engine to the redundant supervisor engine. RPR+ synchronizes user-entered CLI commands incrementally line-by-line from the active supervisor engine to the redundant supervisor engine.

Even though the redundant supervisor engine is fully initialized, it interacts only with the active supervisor engine to receive incremental changes to the configuration files. The console on the redundant supervisor engine is locked and you cannot enter CLI commands on the redundant supervisor engine.
When the active supervisor engine fails, the redundant supervisor engine finishes initializing without reloading other switch modules, and the following events occur on the switch:

• Traffic is disrupted until the redundant supervisor engine completes the takeover
• The switch maintains any static routes across the switchover
• The switch does not maintain any dynamic routing protocol information
• The switch clears the FIB tables on switchover
• The switch clears the CAM tables on switchover
• State information, such as active TCP sessions, is not maintained on switchover

When implementing RPR+, it is important to ensure that the supervisor modules are similar (i.e., the same model, memory, and version) and that they are running the same Cisco IOS software version. If any of these are different, the switch will revert to RPR mode instead of RPR+.

Stateful Switchover (SSO) is the preferred redundancy mode for supervisor engines. Similar to RPR and RPR+, SSO establishes one of the supervisor engines as active while the other supervisor engine is designated standby. Unlike RPR and RPR+, however, with SSO, the redundant supervisor engine is fully booted and initialized and then SSO synchronizes the two supervisors.

With SSO, supervisor engines must be synchronized so that the redundant supervisor engine is always ready to assume control in the event that the active supervisor engine fails. Configuration information and data structures are synchronized between the supervisor engines at startup and whenever changes to the active supervisor engine configuration occur.

Unlike RPR and RPR+ redundancy, SSO maintains state information between the redundant supervisor engines. This includes forwarding information in the FIB, as well as adjacency entries, which ensures that Layer 2 traffic is not interrupted and the switch can still forward Layer 3 traffic after a switchover from the active to the redundant supervisor engine.
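The three redundancy modes are easier to compare side by side. The sketch below is a rough Python model, not a description of how the switch software is implemented: it records the typical switchover times quoted in this section (two to four minutes for RPR, 30 to 60 seconds for RPR+, and zero to three seconds for SSO) and models the rule that RPR+ reverts to RPR when the supervisors do not match. The Supervisor class, the function names, and the sample model and version strings are all hypothetical. What happens on a mismatch when SSO is configured is not covered here, so the sketch leaves that mode unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supervisor:
    """Hypothetical record of the attributes that must match for RPR+."""
    model: str
    memory_mb: int
    ios_version: str


# Typical switchover times quoted in this section, in seconds (best, worst).
SWITCHOVER_SECONDS = {
    "RPR": (120, 240),
    "RPR+": (30, 60),
    "SSO": (0, 3),
}


def operating_mode(configured: str, active: Supervisor, standby: Supervisor) -> str:
    """Return the mode actually in effect; RPR+ falls back to RPR on any mismatch."""
    if configured not in SWITCHOVER_SECONDS:
        raise ValueError(f"unknown redundancy mode: {configured}")
    if configured == "RPR+" and active != standby:
        return "RPR"
    return configured


def worst_case_outage(mode: str) -> int:
    """Upper bound, in seconds, of the typical switchover time for a mode."""
    return SWITCHOVER_SECONDS[mode][1]


# Illustrative values only; they are not taken from a real chassis.
sup1 = Supervisor("SUP-EXAMPLE", 1024, "12.2(33)")
sup2 = Supervisor("SUP-EXAMPLE", 1024, "12.2(18)")  # different software version

mode = operating_mode("RPR+", sup1, sup2)
print(mode, worst_case_outage(mode))  # RPR 240
```

Modeling the fallback makes one troubleshooting point explicit: a switchover that takes minutes on a switch configured for RPR+ may simply mean the supervisors were mismatched and the switch was running RPR all along.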
When using SSO, the following events cause a switchover:

• A hardware failure on the active supervisor engine
• Clock synchronization failure between supervisor engines
• A manual switchover

During SSO switchover, all system control and routing protocol execution is transferred from the active supervisor to the standby supervisor engine within zero to three seconds. Non-Stop Forwarding (NSF) works in conjunction with SSO to minimize the amount of time a network is unavailable to its users following a switchover, while continuing to forward IP packets. NSF is used primarily to ensure the continued forwarding of IP packets following a supervisor engine switchover and is supported by BGP, OSPF, EIGRP, IS-IS, and CEF.

Non-Stop Forwarding allows routing protocols to detect a switchover and take the necessary action to continue forwarding network traffic. NSF allows routing protocols to recover route information from the NSF-capable peer devices instead of waiting for the FIB to be rebuilt before the switch can actually begin forwarding traffic. This allows for high availability and resiliency during supervisor engine switchover.

When NSF is implemented, routing protocols depend on CEF to continue forwarding packets during switchover while they build the Routing Information Base (RIB) tables. After the routing protocols have converged, CEF updates the FIB table and removes stale route entries. CEF also updates the switch modules with the new FIB information. Cisco NSF is configured on a per-routing protocol basis.

TROUBLESHOOTING SWITCH PERFORMANCE ISSUES

In the final section of this chapter, we will discuss some reasons for switch performance degradation. We will also discuss solutions that can be implemented to safeguard against or mitigate these contributing factors. One of the most telling signs of performance issues is high CPU utilization.
However, a common misconception is that high CPU utilization indicates the depletion of resources on a device and the threat of a crash. While this may be true for software-based routers, a capacity issue is almost never a symptom of high CPU utilization with hardware-based forwarding switches, such as the Catalyst 4500 and 6500 Series switches.

Cisco software-based routers use software in order to process and route packets. Therefore, the CPU utilization on a router tends to increase as it performs more packet processing and routing. On such platforms, the show processes cpu command can provide a fairly accurate indication of the traffic processing load on the router. However, this is not always true for hardware-based forwarding switches.

Before we delve into detail on some of the reasons that contribute to or can cause Catalyst switch performance issues, the following section briefly describes the architecture of the Catalyst 6500 Supervisor Engine 720.

Catalyst 6500 Series Switch Supervisor Module Components

The Supervisor engine is the ‘brains’ of the Catalyst 6500 Series switches. Although going into detail on all components of the Supervisor 720 module is beyond the scope of the TSHOOT certification exam, a basic understanding of the Supervisor module is required in order to understand the terminology used in MLS. The Supervisor 720 module is comprised of the following three integrated core components:

1. The Multilayer Switch Feature Card 3
2. The Policy Feature Card 3
3. The switch or switching fabric

The Multilayer Switch Feature Card 3 (MSFC 3) is a standard daughter card on the Supervisor 720 engine. The MSFC 3 runs all software processes and supports both the Switch Processor (SP) and Route Processor (RP). The RP supports Layer 3 features and functions, such as routing protocols, address resolution, ICMP, the management of virtual interfaces, and IOS configuration, among many other things.
The SP supports Layer 2 features and functionality, such as the Spanning Tree Protocol, VLAN Trunking Protocol, and Cisco Discovery Protocol. The MSFC 3 builds the CEF FIB table in software and then downloads this table to the hardware ASICs on the Policy Feature Card 3 (PFC 3) and distributed forwarding engine (if present) that make the forwarding decisions for IP Unicast and Multicast traffic. The Policy Feature Card 3 (PFC3) is equipped with a high-performance ASIC complex that supports a wide range of hardware-based features. The PFC makes forwarding decisions in hardware and supports routing and bridging, Quality of Service (QoS), and IP Multicast packet replication, and processes security policies such as Access Control Lists (ACLs). The PFC 3 requires the RP to populate the route cache or optimized route table structure used by the Layer 3 switching ASIC. If no route processor is present, the PFC can perform only Layer 3 and Layer 4 QoS classification and ACL filtering but will not be able to perform Layer 3 switching. Finally, the switch fabric is the connection between multiple ports or slots within a switch. It is used for data transport. As was previously stated, in hardware-based platforms such as the Catalyst 6500 Series switches, the switch makes forwarding decisions in hardware. Therefore, when the switches make the forwarding or switching decision for most frames that pass through the switch, the process does not involve the supervisor engine CPU. However, there are reasons why some traffic must be processed in software instead of hardware. This concept is referred to as punting because the packet is punted from hardware to software for processing. 
These reasons include, but are not limited to, the following events or functions:
• Packets destined to the switch, such as a Telnet session that is destined for the switch
• Packets requiring special processing, such as packets with IP options or expired TTL
• ACL-based features, such as ACL logging
• Hardware resources full conditions, such as when the CAM or TCAM are full
• Multicast traffic, such as IGMP packets
• Other features, such as NBAR and DHCP snooping
• IP version 6 packet processing

When packets are processed in software instead of hardware, the CPU utilization of the RP is increased due to the additional load. This increased CPU utilization can result in significant performance and forwarding reduction for the switch. In addition to these features, high CPU utilization can also be caused by other events, including Broadcast storms, Spanning Tree loops, debugging, SNMP polling, and SPAN sessions.

It is important to understand that high CPU utilization on the switch does not necessarily reflect the hardware forwarding performance of the switch. For example, a small consistent stream of packets with IP options may result in high CPU utilization, even when no other traffic is present.

In order to troubleshoot high CPU utilization issues, you need some sort of reference point. For this reason, it is considered good practice to baseline and monitor the supervisor engine’s CPU utilization and make a note of all of the processes that generate the highest CPU utilization in a stable network environment with a ‘normal’ traffic load. This provides a solid reference point that you can then use to distinguish ‘normal’ from ‘abnormal’ utilization.

In Cisco IOS software, you can use the show processes cpu [history] command to view processor utilization statistics over a five second, one minute, and five minute period of time.
Appending the history keyword provides processor utilization data over a period of one minute, one hour, and 72 hours. Using this and the baseline information, you can determine whether the CPU is consistently high or whether there are spikes of high utilization. If the CPU is consistently high, identify the process(es) causing this and troubleshoot them as needed. In the event of spikes, you may need to perform additional activities, such as a SPAN of the CPU, to determine what is causing the high utilization and spikes.

If the CPU utilization is high due to the punt of traffic to the RP, determine what that traffic is and why the traffic is punted. On distributed forwarding Catalyst switches, you can use the show interfaces command to determine whether packets ingressing a certain interface are being switched in hardware or software by viewing the Layer 2 and Layer 3 counters as illustrated below:

Cat-6500-1#show interfaces GigabitEthernet1/1
GigabitEthernet1/1 is up, line protocol is up (connected)
  Hardware is C6k 1000Mb 802.3, address is 000a.42d1.7580 (bia 000a.42d1.7580)
  Internet address is 100.100.100.2/24
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Half-duplex, 100Mb/s
  input flow-control is off, output flow-control is off
  Clock mode is auto
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of “show interface” counters never
  Input queue: 5/75/1/24075 (size/max/drops/flushes); Total output drops: 2
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  30 second input rate 7609000 bits/sec, 14859 packets/sec
  30 second output rate 0 bits/sec, 0 packets/sec
  L2 Switched: ucast: 0 pkt, 184954624 bytes - mcast: 1 pkt, 500 bytes
  L3 in Switched: ucast: 2889916 pkt, 0 bytes - mcast: 0 pkt, 0 bytes mcast
  L3 out Switched: ucast: 0 pkt, 0 bytes mcast: 0 pkt, 0 bytes
     2982871 packets input, 190904816 bytes, 0 no buffer
     Received 9 broadcasts, 0 runts, 0 giants, 0 throttles
     1 input errors, 1 CRC, 0 frame, 28 overrun, 0 ignored
     0 input packets with dribble condition detected
     1256 packets output, 124317 bytes, 0 underruns
     2 output errors, 1 collisions, 2 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier
     0 output buffer failures, 0 output buffers swapped out

In the output of the show interfaces command, the L2 traffic is the amount of traffic that is switched in hardware, while the L3 traffic shows the amount of received traffic that is punted to the CPU. From the output above, there is a great deal of traffic being punted to the CPU, which can result in high CPU utilization.

If a great deal of traffic is being process-switched and is punted to the CPU, the show processes cpu command will reflect high utilization statistics for the IP Input process. You can then use additional Cisco IOS software tools and utilities, such as SPAN, to determine where this traffic is coming from (i.e., the source), as well as what kind of traffic it is.

Protecting the Route Processor

As stated earlier in this chapter, when using CEF, the majority of packets are forwarded by the PFC, referencing the entries contained in the FIB, which is populated by the MSFC. However, there are certain exception packets, such as packets with IP options, which must be punted to the Route Processor (MSFC) for further processing.

While the PFC can forward up to 30 million packets per second (pps), the MSFC is typically capable of forwarding only up to 500,000 pps. This significant difference in the forwarding capabilities means that it is possible for the MSFC to be oversubscribed or overutilized if the PFC punts a large number of packets to it.
This may result in the following:
• Routing protocols getting out of sync with the rest of the network. This may result in network flaps and major network-wide transitions.
• The console on the switch may lock up. This results in the switch becoming unreachable and unmanageable, leaving administrators no avenues to troubleshoot.
• Other RP-based processes may cease operation altogether. This may result in the switch running with unpredictable results, or crashing.

To prevent such situations, IOS software allows administrators to configure MLS rate limiting using the mls rate-limit global configuration command. Rate limiters throttle the packets per second (pps) rate of certain packets that are punted to the MSFC by the PFC, which effectively ensures that the MSFC is never overwhelmed by the much faster PFC, thereby allowing the switch to continue normal operations.

Because the rate limiting functionality is performed in hardware, MLS rate limiters are typically referred to as hardware rate limiters (HWRLs) in various texts. MLS rate-limiter configuration is beyond the scope of the TSHOOT certification exam.

CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter.

Catalyst Switch VLAN Interfaces Overview
• In a switched network, VLANs separate devices into different Broadcast domains
• VLANs are also used to separate devices into different subnets
• Multilayer switches support the configuration of Switched Virtual Interfaces (SVIs)
• SVIs represent VLANs and allow the switch to serve as the default gateway for the VLAN
• An SVI is not automatically created when a VLAN is created
• By default, however, an SVI for VLAN 1 is automatically created by the software
• A Switched Virtual Interface is a very resilient interface
• In order for an SVI to be placed into the up/up state, the following conditions must be met:
1.
The VLAN exists and is active in the VLAN database of the switch
2. The VLAN interface is not administratively down
3. At least one Layer 2 port (access port or trunk) exists and has a link up on this VLAN
4. At least one Layer 2 port (access port or trunk) is in the STP forwarding state

Catalyst Switch MLS Overview
• Multilayer Switching (MLS) combines Layer 2, Layer 3, and Layer 4 switching technologies
• MLS allows switches to forward packets at wire speed using hardware
• Cisco supports MLS for both Unicast and Multicast traffic flows
• In MLS switching, an MLS cache is maintained for the Layer 3-switched flows
• The MLS cache maintains flow information for all active flows
• The MLS cache includes entries for traffic statistics
• After the MLS cache is created, packets for an existing flow can be Layer 3-switched
• MLS integrates both data plane and control plane functions
• The control plane is where routing and other control information is stored and exchanged
• The control plane is responsible for updating the routing table
• The data plane is responsible for the actual forwarding of data
• The data plane is typically populated using information derived from the control plane
• MLS is enabled by configuring Cisco Express Forwarding (CEF) on the switch
• CEF uses a FIB to make IP destination prefix-based switching decisions
• The FIB is conceptually similar to a routing table or Routing Information Base (RIB)
• The adjacency table is created to contain all connected next hops
• An adjacent node is a node that is one hop away, i.e. directly connected

Troubleshooting Multilayer Switching
• MLS troubleshooting requires troubleshooting at both the control and data planes
• The following steps should be taken when troubleshooting Unicast MLS issues:
1. Verify that IP routing information for the address is correct
2. Verify that the next hop has a valid MAC address
3.
Verify that the FIB next hop is the same as the RIB next hop
4. Verify the CEF adjacency table rewrite information
5. Verify FIB and adjacency table population in TCAM

Understanding and Troubleshooting HSRP
• Hot Standby Router Protocol is a Cisco-proprietary First Hop Redundancy Protocol
• Cisco IOS software supports two versions of HSRP: version 1 and version 2
• By default, when HSRP is enabled in Cisco IOS software, HSRP version 1 is enabled
• HSRPv1 supports up to 255 groups; HSRPv2 supports up to 4096 groups
• HSRPv1 uses Multicast group address 224.0.0.2 and UDP port 1985
• HSRPv2 uses Multicast group address 224.0.0.102 and UDP port 1985
• The version 2 packet format uses a Type/Length/Value (TLV) format
• HSRPv1 does not support millisecond timer values; HSRPv2 supports millisecond timers
• HSRPv2 includes a 6-byte Identifier field
• HSRPv1 uses virtual MAC addresses in the range 0000.0c07.acxx
• HSRPv2 uses virtual MAC addresses in the range 0000.0C9F.F000 to 0000.0C9F.FFFF
• The majority of HSRP issues are due to router and switch misconfigurations
• Common HSRP problem scenarios include the following:
1. Gateway Logging Continuous HSRP State Changes
2. HSRP Gateways Not Reflecting the Correct State
3. HSRP Does Not Detect Peer Router
4.
HSRP Causes MAC Violation on a Secure Switch Port

Understanding and Troubleshooting VRRP
• VRRP operates in a similar manner to HSRP; however, VRRP is an open standard
• VRRP is defined in RFC 2338
• VRRP sends advertisements to Multicast address 224.0.0.18, using IP protocol number 112
• At the Data Link layer, VRRP uses virtual MAC addresses in the range 00-00-5e-00-01-xx
• Both HSRP and VRRP use a default priority value of 100
• By default, VRRP version 2 is enabled when VRRP is configured on a gateway
• Configurable VRRP priorities range from 1-254; for HSRP the range is 1-255
• VRRP priority 255 is automatically configured when an interface IP is used for a group
• When the IP Address Owner is up, it responds to all packets that are sent to the IP address

Understanding and Troubleshooting GLBP
• Gateway Load Balancing Protocol is a Cisco-proprietary FHRP, like HSRP
• GLBP allows multiple gateways to actively forward packets using a single GLBP group
• GLBP gateways communicate through Hello messages that are sent every 3 seconds
• GLBP sends updates to the Multicast address 224.0.0.102, using UDP port 3222
• GLBP group members elect one gateway to be the active virtual gateway (AVG) for that group
• The AVG is the gateway that has the highest priority value
• The other gateways in the GLBP group provide backup for the AVG
• The AVG answers all ARP requests for the virtual router address
• The AVG assigns a virtual MAC address to each member of the GLBP group
• Each gateway is responsible for forwarding packets that are sent to its virtual MAC address
• These gateways are referred to as active virtual forwarders (AVFs)
• A GLBP group allows up to four virtual MAC addresses per group
• By default, the GLBP gateway preemptive scheme is disabled
• A backup virtual gateway can become the AVG only if the current AVG fails
• By default, each gateway is assigned a weight of 100

Troubleshooting Switch Supervisor Redundancy
• Catalyst 4500 and 6500 series switches support two supervisor engines for high
availability
• The standby supervisor engine assumes the primary supervisor role if any of the following happens:
1. The primary supervisor engine fails or crashes
2. The primary supervisor engine is rebooted
3. The administrator forces a manual failover from active to standby
4. The primary supervisor engine is physically removed
5. Clock synchronization between the supervisor engines fails
• Cisco IOS software supports the following three modes for redundant supervisor implementations:
1. Route Processor Redundancy (RPR)
2. Route Processor Redundancy Plus (RPR+)
3. Stateful Switchover (SSO)
• When using RPR, the standby supervisor engine is only partially booted and initialized
• RPR switchover generally takes between 2 and 4 minutes
• RPR+ improves on RPR and provides failover generally within 30 to 60 seconds
• With RPR+, the standby is initialized but not fully operational
• RPR+ synchronizes user-entered CLI commands incrementally line-by-line
• After RPR+ switchover, the following events occur:
1. Traffic is disrupted until the Redundant Supervisor Engine completes the takeover
2. The switch maintains any static routes across the switchover
3. The switch does not maintain any dynamic routing protocol information
4. The switch clears the FIB Tables on switchover
5. The switch clears the CAM Tables on switchover
6. State information, such as active TCP sessions, is not maintained on switchover
• SSO is the preferred redundancy mode for supervisor engines
• With SSO, the redundant supervisor is fully booted and initialized
• With SSO, supervisor engines must be synchronized
• With SSO, configuration information and data structures are synchronized
• SSO maintains state information between the redundant supervisor engines
• When using SSO, the following events cause a switchover:
1. A hardware failure on the active supervisor engine
2. Clock synchronization failure between supervisor engines
3.
A manual switchover

Troubleshooting Switch Performance Issues
• One of the most telling signs of performance issues on devices is high CPU utilization
• Cisco software-based routers use software in order to process and route packets
• High CPU utilization typically indicates capacity issues on software-based routers
• Catalyst 4500 and 6500 series switches are hardware-based platforms
• High CPU utilization does not necessarily indicate capacity issues on hardware-based platforms
• The Supervisor 720 module comprises the following three integrated core components:
1. The Multilayer Switch Feature Card 3
2. The Policy Feature Card 3
3. The Switch or Switching Fabric
• Even in hardware-based platforms, some packets must be punted and processed in software
• The reasons packets may be punted include the following:
1. Packets destined to the switch, such as a Telnet session that is destined for the switch
2. Packets requiring special processing, such as packets with IP options or expired TTL
3. ACL-based features, such as ACL logging
4. Hardware resources full conditions, such as when the CAM or TCAM are full
5. Multicast traffic, such as IGMP packets
6. Other features, such as NBAR and DHCP Snooping
7. IP version 6 packet processing

CHAPTER 5
Troubleshooting EIGRP

Enhanced Interior Gateway Routing Protocol is a Cisco-proprietary advanced Distance Vector routing protocol. As a CCNP network engineer, it is important to understand how to support EIGRP, as it is a very commonly implemented routing protocol. As with the technologies described thus far in this guide, in order to troubleshoot and support networks running EIGRP, you must have a solid understanding of the inner workings of the protocol itself.
The TSHOOT certification exam objective that is covered in this chapter is as follows:
• Troubleshoot EIGRP

While it is not possible to delve into all potential EIGRP problem scenarios, this chapter will discuss some of the most common problem scenarios when EIGRP is implemented as the Interior Gateway Protocol (IGP) of choice. This chapter is divided into the following sections:
• Enhanced Interior Gateway Routing Protocol Overview
• Troubleshooting Neighbor Relationships
• Troubleshooting Route Installation
• Troubleshooting Route Advertisement
• Troubleshooting Stub Routing Issues
• Troubleshooting SIA Issues
• Troubleshooting Route Redistribution Issues
• Debugging EIGRP Routing Issues

ENHANCED INTERIOR GATEWAY ROUTING PROTOCOL OVERVIEW

EIGRP is an advanced Distance Vector routing protocol that incorporates traditional Distance Vector features, such as split horizon, and traditional Link State features, such as incremental updates. EIGRP runs directly over IP using protocol number 88 and is a Cisco-proprietary routing protocol.

The following sections describe some core characteristics and components that are integral to the operation of EIGRP. It is important to have a solid understanding of EIGRP in order to troubleshoot any routing problems effectively. Keep in mind that the information provided is simply a summary and recap of the material available in the ROUTE study guide. Please refer to the ROUTE guide for additional information on specific topics and technologies if needed.

Packets

EIGRP uses several different types of packets to exchange routing and control information between EIGRP neighbors. EIGRP uses the following types of packets:
• Hello packets
• Acknowledgement packets
• Update packets
• Query packets
• Reply packets

EIGRP sends Hello packets once it has been enabled on a router for a particular network.
These messages are used to identify neighbors and, once identified, serve or function as a keepalive mechanism between neighbors. EIGRP Hello packets are sent to the link local Multicast group address 224.0.0.10. Hello packets sent by EIGRP do not require an Acknowledgment packet to be sent confirming that they were received. Because they require no explicit acknowledgment, Hello packets are classified as unreliable EIGRP packets. EIGRP Hello packets have an OPCode of 5. An EIGRP Acknowledgment (ACK) packet is simply an EIGRP Hello packet that contains no data. Acknowledgement packets are used by EIGRP to confirm reliable delivery of EIGRP packets. ACKs are always sent to a Unicast address, which is the source address of the sender of the reliable packet, and not to the EIGRP Multicast group address. In addition, Acknowledgement packets will always contain a non-zero acknowledgment number. The ACK uses the same OPCode as the Hello packet (OPCode 5) because it is essentially a Hello packet that contains no information. EIGRP Update packets are used to convey reachability of destinations. In other words, Update packets contain EIGRP routing updates. When a new neighbor is discovered, Update packets are sent via Unicast so the neighbor can build up its EIGRP Topology Table. In other cases, such as a link cost change, updates are sent via Multicast. It is important to know that Update packets are always transmitted reliably and always require explicit acknowledgement. Update packets are assigned an OPCode of 1. EIGRP Query packets are Multicast and are used to request routing information reliably. EIGRP Query packets are sent to neighbors when a route is not available and the router needs to ask about the status of the route for fast convergence. If the router that sends out a Query does not receive a response from any of its neighbors, it resends the Query as a Unicast packet to the non-responsive neighbor(s). 
If no response is received after sixteen attempts, then the neighbor relationship is reset. EIGRP Query packets are assigned an OPCode of 3.

EIGRP Reply packets are sent in response to Query packets. The Reply packets are used to respond reliably to a Query packet. Reply packets are Unicast to the originator of the Query. The EIGRP Reply packets are assigned an OPCode of 4.

NOTE: EIGRP can also use another packet type called the Request packet. This is used in route server applications. EIGRP Request packets can be sent via either Multicast or Unicast, but they are always transmitted unreliably. In other words, they do not require an explicit acknowledgment. Route server applications are beyond the scope of the TSHOOT certification exam.

Neighbor Discovery and Maintenance

EIGRP supports both dynamic and static (manually configured) neighbor discovery. Dynamic neighbor discovery is performed by sending Hello packets to the destination Multicast group address 224.0.0.10. Unlike the dynamic EIGRP neighbor discovery process, static EIGRP neighbor relationships require manual neighbor configuration on the router. When static neighbors are configured, the local router uses the Unicast neighbor address to send packets to these routers to establish a neighbor relationship.

Hello and Hold Timers

EIGRP uses different Hello and Hold timers for different types of media. The Hello timer determines the interval at which EIGRP Hello packets are sent. The Hold timer determines the time that will elapse before a router considers an EIGRP neighbor as down. By default, the Hold time is three times the Hello interval.

EIGRP sends Hello packets every five seconds on Broadcast, point-to-point serial, point-to-point subinterfaces, and multipoint circuits greater than T1. The default Hold time is 15 seconds. EIGRP sends Hello packets every 60 seconds on other link types.
These include low-bandwidth WAN links less than T1 speed. The default Hold time for neighbor relationships across these links is also three times the Hello interval and therefore defaults to 180 seconds.

The Neighbor Table

The EIGRP Neighbor Table is used by routers running EIGRP to maintain state information about EIGRP neighbors. When new neighbors are discovered, the address and interface of each neighbor are recorded. This is applicable to both dynamically discovered neighbors and statically defined neighbors.

Reliable Transport Protocol

EIGRP needs its own transport protocol, Reliable Transport Protocol, to ensure the reliable delivery of Update, Query, and Reply packets. The use of sequence numbers also ensures that the EIGRP packets are received in the correct order.

Metric Calculation

EIGRP uses a composite metric, which includes different variables that are referred to as the K values. The K values are constants that are used to distribute weight to different path aspects, which may be included in the composite EIGRP metric. The default values for the K values are K1 = K3 = 1 and K2 = K4 = K5 = 0. In other words, K1 and K3 are set to a default value of 1, while K2, K4, and K5 are set to a default value of 0.

The complete EIGRP composite metric is calculated using the following mathematical formula, in which the final term, K5 / (reliability + K4), is applied only when K5 is set to a non-zero value:

[K1 * bandwidth + (K2 * bandwidth) / (256 - load) + K3 * delay] * [K5 / (reliability + K4)]

However, given that only K1 and K3 have any positive values by default, the default EIGRP metric calculation is performed using the following mathematical formula, where the bandwidth is expressed in Kbps and the delay in tens of microseconds:

[(10⁷ ⁄ least bandwidth on path) + (sum of all delays)] × 256

This essentially means that, by default, EIGRP uses the minimum bandwidth on the path to a destination network and the total delay to compute routing metrics.
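The default calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the IOS implementation: the function name eigrp_default_metric is hypothetical, the bandwidth is assumed to be in Kbps and the delay in microseconds (as displayed by show interfaces), and integer truncation of each term is assumed.

```python
def eigrp_default_metric(bandwidths_kbps, delays_usec):
    """Classic EIGRP metric with default K values (K1 = K3 = 1, others 0).

    bandwidths_kbps: bandwidth of every link on the path, in Kbps.
    delays_usec: delay of every link on the path, in microseconds.
    """
    # Bandwidth term: 10^7 divided by the slowest link on the path.
    # Integer division is an assumption about how the value is truncated.
    scaled_bandwidth = 10**7 // min(bandwidths_kbps)
    # Delay term: cumulative delay expressed in tens of microseconds.
    scaled_delay = sum(delays_usec) // 10
    return (scaled_bandwidth + scaled_delay) * 256


# A single T1 serial link (1544 Kbps, 20000 usec delay).
t1_metric = eigrp_default_metric([1544], [20000])
# A single FastEthernet link (100000 Kbps, 100 usec delay).
fe_metric = eigrp_default_metric([100000], [100])
# A path that crosses both links: lowest bandwidth, summed delay.
path_metric = eigrp_default_metric([100000, 1544], [100, 20000])

print(t1_metric, fe_metric, path_metric)  # 2169856 28160 2172416
```

Note that the two-link path is dominated by the T1 link: only the lowest bandwidth on the path contributes to the bandwidth term, while every link adds to the delay term.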
However, Cisco IOS software allows administrators to set other K values to non-zero values to incorporate other variables into the composite metric.

Diffusing Update Algorithm

The Diffusing Update Algorithm (DUAL) is at the crux of the EIGRP routing protocol. DUAL looks at all routes received from neighbor routers, compares them, and then selects the lowest (best) metric, loop-free path to the destination network. The best route, which is the route with the lowest metric or Feasible Distance (FD), is then referred to as the Successor route.

The metric that is advertised by the neighbor router is referred to as the Reported Distance (RD) or as the Advertised Distance (AD). This is that neighbor’s metric to the destination network. The Feasible Distance therefore includes the Reported Distance plus the cost of reaching that particular neighbor.

The next-hop router for the Successor route is referred to as the Successor. The Successor route is placed into both the IP routing table and the EIGRP Topology Table.

Any other routes to the same destination network that have a lower Reported Distance than the Feasible Distance of the Successor path are guaranteed to be loop-free and are referred to as Feasible Successor routes. These routes are not placed into the IP routing table; however, they are still placed into the EIGRP Topology Table, along with the Successor routes.

In order for a route to become a Feasible Successor route, it must meet the Feasibility Condition (FC), which occurs only when the Reported Distance to the destination network is less than the Feasible Distance.
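The Successor and Feasible Successor selection described above can be sketched as follows. This is a simplified illustration, not the DUAL finite state machine: the function name classify_routes, the neighbor names, and the metric values are all hypothetical, and each neighbor is assumed to offer exactly one path, given as a (Reported Distance, total metric) pair.

```python
def classify_routes(routes):
    """Pick the Successor and Feasible Successors for one destination.

    routes maps a neighbor name to (reported_distance, total_metric),
    where total_metric is the local router's metric via that neighbor.
    """
    # The Successor is the neighbor offering the lowest total metric;
    # that lowest metric is the Feasible Distance (FD).
    successor = min(routes, key=lambda n: routes[n][1])
    feasible_distance = routes[successor][1]
    # Feasibility Condition: Reported Distance strictly less than the FD.
    feasible_successors = sorted(
        neighbor
        for neighbor, (reported_distance, _) in routes.items()
        if neighbor != successor and reported_distance < feasible_distance
    )
    return successor, feasible_distance, feasible_successors


# Hypothetical topology table entries for a single destination network.
routes = {
    "R2": (2000, 3000),  # lowest total metric, becomes the Successor
    "R3": (2500, 3500),  # RD 2500 < FD 3000, meets the condition
    "R4": (3200, 4000),  # RD 3200 >= FD 3000, fails the condition
}
print(classify_routes(routes))  # ('R2', 3000, ['R3'])
```

In this sketch the path via R4 stays in the Topology Table but cannot be used as a backup without a Query, because its Reported Distance is not lower than the current Feasible Distance.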
In the event that the Reported Distance is equal to or greater than the Feasible Distance, the route is not selected as a Feasible Successor. This is used by EIGRP to prevent the possibility of loops.

The Topology Table

The EIGRP Topology Table is populated by the EIGRP protocol-dependent modules (PDMs) and acted upon by the DUAL Finite State Machine. All known destination networks and subnets that are advertised by neighboring EIGRP routers are stored in the EIGRP Topology Table. This includes Successor routes, Feasible Successor routes, and even routes that have not met the Feasibility Condition.

The Topology Table allows all EIGRP routers to have a consistent view of the entire network. It also allows for rapid convergence in EIGRP networks. Each individual entry in the Topology Table contains the destination network and the neighbor (or neighbors) that have advertised the destination network. Both the Feasible Distance and the Reported Distance are stored in the Topology Table. The EIGRP Topology Table contains the information needed to build a set of distances and vectors to each reachable network, including the following:
• The lowest bandwidth on the path to the destination network
• The total or cumulative delay to the destination network
• The reliability of the path to the destination network
• The loading of the path to the destination network
• The minimum Maximum Transmission Unit (MTU) to the destination network
• The Feasible Distance to the destination network
• The Reported Distance by the neighbor router to the destination network
• The route source (only external routes) of the destination network

Stub Routing

Stub routing is an EIGRP feature designed primarily to conserve local router resources, such as memory and CPU, and to improve network stability. The stub routing feature is most commonly used in hub-and-spoke networks. This feature is configured only on the spoke routers. When configured on the spoke router, the router announces its stub router status using a new Type/Length/
When configured on the spoke router, the router announces its stub router status using a new Type/Length/ 210 C H A P T E R 5: T RO U B L ES H O OT I N G E I G R P Value (TLV) in the EIGRP Hello messages. When the hub router receives the Hello packet from the spoke router, one of two things happens: 1. If the hub router is running a newer version of software, upon receiving the Hello packet with the new TLV, the router will not query the stub router about the status of any prefixes. This is the default mode of operation in current Cisco IOS software versions. 2. If the hub router is running a version of software less than 12.0(7)T, upon receiving the Hello packet with the new TLV, the router will ignore this field because it does not understand it. The router will send Query packets to the stub router if it needs information about a route or routes. However, the stub router will respond with a message of inaccessible to any queries received from the hub router. This method allows for backward compatibility with older versions of software while retaining stub routing functionality. When the stub routing feature is enabled on the spoke router, the stub router will advertise only specified routes to the hub router. The router will not advertise routes received from other EIGRP neighbors to the hub router. Cisco IOS software allows administrators to select the type of routes that the stub router should advertise to the hub router; however, by default, the stub router will advertise connected and summary routes only. The EIGRP stub routing feature provides the following four advantages when implemented in hub-and-spoke networks: 1. It prevents sub-optimal routing from occurring within hub-and-spoke EIGRP networks 2. It prevents stub routers with low-speed links from being used as transit routers 3. It eliminates EIGRP Query storms, allowing the EIGRP network to convergence faster 4. 
It reduces the required amount of configuration commands on the stub routers

EIGRP Route Summarization

Route summarization reduces the amount of information that routers must process, which then allows for faster convergence within the network. Summarization also restricts the size of the area that is affected by network changes by hiding detailed topology information from certain areas within the network. Finally, summarization is used to define a Query boundary for EIGRP, which supports the following two types of route summarization:
1. Automatic route summarization
2. Manual route summarization

By default, automatic route summarization is in effect when EIGRP is enabled on the router. This is implemented using the auto-summary command. This command allows EIGRP to perform automatic route summarization at Classful boundaries.

Unlike EIGRP automatic summarization, EIGRP manual route summarization is configured and implemented at the interface level using the ip summary-address eigrp [ASN] [network] [mask] [distance] [leak-map <name>] interface configuration command. By default, an EIGRP summary address is assigned an administrative distance value of 5. This default can be changed by specifying the desired administrative distance value using the [distance] keyword.

By default, when manual route summarization is configured, EIGRP will not advertise the more specific route entries that fall within the summarized network entry. The leak-map <name> keyword can be configured to allow EIGRP route leaking, wherein EIGRP allows specified specific route entries to be advertised in conjunction with the summary address. Those entries that are not specified in the leak map are still suppressed.

EIGRP Unequal Cost Load Sharing

In addition to equal cost load-balancing capabilities, EIGRP is also able to perform unequal cost load sharing.
This unique ability allows EIGRP to use unequal cost paths to send outgoing packets to the destination network based on weighted traffic share values. Unequal cost load sharing is enabled using the variance <multiplier> router configuration command. The <multiplier> keyword represents an integer between 1 and 128. A multiplier of 1, which is the default, implies that no unequal cost load sharing is being performed. This default setting is also illustrated in the output of the show ip protocols command. If any other value is used, EIGRP will load-share across the successor route, as well as any other route with a route metric less than x times that of the successor metric, where x is the configured multiplier. Routes that do not meet the Feasibility Condition are excluded from this calculation.

TROUBLESHOOTING NEIGHBOR RELATIONSHIPS

It is important to understand that simply enabling EIGRP between two or more routers does not guarantee that a neighbor relationship will be established. In addition to certain parameters that must match, other factors can also prevent an EIGRP neighbor relationship from being established. The EIGRP neighbor relationship may not establish due to any of the following:

• The neighbor routers are not on a common subnet
• Mismatched primary and secondary subnets
• Mismatched K values
• Mismatched AS number
• Access Control Lists are filtering EIGRP packets
• Physical layer issues
• Data Link layer issues
• Mismatched authentication parameters

Uncommon subnet issues are one of the most common problems experienced when attempting to establish EIGRP neighbor relationships.
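The common-subnet requirement can be illustrated with a short script. The following is a minimal sketch, not a model of the actual IOS code: the function name is hypothetical, and it assumes the receiving router simply checks the source address of an incoming Hello (which the sender takes from its primary interface address) against the subnets configured on the receiving interface. The addresses are those used in the example that follows.

```python
import ipaddress

def on_common_subnet(local_addresses, hello_source):
    """Return True if hello_source lies in any subnet configured on the
    receiving interface (hypothetical helper, simplified model)."""
    source = ipaddress.ip_address(hello_source)
    return any(source in ipaddress.ip_interface(addr).network
               for addr in local_addresses)

# R1 has a single (primary) address on FastEthernet0/0.
R1_FA0_0 = ["150.2.2.1/24"]

# R2 sources its Hellos from its primary address, 150.1.1.2.
print(on_common_subnet(R1_FA0_0, "150.1.1.2"))  # False: not on common subnet

# Once 150.2.2.2 is made the primary address on R2, the check passes.
print(on_common_subnet(R1_FA0_0, "150.2.2.2"))  # True
```

The point of the sketch is that the address family of the check is the primary address of the sender; a matching secondary address on the sender does not help.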
When EIGRP cannot establish a neighbor relationship because of an uncommon subnet, the following error message will be printed on the console or will be logged by the router or switch:

*Mar 2 22:12:46.589 CST: IP-EIGRP(Default-IP-Routing-Table:1): Neighbor 150.1.1.2 not on common subnet for FastEthernet0/0
*Mar 2 22:12:50.977 CST: IP-EIGRP(Default-IP-Routing-Table:1): Neighbor 150.1.1.2 not on common subnet for FastEthernet0/0

The most common reason for the neighbor routers being on an uncommon subnet is a misconfiguration issue. It may be that the router interfaces have been accidentally configured on two different subnets. However, if the neighbors are connected via a VLAN, it is possible that Multicast packets could be leaking between VLANs, resulting in this error. The first troubleshooting step, however, simply would be to verify the interface configuration on the devices. Following this, additional troubleshooting steps, such as VLAN troubleshooting (if applicable), could be undertaken to isolate and resolve the issue.

Another common reason for this error message is using secondary addresses when attempting to establish EIGRP neighbor relationships. Again, the simplest way to troubleshoot such issues is to verify the router or switch configurations. For example, assume the error message above was being printed on the console of the local router. The first troubleshooting step would be to validate the IP addresses configured on the interface as follows:

R1#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 140 bytes
!
interface FastEthernet0/0
 ip address 150.2.2.1 255.255.255.0
 duplex auto
 speed auto
end

Next, validate that the configuration is the same on the device with the IP address 150.1.1.2 as follows:

R2#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 140 bytes
!
interface FastEthernet0/0
 ip address 150.2.2.2 255.255.255.0 secondary
 ip address 150.1.1.2 255.255.255.0
 duplex auto
 speed auto
end

From the output above, we can see that the primary subnet on R1 is the secondary subnet on R2. EIGRP will not establish neighbor relationships using a secondary address, because EIGRP sources its Hello packets from the primary address of the interface. The resolution for this issue simply would be to correct the IP addressing configuration under the FastEthernet0/0 interface of R2 as follows:

R2#config terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#interface FastEthernet0/0
R2(config-if)#ip address 150.2.2.2 255.255.255.0
R2(config-if)#ip address 150.1.1.2 255.255.255.0 secondary
R2(config-if)#end
*Oct 20 03:10:27.185 CST: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 150.2.2.1 (FastEthernet0/0) is up: new adjacency

EIGRP K values are constants that are used to distribute weight to different path aspects, which may be included in the composite EIGRP metric. The default values for the K values are K1 = K3 = 1 and K2 = K4 = K5 = 0. If changed on one router or switch, then these values must be adjusted for all other routers or switches within the autonomous system.
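To make the role of the K values concrete, the classic EIGRP composite metric can be sketched in a few lines. This is an illustrative sketch only: the function name is hypothetical, the integer truncation is an approximation of how the router computes the value, and the sample inputs (a minimum bandwidth of 100000 Kbit and a total delay of 5100 microseconds) are taken from a topology entry shown later in this chapter, where the composite metric is 156160.

```python
def eigrp_metric(min_bw_kbps, total_delay_usec, load=1, reliability=255,
                 k1=1, k2=0, k3=1, k4=0, k5=0):
    """Classic EIGRP composite metric (hypothetical helper).

    With the default K values only bandwidth (K1) and delay (K3) contribute.
    """
    bw = 10**7 // min_bw_kbps        # scaled inverse of the slowest link
    delay = total_delay_usec // 10   # delay in tens of microseconds
    metric = k1 * bw + (k2 * bw) // (256 - load) + k3 * delay
    if k5 != 0:
        # The reliability term is applied only when K5 is non-zero.
        metric = metric * k5 // (reliability + k4)
    return metric * 256

# Default K values: (100 + 510) * 256
print(eigrp_metric(100000, 5100))  # 156160
```

Because each router feeds the same vector metrics through its own K values, two neighbors with different K values would compute incompatible metrics, which is why a mismatch prevents the neighbor relationship from forming.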
The default EIGRP K values can be viewed using the show ip protocols command as illustrated below:

R1#show ip protocols
Routing Protocol is "eigrp 150"
  Outgoing update filter list for all interfaces is not set
  Incoming update filter list for all interfaces is 1
  Default networks flagged in outgoing updates
  Default networks accepted from incoming updates
  EIGRP metric weight K1=1, K2=0, K3=1, K4=0, K5=0
  EIGRP maximum hopcount 100
  EIGRP maximum metric variance 1
  Redistributing: eigrp 150, ospf 1
  EIGRP NSF-aware route hold timer is 240s
  Automatic network summarization is not in effect
  Maximum path: 4
  Routing for Networks:
    10.1.0.0/24
    172.16.1.0/30
  Routing Information Sources:
    Gateway         Distance      Last Update
    (this router)         90      15:59:19
    172.16.0.2            90      12:51:56
    172.16.1.2            90      00:27:17
  Distance: internal 90 external 170

When K values are changed on a router, all neighbor relationships for the local router will be reset. If the values are not consistent on all routers following the reset, the following error message will be printed on the console, and the EIGRP neighbor relationship(s) will not be established:

*Oct 20 03:19:14.140 CST: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 150.2.2.1 (FastEthernet0/0) is down: Interface Goodbye received
*Oct 20 03:19:18.732 CST: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 150.2.2.1 (FastEthernet0/0) is down: K-value mismatch

NOTE: While EIGRP K values can be adjusted using the metric weights router configuration command, this is not recommended without assistance from seasoned network engineers or the Technical Assistance Center (TAC).

Unlike OSPF, which uses a locally significant process ID, EIGRP requires the same autonomous system number (among other variables) when establishing neighbor relationships with other routers.
Troubleshoot such issues by comparing configurations of devices and ensuring that the autonomous system number (among other variables) is consistent between routers that should establish neighbor relationships. A good indicator that neighbors are in a different autonomous system would be a lack of bidirectional Hellos, even in the presence of basic IP connectivity between the routers. This can be validated using the show ip eigrp traffic command, the output of which is illustrated in the section that follows.

ACLs and other filters are also common causes for routers failing to establish EIGRP neighbor relationships. Check router configurations and those of intermediate devices to ensure that EIGRP or Multicast packets are not filtered. A very useful troubleshooting command to use is the show ip eigrp traffic command. This command provides statistics on all EIGRP packets. Assume, for example, that you have verified basic connectivity and configurations between two devices, but the EIGRP neighbor relationship is still not up. In that case, you could use this command to check to see whether the routers are exchanging Hello packets, before enabling debugging on the local device, as illustrated below:

R2#show ip eigrp traffic
IP-EIGRP Traffic Statistics for AS 2
  Hellos sent/received: 144/0
  Updates sent/received: 0/0
  Queries sent/received: 0/0
  Replies sent/received: 0/0
  Acks sent/received: 0/0
  SIA-Queries sent/received: 0/0
  SIA-Replies sent/received: 0/0
  Hello Process ID: 149
  PDM Process ID: 120
  IP Socket queue: 0/2000/0/0 (current/max/highest/drops)
  Eigrp input queue: 0/2000/0/0 (current/max/highest/drops)

In the output above, notice that the local router has not received any Hello packets, although it has sent out 144 Hellos.
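When the same check has to be repeated across many devices, it can be scripted against captured command output. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, the sample text is an abbreviated copy of the output above, and how the output is collected from the device is left out entirely.

```python
import re

# Abbreviated copy of the show ip eigrp traffic output shown above.
SAMPLE = """\
IP-EIGRP Traffic Statistics for AS 2
  Hellos sent/received: 144/0
  Updates sent/received: 0/0
"""

def hello_counters(output):
    """Return (sent, received) Hello counts parsed from the command output."""
    match = re.search(r"Hellos sent/received:\s*(\d+)/(\d+)", output)
    if match is None:
        raise ValueError("no Hello counters found in output")
    return int(match.group(1)), int(match.group(2))

def one_way_hellos(output):
    """True when Hellos are being sent but none have been received."""
    sent, received = hello_counters(output)
    return sent > 0 and received == 0

print(hello_counters(SAMPLE))   # (144, 0)
print(one_way_hellos(SAMPLE))   # True
```

A result of True only says that Hellos are not arriving; the cause (filtering, a Layer 1 or Layer 2 fault, or a mismatched autonomous system) still has to be isolated with the steps described in this section.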
Assuming that you have verified IP connectivity between the two devices, as well as the configuration, you could also check ACL configurations on the local routers, as well as intermediate devices (if applicable), to ensure that EIGRP or Multicast traffic is not being filtered. For example, you might find an ACL that is configured to deny Class D (Multicast) traffic, while allowing all other traffic, such as the following ACL:

R2#show ip access-lists
Extended IP access list 100
    10 deny ip 224.0.0.0 15.255.255.255 any
    20 deny ip any 224.0.0.0 15.255.255.255 (47 matches)
    30 permit ip any any (27 matches)

In this ACL, EIGRP Hello packets, which are sent to the Multicast address 224.0.0.10, would match entry 20 and be dropped.

Physical and Data Link layer issues, and ways in which these can affect routing protocols and other traffic, have been described in detail in previous chapters. You can troubleshoot these issues using the show interfaces, show interfaces counters, show vlan, and show spanning-tree commands, among other commands described in those chapters. To avoid repetition, the Physical and Data Link layer troubleshooting steps are not restated here.

Finally, common authentication configuration mistakes include using different key IDs when configuring key chains, and specifying different or mismatched passwords. When authentication is enabled under an interface, the EIGRP neighbor relationships are reset and reinitialized. If previously established neighbor relationships do not come up following authentication implementation, verify the authentication configuration parameters by looking at the running configuration or using the show key chain and show ip eigrp interfaces detail [name] commands on the router.
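Returning briefly to access list 100 shown earlier in this section, it can be useful to confirm exactly which addresses a wildcard mask covers. The following sketch is illustrative only (the function name is hypothetical); it applies the usual rule that a 0 bit in the wildcard mask must match and a 1 bit is ignored, and shows that 224.0.0.0 15.255.255.255 covers 224.0.0.0 through 239.255.255.255, which includes the EIGRP Multicast address 224.0.0.10.

```python
import ipaddress

def wildcard_match(address, base, wildcard):
    """Return True if address matches base under a Cisco-style wildcard mask."""
    addr = int(ipaddress.IPv4Address(address))
    base_bits = int(ipaddress.IPv4Address(base))
    wild = int(ipaddress.IPv4Address(wildcard))
    care = ~wild & 0xFFFFFFFF          # bits that must match exactly
    return (addr & care) == (base_bits & care)

# EIGRP Hellos are sent to 224.0.0.10, which the deny entries match.
print(wildcard_match("224.0.0.10", "224.0.0.0", "15.255.255.255"))  # True
# A Class E address falls outside the range covered by this wildcard.
print(wildcard_match("240.0.0.1", "224.0.0.0", "15.255.255.255"))   # False
# Ordinary unicast traffic falls through to the permit entry.
print(wildcard_match("150.1.1.2", "224.0.0.0", "15.255.255.255"))   # False
```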
Following is a sample output of the information that is printed by the show key chain command:

R2#show key chain
Key-chain EIGRP-1:
    key 1 -- text "eigrp-1"
        accept lifetime (always valid) - (always valid) [valid now]
        send lifetime (always valid) - (always valid) [valid now]
Key-chain EIGRP-2:
    key 1 -- text "eigrp-2"
        accept lifetime (00:00:01 UTC Nov 1 2010) - (infinite)
        send lifetime (00:00:01 UTC Nov 1 2010) - (infinite)
Key-chain EIGRP-3:
    key 1 -- text "eigrp-3"
        accept lifetime (00:00:01 UTC Dec 1 2010) - (00:00:01 UTC Dec 31 2010)
        send lifetime (00:00:01 UTC Dec 1 2010) - (00:00:01 UTC Dec 31 2010)

Following is a sample output of the information that is printed by the show ip eigrp interfaces detail [name] command:

R2#show ip eigrp interfaces detail Serial0/0
IP-EIGRP interfaces for process 1
                    Xmit Queue   Mean   Pacing Time   Multicast    Pending
Interface    Peers  Un/Reliable  SRTT   Un/Reliable   Flow Timer   Routes
Se0/0          0        0/0         0       0/1            0           0
  Hello interval is 5 sec
  Next xmit serial <none>
  Un/reliable mcasts: 0/0  Un/reliable ucasts: 0/0
  Mcast exceptions: 0  CR packets: 0  ACKs suppressed: 0
  Retransmissions sent: 0  Out-of-sequence rcvd: 0
  Authentication mode is md5,  key-chain is "EIGRP-1"
  Use unicast

When troubleshooting in general, it is recommended that you use show commands in Cisco IOS software instead of enabling debug commands. While debugging provides real-time information, it is very processor intensive, and it could result in high CPU utilization of the device and, in some cases, even crash the device. In addition to show commands, you should also pay attention to the various error messages that are printed by the software, as these provide useful information that can be used to troubleshoot and isolate the root cause of the problem.

TROUBLESHOOTING ROUTE INSTALLATION

There are instances where you might notice that EIGRP is not installing certain routes into the routing table.
For the most part, this is due to misconfiguration rather than a protocol failure. Some common reasons for route installation failure include the following:

• The same route is received via another protocol with a lower administrative distance
• EIGRP summarization
• Duplicate Router IDs are present within the EIGRP domain
• The routes do not meet the Feasibility Condition

In the ROUTE guide, we learned that the administrative distance concept is used to determine how reliable the route source is. The lower the administrative distance, the more reliable the route source is. If the same route is received from three different protocols, the route with the lowest administrative distance will be installed into the routing table. When using EIGRP, keep in mind that EIGRP uses different administrative distance values for summary, internal, and external routes. If you are running multiple routing protocols, it is important to ensure that you understand administrative distance values and how they impact routing table population. This is especially of concern when you are redistributing routes between multiple routing protocols.

By default, EIGRP automatically summarizes at Classful boundaries and creates a summary route pointing to the Null0 interface. Because the summary is installed with a default administrative distance value of 5, any other similar dynamically received routes will not be installed into the routing table. Consider the topology illustrated in Figure 5-1 below, for example:

[Figure 5-1: R1 (Fa0/0: 10.1.1.0/24, Se0/0: 150.1.1.1/30) is connected to R2 (Se0/0: 150.1.1.2/30, Fa0/0: 10.2.2.0/24) in EIGRP ASN 150]

Fig. 5-1. EIGRP Automatic Summarization

Referencing the diagram illustrated in Figure 5-1, the 150.1.1.0/30 subnet separates 10.1.1.0/24 and 10.2.2.0/24. When automatic summarization is enabled, both R1 and R2 will summarize the 10.1.1.0/24 and 10.2.2.0/24 subnets, respectively, to 10.0.0.0/8.
This summary route will be installed into the routing table with an administrative distance of 5 and a next-hop interface of Null0. This lower administrative distance value will prevent either router from accepting or installing the 10.0.0.0/8 summary from the other router as illustrated in the following output:

R2#debug eigrp fsm
EIGRP FSM Events/Actions debugging is on
R2#
R2#
*Mar 13 03:24:31.983: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 150.1.1.1 (FastEthernet0/0) is up: new adjacency
*Mar 13 03:24:33.995: DUAL: dest(10.0.0.0/8) not active
*Mar 13 03:24:33.995: DUAL: rcvupdate: 10.0.0.0/8 via 150.1.1.1 metric 156160/128256
*Mar 13 03:24:33.995: DUAL: Find FS for dest 10.0.0.0/8. FD is 128256, RD is 128256
*Mar 13 03:24:33.995: DUAL: 0.0.0.0 metric 128256/0
*Mar 13 03:24:33.995: DUAL: 150.1.1.1 metric 156160/128256 found Dmin is 128256
*Mar 13 03:24:33.999: DUAL: RT installed 10.0.0.0/8 via 0.0.0.0

In the debug output above, the local router receives the 10.0.0.0/8 route from neighbor 150.1.1.1 with a route metric of 156160/128256. However, DUAL also has the same route locally, due to summarization, and this route has a route metric of 128256/0. The local route is therefore installed into the routing table instead because it has the better metric. The same would also be applicable on R1, which would install its local 10.0.0.0/8 route into the RIB instead. The result is that neither router would be able to ping the 10.x.x.x subnet of the other router. To resolve this issue, automatic summarization should be disabled using the no auto-summary command on both of the routers, allowing the specific route entries to be advertised instead.

The primary use of the EIGRP Router ID (RID) is to prevent routing loops. The RID is used to identify the originating router for external routes. If an external route is received with the same RID as the local router, the route will be discarded.
However, duplicate RIDs do not affect any internal EIGRP routes. This feature is designed to reduce the possibility of routing loops in networks where route redistribution is being performed on more than one ASBR. The originating router ID can be viewed in the output of the show ip eigrp topology command as illustrated below:

R1#show ip eigrp topology 2.2.2.2 255.255.255.255
IP-EIGRP (AS 1): Topology entry for 2.2.2.2/32
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 156160
  Routing Descriptor Blocks:
  150.1.1.2 (FastEthernet0/0), from 150.1.1.2, Send flag is 0x0
      Composite metric is (156160/128256), Route is External
      Vector metric:
        Minimum bandwidth is 100000 Kbit
        Total delay is 5100 microseconds
        Reliability is 255/255
        Load is 1/255
        Minimum MTU is 1500
        Hop count is 1
      External data:
        Originating router is 2.2.2.2
        AS number of route is 0
        External protocol is Connected, external metric is 0
        Administrator tag is 0 (0x00000000)

If you suspect a potential duplicate RID issue, you can check the events in the EIGRP event log to see if any routes have been rejected because of a duplicate RID. The following illustrates a sample output of the EIGRP event log, showing routes that have been rejected because they were received from a router with the same RID as the local router:

R2#show ip eigrp events
Event information for AS 1:
[Truncated Output]
21   03:05:39.747 Ignored route, neighbor info: 10.0.0.1 Serial0/0
22   03:05:39.747 Ignored route, dup router: 150.1.1.254
23   03:05:06.659 Ignored route, metric: 192.168.2.0 284160
24   03:05:06.659 Ignored route, neighbor info: 10.0.0.1 Serial0/0
25   03:05:06.659 Ignored route, dup router: 150.1.1.254
26   03:04:33.311 Ignored route, metric: 192.168.1.0 284160
27   03:04:33.311 Ignored route, neighbor info: 10.0.0.1 Serial0/0
28   03:04:33.311 Ignored route, dup router: 150.1.1.254
...
[Truncated Output]

The resolution for the issue above would be to change the RID on neighbor router 10.0.0.1 or on the local router, depending on which one of the two has been incorrectly configured.

Finally, it is important to remember that EIGRP will not install routes into the routing table if they do not meet the Feasibility Condition. This is true even if the variance command has been configured on the local router. It is a common misconception that issuing the variance command will allow EIGRP to load-share over any paths whose route metric is within x times that of the successor metric. Consider the topology illustrated in Figure 5-2 below, for example:

[Figure 5-2: R1 connects to R2, R3, and R4, each of which connects to R5 and the 192.168.100.0/24 subnet; the path metrics are summarized in Table 5-1]

Fig. 5-2. Understanding the Feasibility Condition

Figure 5-2 shows a basic network that includes metrics from R1 to the 192.168.100.0/24 subnet. Referencing Figure 5-2, Table 5-1 below displays the Reported Distance and Feasible Distance values as seen on R1 for the 192.168.100.0/24 network.

Table 5-1. R1 Paths and Distances

Network Path     R1 Neighbor   Neighbor Metric (RD)   R1 Feasible Distance
R1 – R2 – R5     R2            30                     35
R1 – R3 – R5     R3            10                     30
R1 – R4 – R5     R4            15                     25

R1 has been configured to load-share across all paths and the variance 2 command is added to the router configuration. This allows EIGRP to load-share across paths with up to twice the metric of the Successor, which would include all three paths based on the default metric calculation. However, despite this configuration, only two paths will be installed and used.

First, R1 will select the path through R4 as the Successor route based on the Feasible Distance for the route, which is 25. This route will be placed into the IP routing table as well as the EIGRP Topology Table. The metric for neighbor R3 to the 192.168.100.0/24 network, also referred to as the Reported Distance or Advertised Distance, is 10.
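The path selection that this walk-through describes can be checked with a few lines of code. The following is a minimal sketch using the values from Table 5-1; the function and variable names are hypothetical, and the strict less-than comparison against the variance threshold is an assumption about the boundary case rather than something this example exercises.

```python
def usable_paths(paths, variance=1):
    """Return (successor, neighbors used for forwarding).

    paths maps a neighbor name to (reported_distance, total_metric).
    """
    successor = min(paths, key=lambda n: paths[n][1])
    feasible_distance = paths[successor][1]
    used = [successor]
    for neighbor, (reported, total) in paths.items():
        if neighbor == successor:
            continue
        meets_fc = reported < feasible_distance             # Feasibility Condition
        within_variance = total < variance * feasible_distance
        if meets_fc and within_variance:
            used.append(neighbor)
    return successor, sorted(used)

# Values from Table 5-1: neighbor -> (Reported Distance, metric through neighbor)
PATHS = {"R2": (30, 35), "R3": (10, 30), "R4": (15, 25)}

print(usable_paths(PATHS, variance=2))  # ('R4', ['R3', 'R4'])
```

All three metrics (35, 30, and 25) fall below the variance threshold of 50, yet the path through R2 is left out because its Reported Distance of 30 is not lower than the Feasible Distance of 25.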
The Reported Distance of 10 is less than the Feasible Distance of 25, and so this route meets the Feasibility Condition and is placed into the EIGRP Topology Table. The metric for neighbor R2 to the 192.168.100.0/24 network is 30. This value is higher than the Feasible Distance of 25. This route does not meet the Feasibility Condition and is not considered a Feasible Successor. The route, however, is still placed into the EIGRP Topology Table. However, the path will not be used for load sharing, even though the metric falls within the range specified by the configuration of the variance 2 EIGRP router configuration command. In such situations, consider using EIGRP offset-lists to ensure all routes are considered.

TROUBLESHOOTING ROUTE ADVERTISEMENT

There are times when it may seem that EIGRP is either not advertising the networks that it has been configured to advertise or is advertising networks that it has not been configured to advertise. For the most part, such issues are typically due to router and switch misconfigurations. There are several reasons why EIGRP might not advertise a network that it has been configured to advertise. Some of these reasons include the following:

• Distribute lists
• Split horizon
• Summarization

Incorrectly configured distribute lists are one reason why EIGRP might not advertise a network that it has been configured to advertise. When configuring distribute lists, ensure that all networks that should be advertised are permitted by the referenced IP ACL or IP Prefix List.

Another common issue pertaining to network advertisement when using EIGRP is the default behavior of split horizon. Split horizon is a Distance Vector protocol feature that mandates that routing information cannot be sent back out of the same interface through which it was received. This prevents the re-advertising of information back to the source from which it was learned, effectively preventing routing loops.
This concept is illustrated in Figure 5-3 below:

[Figure 5-3: Hub router HQ (192.168.1.0/26) connects through Se0/0 to a Frame Relay network (172.16.1.0/29; DLCIs 102, 103, 201, and 301) in EIGRP AS 150. Spoke S1 advertises 10.1.1.0/24 and spoke S2 has 10.2.2.0/24. HQ receives 10.1.1.0/24 via Se0/0 and does NOT advertise it back out of the same interface.]

Fig. 5-3. EIGRP Split Horizon

The topology in Figure 5-3 illustrates a classic hub and spoke network, with router HQ as the hub router and routers S1 and S2 as the two spoke routers. On the Frame Relay WAN, each spoke router has a single DLCI provisioned between itself and the HQ router in a partial-mesh topology. By default, EIGRP split horizon is enabled for WAN interfaces connected to packet-switched networks, such as Frame Relay. This means that the HQ router will not advertise routing information learned on Serial0/0 out of the same interface.

The effect of this default behavior is that the HQ router will not advertise the 10.1.1.0/24 prefix received from S1 to S2 because the route is received via the Serial0/0 interface and the split horizon feature prevents the router from advertising information learned on that interface back out of the same interface. The same is also applicable for the 10.2.2.0/24 prefix the HQ router receives from S2. The recommended solution for this problem would be to disable the split horizon feature on the WAN interface using the no ip split-horizon eigrp [asn] interface configuration command on the HQ router.

By default, automatic summarization at the Classful boundary is enabled for EIGRP. This can be validated using the show ip protocols command. In addition to automatic summarization, EIGRP also supports manual summarization at the interface level. Regardless of the method implemented, summarization prevents the more specific route entries that are encompassed by the summary from being advertised to neighbor routers.
If route summarization is configured incorrectly, it may appear that EIGRP is not advertising certain networks. For example, consider the basic network topology that is illustrated in Figure 5-4 below:

[Figure 5-4: R1 (10.1.0.0/24) connects to R3 (10.3.0.0/24) over 172.16.0.0/30 and to R2 (10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24) over 172.16.1.0/30; all routers are in EIGRP AS 150]

Fig. 5-4. EIGRP Summarization

Referencing Figure 5-4, all routers reside in EIGRP autonomous system 150. R2 is advertising the 10.1.1.0/24, 10.1.2.0/24, and 10.1.3.0/24 subnets to R1 via EIGRP. R1, which also has an interface assigned to the 10.1.0.0/24 subnet, should in turn advertise these subnets to R3. The EIGRP configuration on router R2 has been implemented as follows:

R2(config)#router eigrp 150
R2(config-router)#network 10.1.1.0 0.0.0.255
R2(config-router)#network 10.1.2.0 0.0.0.255
R2(config-router)#network 10.1.3.0 0.0.0.255
R2(config-router)#network 172.16.1.0 0.0.0.3
R2(config-router)#no auto-summary
R2(config-router)#exit

The EIGRP configuration on R1 has been implemented as follows:

R1(config)#router eigrp 150
R1(config-router)#network 10.1.0.0 0.0.0.255
R1(config-router)#network 172.16.0.0 0.0.0.3
R1(config-router)#network 172.16.1.0 0.0.0.3
R1(config-router)#exit

Finally, the EIGRP configuration on R3 has been implemented as follows:

R3(config)#router eigrp 150
R3(config-router)#network 172.16.0.0 0.0.0.3
R3(config-router)#no auto-summary
R3(config-router)#exit

After this configuration, the routing table on R2 displays the following entries:

R2#show ip route eigrp
     172.16.0.0/30 is subnetted, 2 subnets
D       172.16.0.0 [90/2172416] via 172.16.1.1, 00:02:38, FastEthernet0/0
     10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
D       10.0.0.0/8 [90/156160] via 172.16.1.1, 00:00:36, FastEthernet0/0

The routing table on R1 displays the following entries:

R1#show ip route eigrp
     172.16.0.0/16 is variably subnetted, 3 subnets, 2 masks
D       172.16.0.0/16 is a summary, 00:01:01, Null0
     10.0.0.0/8 is variably subnetted, 6 subnets, 2 masks
D       10.1.3.0/24 [90/156160] via 172.16.1.2, 00:21:01, FastEthernet0/0
D       10.3.0.0/24 [90/2297856] via 172.16.0.2, 00:00:39, Serial0/0
D       10.1.2.0/24 [90/156160] via 172.16.1.2, 00:21:01, FastEthernet0/0
D       10.1.1.0/24 [90/156160] via 172.16.1.2, 00:21:01, FastEthernet0/0
D       10.0.0.0/8 is a summary, 00:01:01, Null0

Finally, the routing table on R3 displays the following entries:

R3#show ip route eigrp
     172.16.0.0/30 is subnetted, 2 subnets
D       172.16.1.0 [90/2172416] via 172.16.0.1, 00:21:21, Serial0/0
     10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
D       10.0.0.0/8 [90/2297856] via 172.16.0.1, 00:01:15, Serial0/0

Because summarization is enabled on R1, it appears that EIGRP is not advertising the specific subnets encompassed by the 10.0.0.0/8 summary. To allow the specific subnets to be advertised via EIGRP, automatic summarization should be disabled on R1 as illustrated below:

R1(config)#router eigrp 150
R1(config-router)#no auto-summary
R1(config-router)#exit

After this, the routing table on R3 would display the following route entries:

R3#show ip route eigrp
     172.16.0.0/30 is subnetted, 2 subnets
D       172.16.1.0 [90/2172416] via 172.16.0.1, 00:00:09, Serial0/0
     10.0.0.0/24 is subnetted, 5 subnets
D       10.1.3.0 [90/2300416] via 172.16.0.1, 00:00:09, Serial0/0
D       10.1.2.0 [90/2300416] via 172.16.0.1, 00:00:09, Serial0/0
D       10.1.1.0 [90/2300416] via 172.16.0.1, 00:00:09, Serial0/0
D       10.1.0.0 [90/2297856] via 172.16.0.1, 00:00:09, Serial0/0

The same would also be applicable to R2, which would now display the specific entries for the 10.1.0.0/24 and 10.3.0.0/24 subnets as follows:

R2#show ip route eigrp
     172.16.0.0/30 is subnetted, 2 subnets
D       172.16.0.0 [90/2172416] via 172.16.1.1, 00:00:10, FastEthernet0/0
     10.0.0.0/24 is subnetted, 5 subnets
D       10.3.0.0 [90/2300416] via 172.16.1.1, 00:00:10, FastEthernet0/0
D       10.1.0.0 [90/156160] via 172.16.1.1, 00:00:10, FastEthernet0/0
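The behavior seen in this example, where R3 initially receives 10.0.0.0/8 but still sees the specific 172.16.1.0/30 subnet, follows from the rule that automatic summarization applies only when a route is advertised out of an interface in a different major (Classful) network. The sketch below models that rule in a simplified way; the function names are hypothetical, and it ignores manual summaries and every other factor that influences what is advertised.

```python
import ipaddress

def classful_network(prefix):
    """Return the Class A, B, or C major network that contains prefix."""
    net = ipaddress.ip_network(prefix)
    first_octet = int(net.network_address) >> 24
    if first_octet < 128:
        length = 8       # Class A
    elif first_octet < 192:
        length = 16      # Class B
    elif first_octet < 224:
        length = 24      # Class C
    else:
        raise ValueError("Class D and E addresses have no Classful network")
    return ipaddress.ip_network(f"{net.network_address}/{length}", strict=False)

def advertised_prefix(prefix, out_interface_subnet):
    """Prefix advertised out of an interface with auto-summary in effect."""
    major = classful_network(prefix)
    if classful_network(out_interface_subnet) == major:
        return ipaddress.ip_network(prefix)   # same major network: no summary
    return major                              # crossing a Classful boundary

# R1 advertising toward R3 over the 172.16.0.0/30 link:
print(advertised_prefix("10.1.1.0/24", "172.16.0.0/30"))    # 10.0.0.0/8
print(advertised_prefix("172.16.1.0/30", "172.16.0.0/30"))  # 172.16.1.0/30
```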
TROUBLESHOOTING STUB ROUTING ISSUES

Unless you have a software or hardware bug that actually prevents stub router functionality, the most common problems with stub routing relate to prefix advertisement. As stated previously, a stub router will advertise only connected and summary routes by default. This happens regardless of whether any other routes (e.g., static routes) are included in the configured network statement on the stub router. Consider the topology illustrated in Figure 5-5 below:

[Figure 5-5: R1 (10.1.0.0/24) connects to R3 (10.3.0.0/24) over 172.16.0.0/30 and to R2 over 172.16.1.0/30. R2 has connected subnets 10.1.1.0/24, 10.1.2.0/24, and 10.1.3.0/24, and static routes to 192.168.1.0/24, 192.168.2.0/24, and 192.168.3.0/24 via 10.1.1.1.]

Fig. 5-5. EIGRP Stub Routing

Referencing Figure 5-5, R2 has some directly connected interfaces of 10.1.1.0/24, 10.1.2.0/24, and 10.1.3.0/24, as well as some static routes: 192.168.1.0/24, 192.168.2.0/24, and 192.168.3.0/24. The router has been configured as an EIGRP stub router as follows:

R2(config)#router eigrp 150
R2(config-router)#network 10.1.0.0 0.0.255.255
R2(config-router)#network 192.168.0.0 0.0.255.255
R2(config-router)#eigrp stub
R2(config-router)#no auto-summary
R2(config-router)#exit

Following this configuration, R1 (the hub) displays the following entries in its routing table:

R1#show ip route eigrp
     10.0.0.0/24 is subnetted, 5 subnets
D       10.1.3.0 [90/156160] via 172.16.1.2, 00:09:51, FastEthernet0/0
D       10.3.0.0 [90/2297856] via 172.16.0.2, 00:43:49, Serial0/0
D       10.1.2.0 [90/156160] via 172.16.1.2, 00:09:51, FastEthernet0/0
D       10.1.1.0 [90/156160] via 172.16.1.2, 00:09:51, FastEthernet0/0

Notice that none of the 192.168.x.0/24 static routes appear in the routing table of R1. To resolve this, the stub router must be reconfigured to advertise static routes as follows:

R2(config)#router eigrp 150
R2(config-router)#eigrp stub connected static
R2(config-router)#exit

NOTE: If you omit the connected keyword, then only the static routes will be advertised.
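The effect of the keywords on the eigrp stub command can be pictured as a simple filter on route source. The following sketch is a deliberately simplified model and not a description of how Cisco IOS implements the feature: the function name and data layout are hypothetical, and it assumes the routes are otherwise eligible for advertisement by EIGRP.

```python
# Route types a stub router advertises when no keywords are given.
DEFAULT_STUB = frozenset({"connected", "summary"})

def stub_advertises(routes, keywords=DEFAULT_STUB):
    """Return the prefixes whose source type is allowed by the stub keywords."""
    return [prefix for prefix, source in routes if source in keywords]

# Routes on R2 in Figure 5-5.
R2_ROUTES = [
    ("10.1.1.0/24", "connected"),
    ("10.1.2.0/24", "connected"),
    ("10.1.3.0/24", "connected"),
    ("192.168.1.0/24", "static"),
    ("192.168.2.0/24", "static"),
    ("192.168.3.0/24", "static"),
]

print(stub_advertises(R2_ROUTES))                           # connected only
print(stub_advertises(R2_ROUTES, {"connected", "static"}))  # all six prefixes
print(stub_advertises(R2_ROUTES, {"static"}))               # static only
```

The three calls correspond to eigrp stub, eigrp stub connected static, and eigrp stub static, respectively.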
After this configuration change, the routing table of R1 displays the following:

R1#show ip route eigrp
     10.0.0.0/24 is subnetted, 5 subnets
D       10.1.3.0 [90/156160] via 172.16.1.2, 00:00:07, FastEthernet0/0
D       10.3.0.0 [90/2297856] via 172.16.0.2, 00:47:04, Serial0/0
D       10.1.2.0 [90/156160] via 172.16.1.2, 00:00:07, FastEthernet0/0
D       10.1.1.0 [90/156160] via 172.16.1.2, 00:00:07, FastEthernet0/0
D    192.168.1.0/24 [90/28160] via 172.16.1.2, 00:00:07, FastEthernet0/0
D    192.168.2.0/24 [90/28160] via 172.16.1.2, 00:00:07, FastEthernet0/0
D    192.168.3.0/24 [90/28160] via 172.16.1.2, 00:00:07, FastEthernet0/0

TROUBLESHOOTING SIA ISSUES

Within the EIGRP Topology Table, entries may be marked either as Passive (P) or as Active (A). When a route is in the Passive state, it means that EIGRP has completed actively computing the metric for the route and traffic can be forwarded to the destination network using the Successor. This is the preferred state for all routes in the Topology Table.

EIGRP must recompute a route when the Successor has been lost. Usually, a Feasible Successor is present and EIGRP promotes that to the Successor. This way, the router converges without involving other routers in the network, and the route remains Passive. This process is referred to as a local computation. However, if the Successor has been lost or removed, and there is no Feasible Successor, the route is placed into the Active state and the router will begin diffused computation. In diffused computation, EIGRP will send a Query packet out to all neighbors and out of all interfaces, except for the interface to the Successor. When an EIGRP neighbor receives a Query packet for a route and if that neighbor’s EIGRP Topology Table does not contain an entry for the route, then the neighbor immediately replies to the Query packet with an unreachable message, stating that there is no path for this route through this neighbor.
If the EIGRP Topology Table on the neighbor lists the router sending the Query packet as the Successor for that route, and a Feasible Successor exists, then the Feasible Successor is installed and the router replies to the neighbor Query packet advising that it has a route to the lost destination network. However, if the EIGRP Topology Table lists the router sending the Query packet as the Successor for this route and there is no Feasible Successor, then the router queries all of its EIGRP neighbors, except those reached through the same interface as its former Successor. The router will not reply to the Query packet until it has received a Reply to every Query packet that it originated for this route.

Finally, if the Query packet was received from a neighbor that is not the Successor for this destination, then the router replies with its own Successor information. If the neighboring routers do not have the lost route information, Query packets are sent from those neighboring routers to their neighboring routers until the Query boundary is reached. The Query boundary is either the end of the network, the distribute list boundary, or the summarization boundary.

Once the Query packet has been sent, the EIGRP router must wait for all replies to be received before it calculates the Successor. If any neighbor has not replied within three minutes, the route is said to be Stuck-in-Active (SIA). When a route is SIA, the neighbor relationship of the router that did not respond to the Query packet will be reset. In such cases, you will see a message similar to the following logged by the router or switch:

%DUAL-5-NBRCHANGE: IP-EIGRP 150: Neighbor 150.1.1.1(Serial0/0) is down: stuck in active
%DUAL-3-SIA: Route 172.16.100.0/24 stuck-in-active state in IP-EIGRP 150. Cleaning up

There are several reasons why the EIGRP neighbor router(s) may not respond to the Query.
Some of these reasons include the following:
• The neighbor router's CPU is overloaded and it cannot respond in time
• The neighbor router itself has no information about the lost route
• The neighbor cannot allocate memory to process the Query packet or build the Reply packet
• Low bandwidth links are congested and packets are being delayed
• Query packets are not received (e.g., due to a bad circuit or unidirectional link)

Troubleshooting SIA issues is a difficult task, especially in larger networks. Whenever you troubleshoot SIA errors, you should answer the following two questions, listed in order of urgency, to identify the possible causes of the SIA:
1. Why is the route active?
2. Why is the route stuck?

The first question is a difficult one to answer because you have a window of only about three minutes to determine why the route is active (i.e., why the router did not receive a response to the query that it sent out and which router(s) did not respond to the query). Fortunately, Cisco IOS software provides a powerful tool in the show ip eigrp topology active command. This command shows routes that are currently active, how long those routes have been active, and which EIGRP neighbors have and have not responded to queries.

For the second question, after you have identified the router or switch that did not respond to the query, you can then access that device and determine why it did not respond. Keep in mind that it may itself have been waiting for another device. Repeat this process until you determine the root cause (i.e., the device that is not responding to queries) and troubleshoot that device to determine why, keeping in mind the common causes listed above.
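The hop-by-hop procedure described above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the waiting_on dictionary is a hypothetical stand-in for what you would read, router by router, from the show ip eigrp topology active command, and the router names R1, R2, and R4 are invented for illustration. The function simply follows the first silent neighbor at each hop, whereas a real investigation may need to follow several.

```python
from typing import Dict, List


def trace_sia_root(start: str, waiting_on: Dict[str, List[str]]) -> List[str]:
    """Follow unanswered queries hop by hop and return the path walked.

    waiting_on maps a router name to the neighbors that have not yet
    replied to its Query (hypothetical data, gathered by hand from
    'show ip eigrp topology active' on each device).
    """
    path = [start]
    seen = {start}
    current = start
    while waiting_on.get(current):
        # Follow the first neighbor that has not replied (simplification)
        nxt = waiting_on[current][0]
        if nxt in seen:
            # Stop rather than loop if we are led back to a visited router
            break
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path


# Hypothetical example: R1 waits on R2, R2 waits on R4, R4 waits on nobody
pending = {"R1": ["R2"], "R2": ["R4"], "R4": []}
print(" -> ".join(trace_sia_root("R1", pending)))
```

The last router in the returned path is the one to troubleshoot first, since it is not waiting on anyone else and is therefore the most likely root cause.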
To prevent SIA issues due to delayed responses from other EIGRP neighbors, the local router can be configured to wait longer than the default of three minutes for responses to its Query packets using the timers active-time command in router configuration mode. However, keep in mind that the default duration is sufficient for most networks; therefore, it is important to identify and address what is causing SIA issues rather than masking the problem by increasing the EIGRP active timer.

TROUBLESHOOTING ROUTE REDISTRIBUTION ISSUES

This section of the chapter looks at how to troubleshoot basic route redistribution issues when using EIGRP. In order to troubleshoot redistribution issues effectively, you must have a solid understanding of how redistribution into EIGRP works.

By default, when external routing information is redistributed into EIGRP, the external routes are assigned a default metric of infinity. The only three exceptions to this rule are as follows:
1. When redistributing between two EIGRP autonomous systems
2. When redistributing static routes into EIGRP
3. When redistributing connected interfaces (subnets) into EIGRP

EIGRP preserves all metrics when redistributing between two EIGRP autonomous systems. When connected subnets (interfaces) and static routes are redistributed into EIGRP, the redistributed routes are assigned a default external metric value of 0. When redistributing any other external route source, such as OSPF, a metric must be specified manually for those external routes. In Cisco IOS software, this can be performed using one of three methods as follows:
1. Specifying the seed (default) EIGRP metric for redistributed routes
2. Indirectly specifying the EIGRP metric during redistribution using a route map
3. Directly specifying the EIGRP metric during route redistribution

The seed metric is the metric value that will be assigned to all the redistributed routes. In other words, this is the initial metric that will be assigned to the external routing information when redistribution into EIGRP is configured. The seed metric is configured using the default-metric router configuration command. Within a route map, the set metric command can be used to specify the redistribution metric that will be used for the matched subnet(s). Finally, you can also specify the route metric during redistribution using the redistribute <protocol> metric configuration command.

Following redistribution, all EIGRP external routes are assigned an administrative distance of 170 and are printed as D EX routes in the routing table. If you have configured redistribution into EIGRP and do not see any external routes, check the following while troubleshooting:
• Verify that the routes are not incorrectly filtered
• Verify that the metric has been specified

When redistributing, ensure that filters that are used during redistribution (e.g., route maps) are configured correctly and that all networks that should be redistributed are permitted in the configurations. Because redistribution configurations can quickly become very complex, especially when redistributing at multiple points of the network, it is very important to check and then double-check your configuration prior to and following implementation. Finally, do not forget that EIGRP requires a metric to be specified when redistributing any routes other than EIGRP, static, and connected routes into EIGRP.

DEBUGGING EIGRP ROUTING ISSUES

While primary emphasis has been placed on the use of show commands in the previous sections, this final section describes some of the debugging commands that can also be used to troubleshoot EIGRP.
Keep in mind, however, that debugging is very processor intensive and should be used only as a last resort (i.e., after all show commands and other troubleshooting methods and tools have been applied or attempted).

The debug ip routing [acl|static] command is a powerful troubleshooting tool. Although this command is not EIGRP-specific, it provides useful and detailed information on routing table events. Following is a sample of the information that is printed by this command:

R1#debug ip routing
IP routing debugging is on
R1#
*Mar 3 23:03:35.673: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to down
*Mar 3 23:03:35.673: RT: is_up: FastEthernet0/0 0 state: 4 sub state: 1 line: 0 has_route: True
*Mar 3 23:03:35.677: RT: interface FastEthernet0/0 removed from routing table
*Mar 3 23:03:35.677: RT: del 172.16.1.0/30 via 0.0.0.0, connected metric [0/0]
*Mar 3 23:03:35.677: RT: delete subnet route to 172.16.1.0/30
*Mar 3 23:03:35.677: RT: NET-RED 172.16.1.0/30
*Mar 3 23:03:35.677: RT: Pruning routes for FastEthernet0/0 (3)
*Mar 3 23:03:35.689: RT: delete route to 10.1.3.0 via 172.16.1.2, FastEthernet0/0
*Mar 3 23:03:35.689: RT: no routes to 10.1.3.0, flushing
*Mar 3 23:03:35.689: RT: NET-RED 10.1.3.0/24
*Mar 3 23:03:35.689: RT: delete route to 10.1.2.0 via 172.16.1.2, FastEthernet0/0
*Mar 3 23:03:35.689: RT: no routes to 10.1.2.0, flushing
*Mar 3 23:03:35.689: RT: NET-RED 10.1.2.0/24
*Mar 3 23:03:35.689: RT: delete route to 10.1.1.0 via 172.16.1.2, FastEthernet0/0
*Mar 3 23:03:35.689: RT: no routes to 10.1.1.0, flushing
*Mar 3 23:03:35.693: RT: NET-RED 10.1.1.0/24
*Mar 3 23:03:35.693: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 150: Neighbor 172.16.1.2 (FastEthernet0/0) is down: interface down
*Mar 3 23:03:39.599: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 150: Neighbor 172.16.1.2 (FastEthernet0/0) is up: new adjacency
*Mar 3 23:03:40.601: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to up
*Mar 3 23:03:40.601: RT: is_up: FastEthernet0/0 1 state: 4 sub state: 1 line: 1 has_route: False
*Mar 3 23:03:40.605: RT: SET_LAST_RDB for 172.16.1.0/30 NEW rdb: is directly connected
*Mar 3 23:03:40.605: RT: add 172.16.1.0/30 via 0.0.0.0, connected metric [0/0]
*Mar 3 23:03:40.605: RT: NET-RED 172.16.1.0/30
*Mar 3 23:03:40.605: RT: interface FastEthernet0/0 added to routing table
*Mar 3 23:03:49.119: RT: SET_LAST_RDB for 10.1.1.0/24 NEW rdb: via 172.16.1.2
*Mar 3 23:03:49.119: RT: add 10.1.1.0/24 via 172.16.1.2, eigrp metric [90/156160]

You can use this command in conjunction with an ACL to view information about only the route or routes referenced in the ACL. Additionally, the same command can be used for troubleshooting static route events on the local device. As a side note, if you are running EIGRP, consider using the show ip eigrp events command instead, as it provides a history of EIGRP internal events and can be used to troubleshoot SIA issues, as well as route flaps and other events.
Following is a sample of the information that is printed by this command:

R1#show ip eigrp events
Event information for AS 150:
1    23:03:49.135 Ignored route, metric: 192.168.3.0 28160
2    23:03:49.135 Ignored route, metric: 192.168.2.0 28160
3    23:03:49.135 Ignored route, metric: 192.168.1.0 28160
4    23:03:49.131 Rcv EOT update src/seq: 172.16.1.2 85
5    23:03:49.127 Change queue emptied, entries: 3
6    23:03:49.127 Ignored route, metric: 192.168.3.0 28160
7    23:03:49.127 Ignored route, metric: 192.168.2.0 28160
8    23:03:49.127 Ignored route, metric: 192.168.1.0 28160
9    23:03:49.127 Metric set: 10.1.3.0/24 156160
10   23:03:49.127 Update reason, delay: new if 4294967295
11   23:03:49.127 Update sent, RD: 10.1.3.0/24 4294967295
12   23:03:49.127 Update reason, delay: metric chg 4294967295
13   23:03:49.127 Update sent, RD: 10.1.3.0/24 4294967295
14   23:03:49.123 Route install: 10.1.3.0/24 172.16.1.2
15   23:03:49.123 Find FS: 10.1.3.0/24 4294967295
16   23:03:49.123 Rcv update met/succmet: 156160 128256
17   23:03:49.123 Rcv update dest/nh: 10.1.3.0/24 172.16.1.2
18   23:03:49.123 Metric set: 10.1.3.0/24 4294967295
19   23:03:49.123 Metric set: 10.1.2.0/24 156160
20   23:03:49.123 Update reason, delay: new if 4294967295
21   23:03:49.123 Update sent, RD: 10.1.2.0/24 4294967295
22   23:03:49.123 Update reason, delay: metric chg 4294967295
... [Truncated Output]

In addition to the debug ip routing command, two EIGRP-specific debugging commands are also available in Cisco IOS software. The debug eigrp command can be used to provide real-time information on the DUAL Finite State Machine, EIGRP neighbor relationships, Non-Stop Forwarding events, packets, and transmission events. The options that are available with this command are illustrated below:

R1#debug eigrp ?
  fsm        EIGRP Dual Finite State Machine events/actions
  neighbors  EIGRP neighbors
  nsf        EIGRP Non-Stop Forwarding events/actions
  packets    EIGRP packets
  transmit   EIGRP transmission events

In addition to the debug eigrp command, the debug ip eigrp command prints detailed information on EIGRP route events, such as how EIGRP processes incoming updates. The additional keywords that can be used in conjunction with this command are illustrated below:

R1#debug ip eigrp ?
  <1-65535>      Autonomous System
  neighbor       IP-EIGRP neighbor debugging
  notifications  IP-EIGRP event notifications
  summary        IP-EIGRP summary route processing
  vrf            Select a VPN Routing/Forwarding instance
  <cr>

In conclusion, following is a sample output of the debug ip eigrp command:

R1#debug ip eigrp
IP-EIGRP Route Events debugging is on
R1#
*Mar 3 23:49:47.028: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 150: Neighbor 172.16.1.2 (FastEthernet0/0) is up: new adjacency
*Mar 3 23:49:47.044: IP-EIGRP(Default-IP-Routing-Table:150): 10.1.0.0/24 - do advertise out FastEthernet0/0
*Mar 3 23:49:47.044: IP-EIGRP(Default-IP-Routing-Table:150): Int 10.1.0.0/24 metric 128256 - 256 128000
*Mar 3 23:49:48.030: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to up
*Mar 3 23:49:56.179: IP-EIGRP(Default-IP-Routing-Table:150): Processing incoming UPDATE packet
*Mar 3 23:49:56.544: IP-EIGRP(Default-IP-Routing-Table:150): Processing incoming UPDATE packet
*Mar 3 23:49:56.544: IP-EIGRP(Default-IP-Routing-Table:150): Int 10.1.1.0/24 M 156160 - 25600 130560 SM 128256 - 256 128000
*Mar 3 23:49:56.544: IP-EIGRP(Default-IP-Routing-Table:150): route installed for 10.1.1.0 ()
*Mar 3 23:49:56.544: IP-EIGRP(Default-IP-Routing-Table:150): Int 10.1.2.0/24 M 156160 - 25600 130560 SM 128256 - 256 128000
*Mar 3 23:49:56.548: IP-EIGRP(Default-IP-Routing-Table:150): route installed for 10.1.2.0 ()
*Mar 3 23:49:56.548: IP-EIGRP(Default-IP-Routing-Table:150): Int 10.1.3.0/24 M 156160 - 25600 130560 SM 128256 - 256 128000
... [Truncated Output]

CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter.

Enhanced Interior Gateway Routing Protocol Overview
• EIGRP is a Cisco-proprietary advanced Distance Vector routing protocol
• EIGRP runs directly over IP using protocol number 88
• EIGRP uses several different types of packets, which include the following:
  1. Hello Packets
  2. Acknowledgement Packets
  3. Update Packets
  4. Query Packets
  5. Reply Packets
• EIGRP supports both dynamic and static (manually configured) neighbor discovery
• Dynamically discovered neighbors use Multicast for communication
• Statically defined neighbors use Unicast for communication
• EIGRP uses different Hello and Hold timers for different types of media
• The Hello timer determines the interval at which EIGRP Hello packets are sent
• The Hold timer determines the time before a router considers a neighbor down
• By default, the Hold time is three (3) times the Hello interval
• The EIGRP Neighbor Table is used to maintain state information about EIGRP neighbors
• RTP is used to ensure that Update, Query and Reply packets are sent reliably
• EIGRP uses a composite metric, which includes different variables, called K values
• The default values for the K values are K1 = K3 = 1 and K2 = K4 = K5 = 0
• The Diffusing Update Algorithm is at the crux of the EIGRP routing protocol
• The best route, which is the route with the lowest Feasible Distance, is the Successor route
• Feasible Successor routes must first meet the Feasibility Condition
• The Topology Table allows all EIGRP routers to have a consistent view of the entire network
• The stub routing feature is most commonly used in hub-and-spoke networks
• EIGRP supports the following two types of route summarization:
  1. Automatic route summarization
  2. Manual route summarization

Troubleshooting Neighbor Relationships
• The EIGRP neighbor relationship may not establish due to any of the following:
  1. The neighbor routers are not on a common subnet
  2. Mismatched primary and secondary subnets
  3. Mismatched K Values
  4. Mismatched AS Number
  5. Access Control Lists are filtering EIGRP packets
  6. Physical Layer Issues
  7. Data Link Layer Issues
  8. Mismatched Authentication Parameters

Troubleshooting Route Installation
• Some common reasons for route installation failure include the following:
  1. The same route is received via another protocol with a lower AD
  2. EIGRP summarization
  3. Duplicate Router IDs are present within the EIGRP domain
  4. The routes do not meet the Feasibility Condition

Troubleshooting Route Advertisement
• There are several reasons why EIGRP might not advertise a network, including the following:
  1. Distribute Lists
  2. Split Horizon
  3. Summarization

Troubleshooting Stub Routing Issues
• Stub routing issues are typically due to basic router misconfigurations
• Ensure that you understand what needs to be advertised
• By default, stub routers advertise only connected and summary routes

Troubleshooting SIA Issues
• Within the EIGRP Topology Table, entries may either be marked as Passive or as Active
• Routes go Active when the Successor is lost and a Query is sent to find a Feasible Successor
• If a response to a Query is not received in three minutes, the route becomes SIA
• Reasons a neighbor may not respond to a Query include the following:
  1. The neighbor router's CPU is overloaded and it cannot respond in time
  2. The neighbor router itself has no information about the lost route
  3. The neighbor cannot allocate memory to process the Query or build the Reply packet
  4. Low bandwidth links are congested and packets are being delayed
  5. Queries are not received (e.g., this may be due to a bad circuit or unidirectional link)
• Whenever you troubleshoot SIA errors, answer the following two questions:
  1. Why is the route active?
  2. Why is the route stuck?
• Use the show ip eigrp topology active command to troubleshoot SIA issues
• Use the timers active-time command to increase route waiting time

Troubleshooting Route Redistribution Issues
• By default, external routes redistributed into EIGRP are assigned a default metric of infinity
• The only three exceptions to this rule are as follows:
  1. When redistributing between two EIGRP autonomous systems
  2. When redistributing static routes into EIGRP
  3. When redistributing connected interfaces (subnets) into EIGRP
• If routes are not being redistributed into EIGRP, check the following while troubleshooting:
  1. Verify that the routes are not incorrectly filtered
  2. Verify that the metric has been specified

CHAPTER 6
Troubleshooting OSPF

Open Shortest Path First is an open-standard Link State routing protocol. Link State routing protocols advertise the state of their links. When a Link State router begins operating on a network link, information associated with that logical network is added to its local Link State Database (LSDB). The local router then sends Hello messages on its operational links to determine whether other Link State routers are operating on the interfaces as well. OSPF runs directly over Internet Protocol using IP protocol number 89. The TSHOOT certification exam objective that is covered in this chapter is as follows:
• Troubleshoot OSPF

While it is not possible to delve into all potential OSPF problem scenarios, this chapter discusses some of the most common problem scenarios when OSPF is implemented as the IGP of choice.
Prior to delving into specific OSPF troubleshooting scenarios, this chapter begins with an overview of the OSPF routing protocol, providing a fundamental understanding of the routing protocol that will facilitate troubleshooting OSPF networks. This chapter is divided into the following sections:
• Open Shortest Path First Protocol Overview
• Troubleshooting Neighbor Relationships
• Troubleshooting Route Advertisement
• Troubleshooting Route Redistribution Issues
• Troubleshooting Route Summarization
• Debugging OSPF Routing Issues

OPEN SHORTEST PATH FIRST PROTOCOL OVERVIEW

The following sections provide a brief recap of the OSPF protocol specification and operation. It should be noted that detailed information on OSPF will not be included in the following sections. Instead, emphasis is placed only on the core concepts that you must understand in order to troubleshoot and support OSPF networks effectively. Additional details and information can be found in the current ROUTE study guide, which is available online.

OSPF Fundamentals

When OSPF is enabled in Cisco IOS software, operational data, configured parameters, and statistics are stored in the four separate data structures illustrated in Figure 6-1 below:

[Fig. 6-1. OSPF Data Structures: Interface Table, Neighbor Table, Link State Database, Routing Information Base]

The Interface Table provides a list of all interfaces that have been enabled for OSPF. You can view the state of the interfaces in the Interface Table by using the show ip ospf interface command. This command also includes additional information, such as the network type, Hello and Dead timers, adjacent neighbors, and any configured authentication parameters, assuming that OSPF interface authentication has been configured.

The Neighbor Table tracks all active OSPF neighbors. The contents of this data structure can be viewed using the show ip ospf neighbor command.
This command allows you to view information about all neighbors for all processes or to view detailed information on individual neighbors, which includes, but is not limited to, neighbor uptime and area designation.

The Link State Database (LSDB) contains information about the network topology. The LSDB is a collection of Link State Advertisements (LSAs) for all routers and networks. Each router in an area maintains an identical database for that area, which ensures that all routers in the area have a consistent view of the network topology. By default, each LSA is refreshed by its originating router every 30 minutes; however, LSA flooding also occurs whenever there is a change in the OSPF topology, ensuring that the databases remain synchronized.

The contents of the LSDB can be viewed using the show ip ospf database command. This command, when used without any additional keywords, prints information on all LSAs in the LSDB. However, Cisco IOS software allows administrators to view detailed information on a per-LSA basis using additional keywords, such as external, to view Type 5 LSAs, for example. Additionally, you can use the show ip ospf database database-summary command to view how many LSAs of each type there are in the database for each area, and the total number of each.
Following is a sample output of this command:

R1#show ip ospf database database-summary

            OSPF Router with ID (10.1.0.1) (Process ID 1)

Area 0 database summary
  LSA Type      Count    Delete   Maxage
  Router        2        0        0
  Network       0        0        0
  Summary Net   2        0        0
  Summary ASBR  0        0        0
  Type-7 Ext    0        0        0
    Prefixes redistributed in Type-7  0
  Opaque Link   0        0        0
  Opaque Area   0        0        0
  Subtotal      4        0        0

Process 1 database summary
  LSA Type      Count    Delete   Maxage
  Router        2        0        0
  Network       0        0        0
  Summary Net   2        0        0
  Summary ASBR  0        0        0
  Type-7 Ext    0        0        0
  Opaque Link   0        0        0
  Opaque Area   0        0        0
  Type-5 Ext    1        0        0
    Prefixes redistributed in Type-5  1
  Opaque AS     0        0        0
  Total         5        0        0

Finally, the Routing Information Base (RIB) contains the results derived from the Shortest Path First calculation, the contents of which can be verified or viewed using the show ip route [ospf] [process ID] command. For example, to view all routes learned by OSPF process ID 1, you would enter the following command on the router:

R1#show ip route ospf 1
     10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
O IA    10.3.0.3/32 [110/65] via 172.16.0.2, 00:51:12, Serial0/0
     150.1.0.0/24 is subnetted, 1 subnets
O IA    150.1.1.0 [110/74] via 172.16.0.2, 00:51:12, Serial0/0

Multi-Area OSPF Fundamentals

OSPF is a hierarchical routing protocol that logically divides the network into sub-domains that are referred to as areas. This logical segmentation is used to limit the scope of Link State Advertisement (LSA) flooding throughout the OSPF domain. LSAs are the units of topology information, carried inside OSPF packets, that routers running OSPF exchange. Different types of LSAs are used within an area and between areas. By restricting the propagation of certain types of LSAs between areas, the OSPF hierarchical implementation effectively reduces the amount of routing protocol traffic within the OSPF network.

In a multi-area OSPF network, one area must be designated as the backbone area. The backbone is the logical center of the OSPF network.
All other non-backbone areas must be physically connected to the backbone. However, because it is not always possible or feasible to have a physical connection between a non-backbone area and the backbone, the OSPF standard allows the use of virtual connections to the backbone, called virtual links.

Routers within each area store detailed topology information for the area in which they reside. Within each area, one or more routers, referred to as Area Border Routers (ABRs), facilitate inter-area routing by advertising summarized routing information between the different areas. This functionality does the following within the OSPF network:
• Reduces the scope of LSA flooding throughout the OSPF domain
• Hides detailed topology information between areas
• Allows for end-to-end connectivity within the OSPF domain
• Creates logical boundaries within the OSPF domain

The backbone receives summarized routing information from ABRs. Routing information is then disseminated to all other non-backbone areas within the OSPF network. When a change to the topology occurs, this information is disseminated throughout the entire OSPF domain, allowing all routers in all areas to have a consistent view of the network.

ABRs maintain Link State Database (LSDB) information for all the areas to which they are connected by running the SPF algorithm for each area to which they belong and generating Type 3 LSAs for these areas. All routers within each area have detailed topology information pertaining to that specific area. The internal routers exchange intra-area routing information and use the Type 3 LSA information advertised by the ABRs to build a view of the topology outside the local area.
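To make the ABR's role more concrete, the following Python sketch lists, for each area an ABR is attached to, the prefixes it would announce into that area as Type 3 LSAs. It is a deliberately simplified model and not how Cisco IOS implements it: the function name type3_summaries, the input format, and the sample prefixes are all hypothetical, and the sketch ignores costs, manual summarization, stub-area filtering, and inter-area routes relayed through the backbone.

```python
from typing import Dict, List


def type3_summaries(intra_area: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Return, per attached area, the prefixes an ABR would announce into it.

    intra_area maps an area ID to the prefixes the ABR has computed as
    intra-area routes for that area (hypothetical input format).
    """
    announced: Dict[int, List[str]] = {}
    for target_area in intra_area:
        # Prefixes from every other attached area are announced into target_area
        prefixes = [
            prefix
            for source_area, source_prefixes in intra_area.items()
            if source_area != target_area
            for prefix in source_prefixes
        ]
        announced[target_area] = sorted(prefixes)
    return announced


# Hypothetical ABR attached to the backbone (area 0) and to area 1
abr_view = {0: ["172.16.0.0/30"], 1: ["10.1.1.0/24", "10.1.2.0/24"]}
for area, prefixes in type3_summaries(abr_view).items():
    print(f"into area {area}: {prefixes}")
```

Note that each area receives only prefixes, never the router and link detail of the other area, which is how the hierarchy hides topology information between areas.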
OSPF Network Types

OSPF supports the following network types for different media:
• Non-Broadcast
• Point-to-point
• Broadcast
• Point-to-multipoint

Non-Broadcast networks are network types that do not natively support Broadcast or Multicast traffic. The most common example of a non-Broadcast network type is Frame Relay. Non-Broadcast network types require additional configuration to allow for both Broadcast and Multicast support. On such networks, OSPF elects a Designated Router (DR) and/or a Backup Designated Router (BDR). These two routers are described later in this chapter.

OSPF-enabled routers send Hello packets every 30 seconds on non-Broadcast network types. If a Hello packet is not received within four times the Hello interval, or 120 seconds, the neighbor router is considered 'dead.'

A point-to-point connection is simply a connection between only two endpoints. Examples of point-to-point connections include physical WAN interfaces using HDLC and PPP encapsulation, and Frame Relay and ATM point-to-point subinterfaces. No DR or BDR is elected on OSPF point-to-point network types. By default, OSPF sends Hello packets every 10 seconds on point-to-point network types.

Broadcast network types are those that natively support Broadcast and Multicast traffic, the most common example being Ethernet. As is the case with non-Broadcast networks, OSPF also elects a DR and/or a BDR on Broadcast networks. By default, OSPF sends Hello packets every 10 seconds on these network types, and a neighbor is declared 'dead' if no Hello packets are received within four times the Hello interval, which is 40 seconds.

The point-to-multipoint network type is a non-default OSPF network type. In other words, this network type must be configured manually using the ip ospf network point-to-multipoint [non-broadcast] interface configuration command. By default, the command defaults to a Broadcast point-to-multipoint network type.
This default network type allows OSPF to use Multicast packets to discover neighbor routers dynamically. In addition, there is no DR/BDR election held on Broadcast point-to-multipoint network types.

The [non-broadcast] keyword configures the point-to-multipoint network type as a non-Broadcast point-to-multipoint network. This requires static OSPF neighbor configuration, as OSPF will not use Multicast to discover neighbor routers dynamically. This network type does not require the election of a DR and/or a BDR router for the designated segment. The primary use of this network type is to allow costs to be assigned on a per-neighbor basis instead of using the interface-assigned cost for routes received from all neighbors.

The point-to-multipoint network type is typically used in partial-mesh hub-and-spoke Non-Broadcast Multiple Access (NBMA) networks. However, it should also be noted that this network type could also be specified for other network types, such as Broadcast Multi-Access networks (e.g., Ethernet). By default, OSPF sends Hello packets every 30 seconds on point-to-multipoint networks. The default dead interval is four times the Hello interval, or 120 seconds.

OSPF Designated and Backup Designated Routers

OSPF elects a Designated Router (DR) and/or a Backup Designated Router (BDR) on Broadcast and non-Broadcast network types. It is important to understand that the BDR is not a mandatory component on these network types. In fact, OSPF will work just as well if only a DR is elected and there is no BDR; however, there will be no redundancy if the DR fails, and the OSPF routers will need to go through the election process again to elect a new DR.

On the segment, each individual non-DR/BDR router establishes an adjacency with the DR and, if one has also been elected, the BDR, but not with any other non-DR/BDR routers on the segment.
The DR and BDR routers are fully adjacent with each other and all other routers on the segment. Non-DR/BDR routers never complete the database exchange and never reach the Full adjacency state with any other non-DR/BDR router on the segment.

The non-DR/BDR routers send messages and updates to the AllDRouters Multicast group address 224.0.0.6. Only the DR/BDR routers listen to Multicast messages sent to this group address. The DR then advertises messages to the AllSPFRouters Multicast group address 224.0.0.5. This allows all other OSPF routers on the segment to receive the updates.

In order for a router to be the Designated Router, or the Backup Designated Router, for the segment, the router must be elected. This election is based on the following:
• The highest router priority value
• The highest router ID (used as the tie-breaker when priorities are equal)

By default, all routers have a priority value of 1. This value can be adjusted using the ip ospf priority <0-255> interface configuration command. The higher the priority, the greater the likelihood the router will be elected DR for the segment. The router with the second highest priority will then be elected BDR. If a priority value of 0 is configured, the router will not participate in the DR/BDR election process.

When determining the OSPF router ID (RID), Cisco IOS selects the highest IP address of configured Loopback interfaces. If no Loopback interfaces are configured, the software uses the highest IP address of all configured physical interfaces as the OSPF RID. Cisco IOS software also allows administrators to specify the RID manually using the router-id [address] router configuration command, in which case the manually configured value is used.

It is important to remember that with OSPF, once the DR and the BDR have been elected, they will remain as DR/BDR routers until a new election is held. This will never change, even if a router with a higher RID or priority is introduced to the segment.
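The election rules above, including the fact that an existing DR or BDR is never preempted, can be sketched in Python as follows. This is a simplified model for study purposes and not the full election procedure from the OSPF specification; the function name elect_dr_bdr, its parameters, and the sample router IDs are hypothetical. The promotion of the BDR when the DR is missing is an added assumption reflecting standard OSPF behavior rather than something stated in the text.

```python
from ipaddress import IPv4Address
from typing import List, Optional, Tuple

Router = Tuple[str, int]  # (router ID, interface priority)


def elect_dr_bdr(
    routers: List[Router],
    current_dr: Optional[str] = None,
    current_bdr: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Pick the DR and BDR by highest priority, then highest router ID."""
    # Routers with priority 0 never take part in the election
    eligible = [r for r in routers if r[1] > 0]
    ranked = [
        rid
        for rid, _ in sorted(
            eligible,
            key=lambda r: (r[1], IPv4Address(r[0])),  # compare RIDs numerically
            reverse=True,
        )
    ]
    # Assumption: if the DR is gone, the existing BDR takes over as DR
    if current_dr is None and current_bdr is not None:
        current_dr, current_bdr = current_bdr, None
    # An existing DR or BDR is kept; better newcomers do not preempt it
    dr = current_dr if current_dr is not None else (ranked[0] if ranked else None)
    candidates = [rid for rid in ranked if rid != dr]
    bdr = current_bdr if current_bdr is not None else (
        candidates[0] if candidates else None
    )
    return dr, bdr


segment = [("10.0.0.1", 1), ("10.0.0.2", 1), ("10.0.0.3", 0)]
print(elect_dr_bdr(segment))

# A higher-priority router joins later but does not displace the elected pair
segment.append(("10.0.0.9", 200))
print(elect_dr_bdr(segment, current_dr="10.0.0.2", current_bdr="10.0.0.1"))
```

Router IDs are compared as addresses rather than as strings so that, for example, 10.0.0.10 correctly ranks above 10.0.0.9.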
In order for the DR/BDR to be changed, the current DR/BDR routers must fail or must be removed, or the administrator can reset the routers' OSPF processes manually using the clear ip ospf process command.

Establishing Adjacencies

Routers running OSPF transition through several states before establishing an adjacency. The routers exchange different types of packets during these states. This exchange of messages allows all routers that establish an adjacency to have a consistent view of the network. Additional changes to the current network are simply sent out as incremental updates. These different states are the Down, Attempt, Init, 2-Way, Exstart, Exchange, Loading, and Full states.
• The Down state is the starting state for all OSPF routers. However, the local router may also show a neighbor in this state when no Hello packets have been received within the specified router dead interval for that interface.
• The Attempt state is valid only for OSPF neighbors on NBMA networks. This state means that a Hello has been sent but no information has been received from the statically configured neighbor; however, some effort is being made to establish an adjacency with this neighbor.
• The Init state is reached when an OSPF router receives a Hello packet from a neighbor but the local RID is not listed in the received Neighbor field. If OSPF Hello parameters, such as timer values, do not match, OSPF routers will never progress beyond this state.
• The 2-Way state indicates bidirectional communication with the OSPF neighbor(s). This means that the local router has received a Hello packet with its own RID in the Neighbor field and Hello packet parameters are identical on the two routers. On multi-access networks, the DR and BDR routers are elected during this phase.
• The Exstart state is used for the initialization of the database synchronization process.
It is at this stage that the local router and its neighbor establish which router is in charge of the database synchronization process. The Master and Slave are elected in this state, and the first sequence number for Database Descriptor (DBD) packet exchange is decided by the Master in this stage.
• The Exchange state is where routers describe the contents of their databases using DBD packets. Each DBD packet sequence is explicitly acknowledged, and only one outstanding DBD packet is allowed at a time. During this phase, Link State Request (LSR) packets are also sent to request a new instance of the LSA. The M (More) bit in each DBD packet indicates that more DBD packets follow. When both routers have exchanged their complete databases, they will both set the M bit to 0.
• In the Loading state, OSPF routers build an LSR packet and a Link State Retransmission list. LSR packets are sent to request the more recent instance of an LSA that has not been received during the Exchange process. Updates that are sent during this phase are placed on the Link State Retransmission list until the local router receives an acknowledgement. If the local router also receives an LSR packet during this phase, it will respond with a Link State Update packet that contains the requested information.
• The Full state indicates that the OSPF neighbors have exchanged their entire databases and both agree (i.e., have the same view of the network). Both neighboring routers in this state add the adjacency to their local database and advertise the relationship in a Link State Update packet. At this point, the routing tables are calculated, or recalculated if the adjacency was reset.

In order for an OSPF adjacency to be established successfully, certain parameters on both routers must match.
These parameters include the following:
• The interface MTU values
• The Hello and Dead timers
• The area ID
• The authentication type and password
• The stub area flag
• IP subnet and subnet mask

OSPF LSAs and the Link State Database (LSDB)

As stated in the previous section, OSPF uses several types of Link State Advertisements. Each LSA begins with a standard 20-byte LSA header, which contains the following:
• Link State Age
• Options
• Link State Type
• Link State ID
• Advertising Router
• Link State Sequence Number
• Link State Checksum
• Length

NOTE: These fields are described in detail in the ROUTE study guide that is available online. Please refer to that guide for additional information on each of these fields.

While OSPF supports 11 different types of Link State Advertisements, only LSA Types 1, 2, and 3, which are used to calculate internal routes, and LSA Types 4, 5, and 7, which are used to calculate external routes, are within the scope of the TSHOOT certification exam. These LSAs are described in the sections that follow:
• Type 1 (Router) LSAs are generated by each router for each area it belongs to. The Type 1 LSA lists the originating router's RID. Each individual router will generate a Type 1 LSA for the area in which it resides. For each Type 1 LSA, an ABR will both generate and advertise a Type 3 LSA. In other words, there is a one-to-one correlation between a Type 1 and Type 3 LSA. Type 1 LSAs are flooded within a single area and all routers within the area receive these LSAs from all other routers in the same area.
• OSPF uses the Network Link State Advertisement (Type 2 LSA) to advertise the routers on the multi-access segment. This LSA is generated by the DR and is flooded only within the area. Because the other non-DR/BDR routers do not establish adjacencies with each other, the Network LSA allows those routers to know about the other routers on the multi-access segment.
As is the case with a Type 1 LSA, the ABR will also generate and advertise a Type 3 LSA for each Type 2 LSA.
• The Network Summary (Type 3) LSA is a summary of destinations outside of the local area, but within the OSPF domain. In other words, this LSA advertises inter-area routing information. The Summary LSA does not carry any topological information. Instead, the only information contained in the LSA is an IP prefix. Type 3 LSAs are generated by ABRs and are flooded to all adjacent areas. Each Type 3 LSA matches a single Router or Network LSA on a one-for-one basis. In other words, a Type 3 LSA exists for each individual Type 1 and Type 2 LSA. Type 3 LSAs are advertised from a non-backbone area into the OSPF backbone for intra-area routes (i.e., for Type 1 and Type 2 LSAs). They are advertised from the OSPF backbone into non-backbone areas both for the backbone's own intra-area routes (i.e., Area 0 Type 1 and Type 2 LSAs) and for inter-area routes (i.e., Type 3 LSAs) flooded into the backbone by other ABRs.
• The Type 4 LSA describes information about an Autonomous System Boundary Router (ASBR). This LSA uses the same packet format as the Type 3 LSA and performs the same basic functionality, with some notable differences. Like the Type 3 LSA, the Type 4 LSA is generated by the ABR. For both LSAs, the advertising router field contains the RID of the ABR that generated the Summary LSA. However, the Type 4 LSA is created by the ABR for each ASBR reachable by a Router LSA. The ABR then injects the Type 4 LSA into the appropriate area. This LSA provides reachability information on the ASBR itself.
• The External Link State Advertisement (Type 5 LSA) is used to describe destinations that are external to the autonomous system. In other words, Type 5 LSAs provide the network information necessary to reach the external networks.
In addition to external routes, the default route for an OSPF routing domain can also be injected as a Type 5 Link State Advertisement. The External LSA has a domain-flooding scope. This means that ABRs do not stop the flooding process but instead continue it into their attached areas. The only areas to which External LSAs are not flooded are the stub-type areas. Before Type 5 LSAs are installed into the routing table, the router calculating the Type 5 LSA must have a Type 4 LSA for the ASBR, and the router must know about the forwarding address contained in the Type 5 LSA.
• The Type 7 LSA is used for external routing information from the ASBR in a not-so-stubby area (NSSA). The external routing information within the LSA is converted by the ABR into a Type 5 LSA at the area boundary. The ABR then floods the Type 5 LSA into the OSPF domain, so that other routers in the network become aware of the external networks. Type 7 LSAs have an area flooding scope, so only routers in the NSSA receive the Type 7 LSA.

OSPF Areas

In addition to the backbone (Area 0) and other non-backbone areas, the OSPF specification also defines several 'special' types of areas. The configuration of these areas is used primarily to reduce the size of the LSDB on routers residing within those areas by preventing the injection of different types of LSAs (primarily Type 5 LSAs) into certain areas, which include the following:
• Not-so-stubby areas
• Totally not-so-stubby areas
• Stub areas
• Totally stubby areas

Not-so-stubby areas (NSSAs) are a type of OSPF stub area that allows the injection of external routing information by an ASBR using an NSSA External LSA (Type 7). The Type 7 LSA is used for external routing information from the ASBR within the NSSA. The external routing information within the LSA is converted by the ABR into a Type 5 LSA at the area boundary. The ABR then floods the Type 5 LSA into the OSPF domain, so that other routers in the network become aware of the external networks.
Type 7 LSAs have an area flooding scope, so only routers in the NSSA receive the Type 7 LSA.

Totally not-so-stubby areas (TNSSAs) are an extension of NSSAs. As with NSSAs, Type 5 LSAs are not allowed into a TNSSA. However, unlike NSSAs, Summary LSAs are also not allowed into a TNSSA. In addition, when a TNSSA is configured, the ABR injects a default route into the area as a Type 3 Summary LSA. TNSSAs have the following characteristics:
• Type 7 LSAs are converted into Type 5 LSAs at the NSSA ABR
• They do not allow Network Summary LSAs
• They do not allow External LSAs

Stub areas are somewhat similar to NSSAs, with the major exception being that external routes (Type 5 or Type 7 LSAs) are not allowed into stub areas. It is important to understand that stub functionality in OSPF and EIGRP is not at all similar. In OSPF, the configuration of an area as a stub area reduces the size of the routing table and the OSPF database for the routers within the stub area by preventing External LSAs from being advertised into such areas without any further configuration. Stub areas have the following characteristics:
• The default route is injected into the stub area by the ABR as a Type 3 (Summary) LSA
• Type 3 LSAs from other areas are permitted into these areas
• External route LSAs (i.e., Type 4 and Type 5 LSAs) are not allowed

Totally stubby areas (TSAs) are an extension of stub areas. However, unlike stub areas, TSAs further reduce the size of the LSDB on routers in the TSA by restricting Type 3 LSAs, in addition to the External LSAs. TSAs are typically configured on routers that have a single ingress and egress point into and out of the network, for example, in a traditional hub-and-spoke network. The area routers forward all external traffic to the ABR. The ABR is also the exit point for all backbone and inter-area traffic to the TSA.
TSAs have the following characteristics:
• The default route is injected into the TSA as a Type 3 Network Summary LSA
• Type 3, 4, and 5 LSAs from other areas are not permitted into these areas

OSPF Virtual Links

A virtual link is a logical extension of the OSPF backbone. As we learned earlier in this chapter, when implementing a multi-area OSPF network, one area must be designated as the backbone area and all non-backbone areas must be connected to the backbone area. In most cases, a physical link is used to connect the non-backbone area to the backbone area; however, this is not always possible or feasible. In addition to being used to connect areas that have no physical connection to the OSPF backbone, virtual links can also be used for redundancy, as well as for connecting a discontinuous or partitioned backbone.

TROUBLESHOOTING NEIGHBOR RELATIONSHIPS

Routers running OSPF transition through several states before establishing an adjacency. These different states are the Down, Attempt, Init, 2-Way, Exstart, Exchange, Loading, and Full states. The preferred state for an OSPF adjacency is the Full state. This state indicates that the neighbors have exchanged their entire databases and both have the same view of the network. While the Full state is the preferred adjacency state, it is possible that during the adjacency establishment process, the neighbors get 'stuck' in one of the other states. For this reason, it is important to understand what to look for in order to troubleshoot the issue.
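As a rough illustration of the state progression described above, the following Python sketch orders the adjacency states and reports how far a neighbor still has to go before reaching Full. The function names are hypothetical, and the input format (the State column as printed by show ip ospf neighbor, for example 2WAY/DROTHER) is an assumption made for the example.

```python
from typing import List

# Adjacency states in the order listed in this chapter.
OSPF_STATES = ["DOWN", "ATTEMPT", "INIT", "2WAY",
               "EXSTART", "EXCHANGE", "LOADING", "FULL"]


def normalise(state_column: str) -> str:
    """Reduce a value such as '2WAY/DROTHER', 'INIT/ -' or '2-Way' to a state name."""
    state = state_column.split("/")[0].strip().upper()
    return state.replace("-", "")  # accept '2-Way' as well as '2WAY'


def remaining_states(state_column: str) -> List[str]:
    """Return the states still ahead of the neighbor on the way to FULL."""
    index = OSPF_STATES.index(normalise(state_column))  # ValueError if unknown
    return OSPF_STATES[index + 1:]


def is_full(state_column: str) -> bool:
    """True when the adjacency has reached the preferred FULL state."""
    return not remaining_states(state_column)


if __name__ == "__main__":
    for column in ("INIT/ -", "2WAY/DROTHER", "FULL/DR"):
        print(column, "->", remaining_states(column))
```

A neighbor that stays at the same position in this list across repeated checks is the kind of 'stuck' adjacency that needs troubleshooting.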
The sections that follow describe these problems pertaining to OSPF neighbor relationships:
• The Neighbor Table is empty
• The Neighbor is stuck in the ATTEMPT state
• The Neighbor is stuck in the INIT state
• The Neighbor is stuck in the 2WAY state
• The Neighbor is stuck in the EXSTART/EXCHANGE state
• The Neighbor is stuck in the LOADING state

The Neighbor Table is Empty

There are several reasons why the OSPF Neighbor Table may be empty (i.e., why the output of the show ip ospf neighbor command might not yield any results). Common reasons are as follows:
• Basic OSPF misconfigurations
• Layer 1 and Layer 2 issues
• ACL filtering
• Interface misconfigurations

Basic OSPF misconfigurations span a broad number of things. These could include mismatched timers, area IDs, authentication parameters, and stub configuration, for example. A plethora of tools is available in Cisco IOS software to troubleshoot basic OSPF misconfigurations. For example, you could use the show ip protocols command to determine information (e.g., about OSPF-enabled networks); the show ip ospf command to determine area configuration and the interfaces per area; and the show ip ospf interface brief command to determine which interfaces reside in which area, and for which OSPF process IDs those interfaces have been enabled, assuming that OSPF has been enabled for the interface.

Another common misconfiguration is specifying the interface as passive. If this is so, then the interface will not send out Hello packets, and a neighbor relationship will not be established using that interface. You can verify which interfaces have been configured or specified as passive using either the show ip protocols or the show ip ospf interface commands.
Following is a sample output of the latter command on a passive interface:

R1#show ip ospf interface Serial0/0
Serial0/0 is up, line protocol is up
  Internet Address 172.16.0.1/30, Area 0
  Process ID 1, Router ID 10.1.0.1, Network Type POINT_TO_POINT, Cost: 64
  Transmit Delay is 1 sec, State POINT_TO_POINT
  Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
    oob-resync timeout 40
    No Hellos (Passive interface)
  Supports Link-Local Signaling (LLS)
  Index 1/1, flood queue length 0
  Next 0x0(0)/0x0(0)
  Last flood scan length is 0, maximum is 0
  Last flood scan time is 0 msec, maximum is 0 msec
  Neighbor Count is 0, Adjacent neighbor count is 0
  Suppress hello for 0 neighbor(s)

Finally, when enabling OSPF over NBMA technologies such as Frame Relay, remember that the neighbors must be defined statically, as OSPF does not use Multicast transmission for neighbor discovery for the default non-Broadcast network type. This is a common reason for empty Neighbor Tables when implementing OSPF.

Layer 1 and Layer 2 issues can also prevent the formation of OSPF neighbor relationships. Layer 1 and Layer 2 troubleshooting was described in detail in the previous section. Use commands such as the show interfaces command to check for interface status (i.e., line protocol), as well as any received errors on the interface. If the OSPF-enabled routers reside in a VLAN that spans multiple switches, verify that there is end-to-end connectivity within the VLAN and that all ports or interfaces are in the correct Spanning Tree states, for example.

ACL filtering is another common cause for adjacencies failing to establish. It is important to be familiar with the topology in order to troubleshoot such issues. For example, if the routers failing to establish an adjacency are connected via different physical switches, it may be that the ACL filtering is being implemented in the form of a VACL that has been configured on the switches for security purposes.
A useful troubleshooting tool that may indicate that OSPF packets are being either blocked or discarded is the show ip ospf traffic command, which prints information on received and sent OSPF packets as illustrated in the output below:

R1#show ip ospf traffic Serial0/0

Interface Serial0/0

OSPF packets received/sent

          Invalid  Hellos   DB-des   LS-req   LS-upd   LS-ack   Total
  Rx:     0        0        0        0        0        0        0
  Tx:     0        6        0        0        0        0        6

OSPF header errors
  Length 0, Auth Type 0, Checksum 0, Version 0,
  Bad Source 0, No Virtual Link 0, Area Mismatch 0,
  No Sham Link 0, Self Originated 0, Duplicate ID 0,
  Hello 0, MTU Mismatch 0, Nbr Ignored 0,
  LLS 0, Unknown Neighbor 0, Authentication 0,
  TTL Check Fail 0,

OSPF LSA errors
  Type 0, Length 0, Data 0, Checksum 0,

In the output above, notice that the local router is sending OSPF Hello packets but is not receiving any. If the configuration on the routers is correct, check ACLs on the routers or intermediate devices to ensure that OSPF packets are not being filtered or discarded.

Another common reason for an empty Neighbor Table is interface misconfigurations. Similar to EIGRP, OSPF will not establish a neighbor relationship using secondary interface addresses. However, unlike EIGRP, OSPF will also not establish a neighbor relationship if interface subnet masks are not consistent. EIGRP-enabled routers will establish neighbor relationships even if the interface subnet masks are different. For example, if two routers, one with an interface using the address 10.1.1.1/24 and another with an interface using the address 10.1.1.2/30, are configured in a back-to-back EIGRP implementation, they will successfully establish a neighbor relationship. However, it should be noted that such implementations could cause routing loops between the routers. Because such implementations fall outside of the range of the TSHOOT exam requirements, they will not be described in any further detail in this chapter.
In addition to mismatched subnet masks, EIGRP-enabled routers also ignore Maximum Transmission Unit (MTU) configurations and establish neighbor relationships even if the interface MTU values are different. Use the show ip interface and show interfaces commands to verify IP address and mask configuration.

The Neighbor is Stuck in the ATTEMPT State

The ATTEMPT state is valid only for OSPF neighbors on NBMA networks. This state means that a Hello has been sent but no information has been received from the statically configured neighbor; however, some effort is being made to establish an adjacency with this neighbor. Several possible reasons for the adjacency to remain in this state include the following:
• Incorrect NBMA configuration
• Incorrect OSPF configuration
• ACL filtering
• Unidirectional connectivity

Static Frame Relay mappings are susceptible to misconfigurations. When configuring OSPF over NBMA technologies, make sure that the DLCI or PVC mappings are correct. Use the appropriate Cisco IOS commands to verify bidirectional communication.
For example, you could use the show frame-relay pvc command to perform such validations for Frame Relay networks as follows:

R1#show frame-relay pvc 100

PVC Statistics for interface Serial0/0 (Frame Relay DCE)

DLCI = 100, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Serial0/0

  input pkts 22            output pkts 18           in bytes 2050
  out bytes 1722           dropped pkts 0           in pkts dropped 0
  out pkts dropped 0                out bytes dropped 0
  in FECN pkts 0           in BECN pkts 0           out FECN pkts 0
  out BECN pkts 0          in DE pkts 0             out DE pkts 0
  out bcast pkts 1         out bcast bytes 34
  5 minute input rate 0 bits/sec, 1 packets/sec
  5 minute output rate 0 bits/sec, 1 packets/sec
  pvc create time 00:02:49, last time pvc status changed 00:01:59

NOTE: An alternative would be to use the debug serial interface command as follows:

R1#debug serial interface
Serial network interface debugging is on
R1#
*Mar 5 10:23:38.854: Serial0/0(in): StEnq, myseq 26
*Mar 5 10:23:38.854: Serial0/0(out): Status, myseq 27, yourseen 27, DCE up
*Mar 5 10:23:48.855: Serial0/0(in): StEnq, myseq 27
*Mar 5 10:23:48.855: Serial0/0(out): Status, myseq 28, yourseen 28, DCE up
*Mar 5 10:23:58.855: Serial0/0(in): StEnq, myseq 28
*Mar 5 10:23:58.855: Serial0/0(out): Status, myseq 29, yourseen 29, DCE up
*Mar 5 10:24:08.855: Serial0/0(in): StEnq, myseq 29
*Mar 5 10:24:08.855: Serial0/0(out): Status, myseq 30, yourseen 30, DCE up
*Mar 5 10:24:18.856: Serial0/0(in): StEnq, myseq 30
*Mar 5 10:24:18.856: Serial0/0(out): Status, myseq 31, yourseen 31, DCE up

Incorrect OSPF configuration can also result in neighbors being stuck in the ATTEMPT state. This includes typographical errors, such as typing 172.16.1.1 instead of 172.16.11.1, for example. Check the configuration to ensure that neighbor statements are configured correctly using the correct IP addresses (remember to use the interface address, not the router ID).
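A quick way to catch the kind of typographical error mentioned above is to check whether each statically configured neighbor address falls inside the subnet of the local NBMA interface. The Python sketch below is only an illustration: the function name is hypothetical, and the /24 mask on the local interface is an assumption, since the text gives only the two neighbor addresses.

```python
import ipaddress


def neighbor_on_local_subnet(interface_cidr: str, neighbor_ip: str) -> bool:
    """Return True if the neighbor address lies within the interface's subnet.

    interface_cidr is the local interface address with its prefix length,
    for example '172.16.11.2/24' (an assumed value for this example).
    """
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(neighbor_ip) in network


if __name__ == "__main__":
    local = "172.16.11.2/24"  # assumed local interface address
    for candidate in ("172.16.11.1", "172.16.1.1"):
        verdict = "ok" if neighbor_on_local_subnet(local, candidate) else "check for a typo"
        print(candidate, "->", verdict)
```

A neighbor address that fails this check is not proof of a fault, but it is a sensible first thing to compare against the intended addressing plan.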
As always, check for ACL filtering and ensure that OSPF packets are being allowed between the endpoints. Keeping in mind that OSPF uses Unicast packets and not Multicast on NBMA networks, check for ACL or other filtering configurations that may prevent host-to-host connectivity between the endpoints.

Finally, if you suspect a unidirectional connectivity issue over the NBMA network, use the appropriate technology show and debug commands to identify and resolve this issue. This may include Layer 1 and Layer 2 troubleshooting. Following is the output of the show ip ospf neighbor command showing a neighbor stuck in the ATTEMPT/DROTHER state:

R1#show ip ospf neighbor

Neighbor ID     Pri   State             Dead Time   Address         Interface
N/A               0   ATTEMPT/DROTHER   -           172.16.0.2      Serial0/0

The Neighbor is Stuck in the INIT State

The INIT state is reached when an OSPF router receives a Hello packet from a neighbor but the local RID is not listed in the received Neighbor field. When OSPF routers are stuck in the INIT state, it typically indicates a unidirectional communication issue; that is, the neighbor is not receiving the Hellos sent by the local router and is therefore not including the RID of that originating or sending router in its Hello packet(s). Some common causes for this scenario are as follows:
• ACL filtering on one side
• Layer 1 and Layer 2 issues
• NBMA misconfigurations

It is possible that an ACL configured on either one of the devices could prevent the OSPF Hello packets from being received by the local router. The result of this would be that one router would reflect an empty Neighbor Table while the other router would be stuck in the INIT state. Consider the topology illustrated in Figure 6-2 below:

[Figure 6-2: R1 (Serial0/0, 150.0.0.1/30) connected back-to-back to R3 (Serial0/0, 150.0.0.2/30)]

Fig. 6-2. OSPF Neighbors Stuck in the INIT State

Figure 6-2 illustrates a basic network comprised of two routers.
Assuming that all configuration parameters are correct, the routers should be able to establish an adjacency. However, we will assume that the following ACL has been configured on R1 and applied inbound on Serial0/0:

R1#show ip access-lists NETWORK-SECURITY
Extended IP access list NETWORK-SECURITY
    10 deny ip host 0.0.0.0 any log
    20 deny ip 127.0.0.0 0.255.255.255 any log
    30 deny ip 10.0.0.0 0.255.255.255 any log
    40 deny ip 172.16.0.0 0.15.255.255 any log
    50 deny ip 192.168.0.0 0.0.255.255 any log
    60 deny ip any 224.0.0.0 15.255.255.255 log (16 matches)
    70 permit ip any any

Based on this configuration, R1 does not receive any Hello packets from R3 and will therefore show an empty Neighbor Table, as it is unaware of that router. However, because there is no ACL applied inbound on the Serial0/0 interface of R3, and OSPF is enabled on the 150.0.0.0/30 subnet, R3 will receive the Hello packets from R1. The problem arises because R1 never receives R3's Hello packets, so it will never include that router's RID in the Hello packets it sends out. Given this, R3 shows R1 stuck in the INIT state.
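To see why the Hellos are dropped, it can help to walk the ACL by hand. The Python sketch below is a simplified model of first-match evaluation with wildcard masks, not a reproduction of how IOS implements ACLs; the entries are transcribed from the NETWORK-SECURITY output above, and the function names are hypothetical.

```python
import ipaddress
from typing import Tuple

ANY = ("0.0.0.0", "255.255.255.255")  # 'any' expressed as base + wildcard

# (sequence, action, (src base, src wildcard), (dst base, dst wildcard))
NETWORK_SECURITY = [
    (10, "deny", ("0.0.0.0", "0.0.0.0"), ANY),            # host 0.0.0.0
    (20, "deny", ("127.0.0.0", "0.255.255.255"), ANY),
    (30, "deny", ("10.0.0.0", "0.255.255.255"), ANY),
    (40, "deny", ("172.16.0.0", "0.15.255.255"), ANY),
    (50, "deny", ("192.168.0.0", "0.0.255.255"), ANY),
    (60, "deny", ANY, ("224.0.0.0", "15.255.255.255")),
    (70, "permit", ANY, ANY),
]


def wildcard_match(address: str, base: str, wildcard: str) -> bool:
    """Compare only the bits where the wildcard mask is 0."""
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (int(ipaddress.IPv4Address(address)) & care) == \
           (int(ipaddress.IPv4Address(base)) & care)


def first_match(src: str, dst: str) -> Tuple[int, str]:
    """Return (sequence, action) of the first entry matching the packet."""
    for seq, action, (s_base, s_wild), (d_base, d_wild) in NETWORK_SECURITY:
        if wildcard_match(src, s_base, s_wild) and wildcard_match(dst, d_base, d_wild):
            return seq, action
    return 0, "deny"  # implicit deny at the end of every ACL


if __name__ == "__main__":
    # A Hello from R3 (150.0.0.2) to the AllSPFRouters group 224.0.0.5
    print(first_match("150.0.0.2", "224.0.0.5"))
    # Ordinary Unicast traffic from R3 to R1
    print(first_match("150.0.0.2", "150.0.0.1"))
```

In this model the Multicast Hello hits entry 60 before it can reach the permit in entry 70, while Unicast traffic between the two routers is unaffected, which is why basic reachability tests can succeed even though no adjacency forms.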
These two states can be validated using the show ip ospf neighbor command on both routers as follows:

R1#show ip ospf neighbor
R1#

The same command, if issued on R3, would display the following instead:

R3#show ip ospf neighbor

Neighbor ID     Pri   State     Dead Time   Address         Interface
1.1.1.1           0   INIT/ -   00:00:36    150.0.0.1       Serial0/0
R3#

This problem can be resolved by reconfiguring the ACL applied to R1's Serial0/0 interface as follows:

R1(config)#ip access-list extended NETWORK-SECURITY
R1(config-ext-nacl)#no 60
R1(config-ext-nacl)#60 deny ip 224.0.0.0 15.255.255.255 any log
R1(config-ext-nacl)#
*Mar 5 11:19:57.629: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on Serial0/0 from LOADING to FULL, Loading Done

The same problem can also be caused by Layer 1 and Layer 2 issues, as well as basic NBMA misconfigurations, such as forgetting to append the broadcast keyword to the end of the frame-relay map statements. Follow the same basic process to troubleshoot the issue.

The Neighbor is Stuck in the 2WAY State

The 2WAY state indicates bidirectional communication with the OSPF neighbor(s). This means that the local router has received a Hello packet with its own RID in the Neighbor field and Hello packet parameters are identical on the two routers. On multi-access networks, the DR and BDR routers are elected during this phase. It should be noted that this state is perfectly acceptable for non-DR/BDR routers, as they will never complete database exchange between themselves and thus will never reach the FULL state. Consider the network illustrated in Figure 6-3 below:

[Figure 6-3: R1 (Lo0 1.1.1.1/32, 192.168.1.1/24), R2 (Lo0 2.2.2.2/32, 192.168.1.2/24), R3 (Lo0 3.3.3.3/32, 192.168.1.3/24), and R4 (Lo0 4.4.4.4/32, 192.168.1.4/24) on a shared multi-access segment]

Fig. 6-3. OSPF DR and BDR Fundamentals

Referencing Figure 6-3, each router on the segment establishes an adjacency with the DR and the BDR, but not with each other.
In other words, non-DR/BDR routers do not establish an adjacency with each other. This prevents the routers on the segment from forming N(N-1)/2 adjacencies with each other, which reduces excessive OSPF packet flooding on the segment.

Assuming that all routers on the segment have the default priority, R4 is elected Designated Router for the segment because it has the highest router ID. R3 is elected Backup Designated Router for the segment because it has the second highest router ID. Because R2 and R1 are neither the DR nor the BDR, they are referred to as DROther routers. This can be validated using the show ip ospf neighbor command on all routers as follows:

R1#show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2           1   2WAY/DROTHER    00:00:38    192.168.1.2     Ethernet0/0
3.3.3.3           1   FULL/BDR        00:00:39    192.168.1.3     Ethernet0/0
4.4.4.4           1   FULL/DR         00:00:38    192.168.1.4     Ethernet0/0

R2#show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1           1   2WAY/DROTHER    00:00:32    192.168.1.1     FastEthernet0/0
3.3.3.3           1   FULL/BDR        00:00:33    192.168.1.3     FastEthernet0/0
4.4.4.4           1   FULL/DR         00:00:32    192.168.1.4     FastEthernet0/0

R3#show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1           1   FULL/DROTHER    00:00:36    192.168.1.1     FastEthernet0/0
2.2.2.2           1   FULL/DROTHER    00:00:36    192.168.1.2     FastEthernet0/0
4.4.4.4           1   FULL/DR         00:00:35    192.168.1.4     FastEthernet0/0

R4#show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1           1   FULL/DROTHER    00:00:39    192.168.1.1     FastEthernet0/0
2.2.2.2           1   FULL/DROTHER    00:00:39    192.168.1.2     FastEthernet0/0
3.3.3.3           1   FULL/BDR        00:00:30    192.168.1.3     FastEthernet0/0

Notice that the DROther routers remain in the 2WAY/DROTHER state with each other. This is normal because they exchange their databases only with the DR and BDR routers. Therefore, because there is no full database exchange between the DROther routers, they will never reach the FULL state with each other.
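The election outcome for Figure 6-3 can be reproduced with a few lines of Python. This is a deliberately simplified sketch that assumes all routers come up at the same time and compares only priority and then router ID; it ignores the wait timer and the non-preemptive behaviour described earlier, and the function name is hypothetical.

```python
import ipaddress
from typing import Dict, Optional, Tuple


def elect_dr_bdr(routers: Dict[str, int]) -> Tuple[Optional[str], Optional[str]]:
    """Pick the DR and BDR from a mapping of router ID -> interface priority.

    Routers with priority 0 do not take part in the election. The highest
    priority wins, and the highest router ID breaks a tie.
    """
    eligible = [
        (priority, int(ipaddress.IPv4Address(rid)), rid)
        for rid, priority in routers.items()
        if priority > 0
    ]
    ranked = sorted(eligible, reverse=True)
    dr = ranked[0][2] if ranked else None
    bdr = ranked[1][2] if len(ranked) > 1 else None
    return dr, bdr


if __name__ == "__main__":
    segment = {"1.1.1.1": 1, "2.2.2.2": 1, "3.3.3.3": 1, "4.4.4.4": 1}
    print(elect_dr_bdr(segment))  # router IDs from Figure 6-3, default priority
```

With the default priority everywhere, the result depends on router ID alone, which matches the FULL/DR and FULL/BDR entries shown for 4.4.4.4 and 3.3.3.3 in the outputs above.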
While the 2WAY state is acceptable in scenarios such as the above, it is not fine when only two routers reside on a multi-access segment. In such cases, the problem is caused by both routers being configured with a priority of 0 via the ip ospf priority command. You can troubleshoot this issue using the show ip ospf neighbor and show ip ospf interface commands. Following is a sample output of the show ip ospf neighbor command when a priority of 0 has been configured for the neighbor:

R1#show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
3.3.3.3           0   2WAY/DROTHER    00:00:35    150.0.0.2       FastEthernet0/0

Following is the output of the show ip ospf interface command on the same router:

R1#show ip ospf interface FastEthernet0/0
FastEthernet0/0 is up, line protocol is up
  Internet Address 150.0.0.1/24, Area 0
  Process ID 1, Router ID 1.1.1.1, Network Type BROADCAST, Cost: 1
  Transmit Delay is 1 sec, State DROTHER, Priority 0
  No designated router on this network
  No backup designated router on this network
  Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
    oob-resync timeout 40
    Hello due in 00:00:08
  Supports Link-Local Signaling (LLS)
  Index 1/1, flood queue length 0
  Next 0x0(0)/0x0(0)
  Last flood scan length is 1, maximum is 1
  Last flood scan time is 0 msec, maximum is 0 msec
  Neighbor Count is 1, Adjacent neighbor count is 0
  Suppress hello for 0 neighbor(s)

The Neighbor is Stuck in the EXSTART/EXCHANGE State

The Exstart state is used for the initialization of the database synchronization process. It is at this stage that the local router and its neighbor establish which router is in charge of the database synchronization process. The Master and Slave are elected in this state and the first sequence number for DBD exchange is decided by the Master in this stage. The Exchange state is where routers describe the contents of their databases using DBD packets.
Each DBD sequence is explicitly acknowledged, and only one outstanding DBD packet is allowed at a time. During this phase, LSR packets are also sent to request a new instance of the LSA. The M (More) bit in each DBD packet indicates that more DBD packets follow. When both routers have exchanged their complete databases, they will both set the M bit to 0. Several possible reasons for the neighbor to be stuck in the EXSTART/EXCHANGE state include the following:
• Mismatched MTU values
• Duplicate RIDs
• Broken Unicast connectivity

While EIGRP ignores interface MTU values and allows routers with different interface MTUs to establish a neighbor relationship, OSPF does not. Referencing the topology illustrated in Figure 6-2, which depicts routers R1 and R3 connected via a back-to-back Serial connection, the following shows the neighbor state on R1 assuming an MTU mismatch:

R1#show ip ospf neighbor

Neighbor ID     Pri   State         Dead Time   Address         Interface
3.3.3.3           0   EXCHANGE/ -   00:00:38    150.0.0.2       Serial0/0

The same command would show the following output on R3:

R3#show ip ospf neighbor

Neighbor ID     Pri   State         Dead Time   Address         Interface
1.1.1.1           0   EXSTART/ -    00:00:39    150.0.0.1       Serial0/0

The simplest way to troubleshoot this issue is to validate interface MTU configurations by using the show running-config interface [name] privileged EXEC command. Alternatively, interface MTUs are also printed in the output of the show interfaces and show ip interface commands.
Following is a sample output of the information printed by the latter:

R1#show ip interface Serial0/0
Serial0/0 is up, line protocol is up
  Internet address is 150.0.0.1/30
  Broadcast address is 255.255.255.255
  Address determined by setup command
  MTU is 1504 bytes
  Helper address is not set
  Directed broadcast forwarding is disabled
  Multicast reserved groups joined: 224.0.0.5
  Outgoing access list is not set
  Inbound access list is NETWORK-SECURITY
  Proxy ARP is enabled

The EXSTART/EXCHANGE state can also be caused by duplicate RIDs or broken Unicast connectivity between the routers. As is the case with mismatched MTU values, the troubleshooting process for duplicate RIDs should begin by simply looking at device configurations to ensure that the same RID was not entered on two different devices. Unicast connectivity issues may be caused by Layer 1 and Layer 2 issues, as well as by Layer 3 functions such as Access Control Lists. Eliminate each layer as you troubleshoot to isolate the cause.

The Neighbor is Stuck in the LOADING State

In the Loading state, OSPF routers build an LSR packet and a Link State Retransmission list. LSR packets are sent to request the more recent instance of an LSA that has not been received during the Exchange process. Update packets that are sent during this phase are placed on the Link State Retransmission list until the local router receives an acknowledgement. If the local router also receives an LSR packet during this phase, it will respond with a Link State Update packet that contains the requested information.

It is rare to see a neighbor stuck in the LOADING state. However, a corrupted Link State Request packet may cause this issue. If such an event occurs, troubleshoot the issue by looking at error counters on the applicable interfaces and use a component-swapping approach to attempt to isolate the problem. Typically, such issues are caused by faulty hardware, which may then need to be replaced.
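The causes covered in the preceding sections can be gathered into a simple lookup keyed on the neighbor state. The Python sketch below is just a study aid under that assumption: the cause lists are copied from the bullet lists in this chapter, the dictionary and function names are hypothetical, and the input is assumed to be the State column of show ip ospf neighbor (or nothing at all when the Neighbor Table is empty).

```python
from typing import Dict, List, Optional

# Likely causes per state, as listed in the sections above.
LIKELY_CAUSES: Dict[str, List[str]] = {
    "EMPTY": ["Basic OSPF misconfigurations", "Layer 1 and Layer 2 issues",
              "ACL filtering", "Interface misconfigurations"],
    "ATTEMPT": ["Incorrect NBMA configuration", "Incorrect OSPF configuration",
                "ACL filtering", "Unidirectional connectivity"],
    "INIT": ["ACL filtering on one side", "Layer 1 and Layer 2 issues",
             "NBMA misconfigurations"],
    "2WAY": ["Priority of 0 on both routers (2WAY is normal between DROther routers)"],
    "EXSTART": ["Mismatched MTU values", "Duplicate RIDs",
                "Broken Unicast connectivity"],
    "LOADING": ["Corrupted Link State Request packet (often faulty hardware)"],
}
LIKELY_CAUSES["EXCHANGE"] = LIKELY_CAUSES["EXSTART"]  # treated together in the text


def likely_causes(state_column: Optional[str]) -> List[str]:
    """Map a State column value such as 'EXSTART/ -' to the causes to check first."""
    if not state_column:  # no neighbor entry at all
        return LIKELY_CAUSES["EMPTY"]
    state = state_column.split("/")[0].strip().upper()
    return LIKELY_CAUSES.get(state, [])  # FULL and unknown states return nothing


if __name__ == "__main__":
    for column in (None, "INIT/ -", "EXSTART/ -", "FULL/DR"):
        print(column, "->", likely_causes(column))
```

The lookup is only a starting point; the show and debug commands described in each section are still needed to confirm which cause applies.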
TROUBLESHOOTING ROUTE ADVERTISEMENT

As is the case with EIGRP, there may be times when you notice that OSPF is not advertising certain routes. For the most part, this is due to a misconfiguration rather than a protocol failure. Some common reasons for this include the following:

• OSPF is not enabled on the interface(s)
• The interface(s) is/are down
• Interface addresses are in a different area
• OSPF misconfigurations

A common reason why OSPF does not advertise routes is that the network is not advertised via OSPF. In current Cisco IOS versions, networks can be advertised using the network router configuration command or the ip ospf interface configuration command. Regardless of the method used, the show ip protocols command can be used to view which networks OSPF is configured to advertise, as can be seen in the following output:

R2#show ip protocols
Routing Protocol is "ospf 1"
  Outgoing update filter list for all interfaces is not set
  Incoming update filter list for all interfaces is not set
  Router ID 2.2.2.2
  Number of areas in this router is 1. 1 normal 0 stub 0 nssa
  Maximum path: 4
  Routing for Networks:
    10.2.2.0 0.0.0.128 Area 1
    20.2.2.0 0.0.0.255 Area 1
  Routing on Interfaces Configured Explicitly (Area 1):
    Loopback0
  Reference bandwidth unit is 100 mbps
  Routing Information Sources:
    Gateway         Distance      Last Update
    1.1.1.1              110      00:00:17
  Distance: (default is 110)

Additionally, keep in mind that you can also use the show ip ospf interface command to find out for which interfaces OSPF has been enabled, among other things. In addition to network configuration, if the interface is down, OSPF will not advertise the route.
You can use the show ip ospf interface command to determine the interface state as follows:

R1#show ip ospf interface brief
Interface    PID   Area            IP Address/Mask    Cost  State Nbrs F/C
Lo100        1     0               100.1.1.1/24       1     DOWN  0/0
Fa0/0        1     0               10.0.0.1/24        1     BDR   1/1

Referencing the output above, we can determine that Loopback100 is in a DOWN state. Taking a closer look, we can see that the issue is because the interface has been administratively shut, as illustrated in the following output:

R1#show ip ospf interface Loopback100
Loopback100 is administratively down, line protocol is down
  Internet Address 100.1.1.1/24, Area 0
  Process ID 1, Router ID 1.1.1.1, Network Type LOOPBACK, Cost: 1
  Enabled by interface config, including secondary ip addresses
  Loopback interface is treated as a stub Host

If we debugged IP routing events using the debug ip routing command and then issued the no shutdown command under the Loopback100 interface, then we would see the following:

R1#debug ip routing
IP routing debugging is on
R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#interface Loopback100
R1(config-if)#no shutdown
R1(config-if)#end
R1#
*Mar 18 20:03:34.687: RT: is_up: Loopback100 1 state: 4 sub state: 1 line: 0 has_route: False
*Mar 18 20:03:34.687: RT: SET_LAST_RDB for 100.1.1.0/24
    NEW rdb: is directly connected
*Mar 18 20:03:34.687: RT: add 100.1.1.0/24 via 0.0.0.0, connected metric [0/0]
*Mar 18 20:03:34.687: RT: NET-RED 100.1.1.0/24
*Mar 18 20:03:34.687: RT: interface Loopback100 added to routing table
...
[Truncated Output]

When multiple addresses are configured under an interface, all secondary addresses must be in the same area as the primary address; otherwise, OSPF will not advertise these networks. As an example, consider the network topology illustrated in Figure 6-4 below:

[Figure: R1 FastEthernet0/0 (10.0.0.1/24, with secondary addresses 10.0.1.1/24 and 10.0.2.1/24) connected to R2 FastEthernet0/0 (10.0.0.2/24)]

Fig. 6-4.
OSPF Secondary Subnet Advertisement

Referencing Figure 6-4, routers R1 and R2 are connected via a back-to-back connection. These two routers share the 10.0.0.0/24 subnet. In addition, R1 has been configured with additional (secondary) subnets under its FastEthernet0/0 interface, so that the interface configuration on R1 is as follows:

R1#show running-config interface FastEthernet0/0
Building configuration...

Current configuration : 183 bytes
!
interface FastEthernet0/0
 ip address 10.0.1.1 255.255.255.0 secondary
 ip address 10.0.2.1 255.255.255.0 secondary
 ip address 10.0.0.1 255.255.255.0
 duplex auto
 speed auto
end

OSPF is enabled on both R1 and R2. The configuration implemented on R1 is as follows:

R1#show running-config | section ospf
router ospf 1
 router-id 1.1.1.1
 log-adjacency-changes
 network 10.0.0.1 0.0.0.0 Area 0
 network 10.0.1.1 0.0.0.0 Area 1
 network 10.0.2.1 0.0.0.0 Area 1

The configuration implemented on R2 is as follows:

R2#show running-config | section ospf
router ospf 2
 router-id 2.2.2.2
 log-adjacency-changes
 network 10.0.0.2 0.0.0.0 Area 0

By default, because the secondary subnets have been placed into a different OSPF area on R1, they will not be advertised by the router.
This can be seen on R2, which displays the following when the show ip route command is issued:

R2#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     10.0.0.0/24 is subnetted, 1 subnets
C       10.0.0.0 is directly connected, FastEthernet0/0

To resolve this issue, the secondary subnets must also be assigned to Area 0 as follows:

R1(config)#router ospf 1
R1(config-router)#network 10.0.1.1 0.0.0.0 Area 0
*Mar 18 20:20:37.491: %OSPF-6-AREACHG: 10.0.1.1/32 changed from Area 1 to Area 0
R1(config-router)#network 10.0.2.1 0.0.0.0 Area 0
*Mar 18 20:20:42.211: %OSPF-6-AREACHG: 10.0.2.1/32 changed from Area 1 to Area 0
R1(config-router)#end

After this configuration change, the networks are now advertised to router R2 as follows:

R2#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     10.0.0.0/24 is subnetted, 3 subnets
O       10.0.2.0 [110/2] via 10.0.0.1, 00:01:08, FastEthernet0/0
C       10.0.0.0 is directly connected, FastEthernet0/0
O       10.0.1.0 [110/2] via 10.0.0.1, 00:01:08, FastEthernet0/0

In addition to the three other common causes described above, poor design, implementation, and
misconfigurations are another reason OSPF may not advertise networks as expected. Common design issues that cause such problems include a discontiguous or partitioned backbone and area type misconfigurations, such as configuring areas as totally stubby. For this reason, it is important to have a solid understanding of how the protocol works and how it has been implemented in your environment. This understanding will greatly simplify the troubleshooting process, as half the battle is already won before you even start troubleshooting the problem.

TROUBLESHOOTING ROUTE REDISTRIBUTION ISSUES

Route redistribution configuration can often be very complex, especially when redistributing at multiple points in the network. While there is a plethora of things that can go wrong if redistribution is not implemented correctly (e.g., routing loops), this section will only delve into reasons why OSPF may not advertise external or redistributed routes. These reasons include the following:

• Omitting the subnets keyword during redistribution
• Incorrectly filtering outbound advertisements

When redistributing routes into OSPF, by default only Classful networks are redistributed, unless the subnets keyword is included in the configuration. When the subnets keyword is omitted, Cisco IOS software prints the following configuration warning message on the console:

R1(config)#router ospf 1
R1(config-router)#redistribute eigrp 1
% Only classful networks will be redistributed
R1(config-router)#

In addition to viewing the router configuration, you can also use the show ip protocols command to determine whether subnets will be included in the redistribution.
If the subnets keyword has been omitted, then the show ip protocols command displays the following:

R1#show ip protocols
Routing Protocol is "ospf 1"
  Outgoing update filter list for all interfaces is not set
  Incoming update filter list for all interfaces is not set
  Router ID 1.1.1.1
  It is an autonomous system boundary router
  Redistributing External Routes from,
    eigrp 1
  Number of areas in this router is 1. 1 normal 0 stub 0 nssa
  Maximum path: 4
  Routing for Networks:
    10.0.0.1 0.0.0.0 Area 0
    10.0.1.1 0.0.0.0 Area 0
    10.0.2.1 0.0.0.0 Area 0
  Routing on Interfaces Configured Explicitly (Area 0):
    Loopback100
  Reference bandwidth unit is 100 mbps
  Routing Information Sources:
    Gateway         Distance      Last Update
    2.2.2.2              110      00:00:03
  Distance: (default is 110)

However, if the subnets keyword has been included in the redistribution configuration (e.g., the statement redistribute eigrp 1 subnets was added to R1), then the show ip protocols command would instead display the following:

R1#show ip protocols
Routing Protocol is "ospf 1"
  Outgoing update filter list for all interfaces is not set
  Incoming update filter list for all interfaces is not set
  Router ID 1.1.1.1
  It is an autonomous system boundary router
  Redistributing External Routes from,
    eigrp 1, includes subnets in redistribution
  Number of areas in this router is 1. 1 normal 0 stub 0 nssa
  Maximum path: 4
  Routing for Networks:
    10.0.0.1 0.0.0.0 Area 0
    10.0.1.1 0.0.0.0 Area 0
    10.0.2.1 0.0.0.0 Area 0
  Routing on Interfaces Configured Explicitly (Area 0):
    Loopback100
  Reference bandwidth unit is 100 mbps
  Routing Information Sources:
    Gateway         Distance      Last Update
    2.2.2.2              110      00:00:26
  Distance: (default is 110)

When using distribute lists to filter OSPF routes, if an outbound distribute list is applied to an ASBR, the ASBR will generate Type 5 LSAs only for prefixes that are explicitly permitted by the distribute list. Only these prefixes will be advertised.
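The permit-or-implicit-deny behavior of an outbound distribute list can be sketched in Python as follows. This is a simplified model under stated assumptions: it handles only permit entries of a standard ACL (a base address plus a wildcard mask), ignores deny entries and entry ordering, and the function names and sample prefixes are our own illustrative choices rather than anything produced by Cisco IOS.

```python
import ipaddress


def acl_entry_matches(network_address, acl_base, wildcard):
    """Return True if an address matches a standard ACL entry.

    Bits set to 1 in the wildcard mask are ignored; all other bits must match.
    """
    addr = int(ipaddress.IPv4Address(network_address))
    base = int(ipaddress.IPv4Address(acl_base))
    care_bits = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care_bits) == (base & care_bits)


def permitted_prefixes(network_addresses, permit_entries):
    """Keep only addresses matched by a permit entry (implicit deny otherwise)."""
    return [
        address
        for address in network_addresses
        if any(acl_entry_matches(address, base, wc) for base, wc in permit_entries)
    ]


# Illustrative values: one permit entry covering 10.0.0.0/8 only.
acl = [("10.0.0.0", "0.255.255.255")]
candidates = ["10.0.1.0", "192.168.1.0", "192.168.2.0"]
print(permitted_prefixes(candidates, acl))
```

With this single entry, only the 10.x prefix survives; adding a second entry such as ("192.168.0.0", "0.0.255.255") would allow the 192.168.x prefixes through as well.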
To clarify this point further, a router running OSPF has been configured with the following static routes:

R1#show ip route static
S    192.168.4.0/24 is directly connected, Null0
S    192.168.5.0/24 is directly connected, Null0
S    192.168.1.0/24 is directly connected, Null0
S    192.168.2.0/24 is directly connected, Null0
S    192.168.3.0/24 is directly connected, Null0

The same router currently has the following distribute list configured:

R1#show running-config | section ospf|access-list
router ospf 1
 router-id 1.1.1.1
 log-adjacency-changes
 network 10.0.0.1 0.0.0.0 Area 0
 network 10.0.1.1 0.0.0.0 Area 0
 network 10.0.2.1 0.0.0.0 Area 0
 distribute-list 1 out
access-list 1 permit 10.0.0.0 0.255.255.255

Next, the static routes are redistributed into OSPF as follows:

R1(config)#router ospf 1
R1(config-router)#redistribute static subnets

Because the distribute list references ACL 1, which permits only routes in the 10.0.0.0/8 major network, the ASBR will not generate Type 5 LSAs for the 192.168.x.0/24 static routes, as seen in the following output of the show ip ospf database command on the local router:

R1#show ip ospf database

            OSPF Router with ID (1.1.1.1) (Process ID 1)

                Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link Count
1.1.1.1         1.1.1.1         1140        0x8000000E 0x00FAD0 3
2.2.2.2         2.2.2.2         11          0x80000006 0x002ED7 1

                Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.0.0.2        2.2.2.2         938         0x80000003 0x003FD8

In order to allow the ASBR to generate and advertise Type 5 LSAs for the static subnets, the ACL configuration must be modified in a manner similar to the following:

R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#access-list 1 permit 192.168.0.0 0.0.255.255
R1(config)#exit
R1#clear ip ospf redistribution

Following this configuration change, the Link State Database on router R1 displays the following:

R1#show ip ospf database

            OSPF Router with ID (1.1.1.1) (Process ID 1)

                Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link Count
1.1.1.1         1.1.1.1         1589        0x8000000E 0x00FAD0 3
2.2.2.2         2.2.2.2         460         0x80000006 0x002ED7 1

                Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.0.0.2        2.2.2.2         1387        0x80000003 0x003FD8

                Type-5 AS External Link States

Link ID         ADV Router      Age         Seq#       Checksum Tag
192.168.1.0     1.1.1.1         18          0x80000002 0x000B26 0
192.168.2.0     1.1.1.1         18          0x80000002 0x00FF30 0
192.168.3.0     1.1.1.1         18          0x80000002 0x00F43A 0
192.168.4.0     1.1.1.1         18          0x80000002 0x00E944 0
192.168.5.0     1.1.1.1         17          0x80000002 0x00DE4E 0

TROUBLESHOOTING ROUTE SUMMARIZATION

Unlike EIGRP, OSPF does not automatically summarize at Classful boundaries. In addition, OSPF summarization is different for internal and external routes, which often results in confusion and misconfigurations. In Cisco IOS software, internal route summarization is configured by using the area [area ID] range [<address> <mask> [advertise | not-advertise]] [cost <cost>] router configuration command on the Area Border Router (ABR). Following the configuration of the summary on the ABR, the following occurs:

• An intra-area route for the summary pointing to Null0 is installed into the routing table
• The specific Type 3 entries in the LSDB are replaced by the single Type 3 LSA

A common source of confusion with OSPF summarization is that the area…range command is used on a non-ABR router instead of the ABR. When this happens, the command will be accepted; however, the configured range will be flagged as passive and will not be advertised by the router.
This can be validated using the show ip ospf command as illustrated below:

R2#show ip ospf | begin Area
    Area BACKBONE(0)
        Number of interfaces in this area is 1
        Area has no authentication
        SPF algorithm last executed 00:00:59.544 ago
        SPF algorithm executed 17 times
        Area ranges are
        Number of LSA 3. Checksum Sum 0x016581
        Number of opaque link LSA 0. Checksum Sum 0x000000
        Number of DCbitless LSA 0
        Number of indication LSA 0
        Number of DoNotAge LSA 0
        Flood list length 0
    Area 2
        Number of interfaces in this area is 0
        Area has no authentication
        SPF algorithm last executed 00:00:59.548 ago
        SPF algorithm executed 5 times
        Area ranges are
           11.0.0.0/16 Passive Advertise
        Number of LSA 1. Checksum Sum 0x00032F
        Number of opaque link LSA 0. Checksum Sum 0x000000
        Number of DCbitless LSA 0
        Number of indication LSA 0
        Number of DoNotAge LSA 0
        Flood list length 0

External route summarization is configured on the OSPF ASBR using the summary-address [<address> <mask> | prefix] [not-advertise] [tag <tag>] [nssa-only] router configuration command. The <address> and <mask> arguments are used to specify the summary network address and its subnet mask. After this configuration has been implemented, the following is performed on the ASBR:

• An intra-area route for the summary pointing to Null0 is installed into the routing table
• The specific Type 5 entries in the LSDB are replaced by the single Type 5 LSA

A common misconfiguration is using the summary-address command on an ABR or other non-ASBR router. Another common mistake is attempting to use the area…range command when summarizing external routes. Neither approach will work, as the summary-address command should be implemented only on an ASBR. If you issue this command on an ASBR and no valid addresses belong to the range specified, the summary will be assigned a metric of infinity (16777215) by the router.
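The infinity value quoted above is simply the largest number that fits in a 24-bit metric field (0xFFFFFF). The following Python sketch shows that arithmetic and a simple containment check for whether any known prefix falls inside a configured summary. It is an illustration only: the helper name and the sample prefix lists are hypothetical, and the router's own logic for deciding which routes count toward a summary is more involved than this.

```python
import ipaddress

# Largest value of a 24-bit field; matches the 16777215 shown by the router.
LS_INFINITY = 0xFFFFFF


def summary_has_components(summary, known_prefixes):
    """Return True if at least one known prefix falls inside the summary range."""
    summary_net = ipaddress.ip_network(summary)
    return any(
        ipaddress.ip_network(prefix).subnet_of(summary_net)
        for prefix in known_prefixes
    )


# Hypothetical external prefixes known to the ASBR.
external_prefixes = ["192.168.1.0/24", "192.168.2.0/24"]

print(LS_INFINITY)
print(summary_has_components("150.0.0.0/16", external_prefixes))
print(summary_has_components("192.168.0.0/16", external_prefixes))
```

When the check returns False for a configured summary, that is the situation in which you would expect to see the infinity metric against it.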
For example, assume the following configuration was implemented on an ASBR:

R1(config)#router ospf 1
R1(config-router)#summary-address 150.0.0.0 255.255.0.0
R1(config-router)#end

Assuming that there are no known networks in the 150.0.0.0/16 range, the summary route is assigned the infinity metric, which can be viewed using the show ip ospf summary-address command. Following is the output of this command on the same router:

R1#show ip ospf summary-address

OSPF Process 1, Summary-address

150.0.0.0/255.255.0.0 Metric 16777215, Type 0, Tag 0

DEBUGGING OSPF ROUTING ISSUES

In the final section of this chapter, we will look at some of the more commonly used OSPF debugging commands. OSPF debugging is enabled using the debug ip ospf command. This command can be used in conjunction with the following additional keywords:

R1#debug ip ospf ?
  adj             OSPF adjacency events
  database-timer  OSPF database timer
  events          OSPF events
  flood           OSPF flooding
  hello           OSPF hello events
  lsa-generation  OSPF lsa generation
  mpls            OSPF MPLS
  nsf             OSPF non-stop forwarding events
  packet          OSPF packets
  retransmission  OSPF retransmission events
  spf             OSPF spf
  tree            OSPF database tree

The debug ip ospf adj command prints real-time information on adjacency events. This is a useful tool when troubleshooting OSPF neighbor adjacency problems. Following is a sample of the information that is printed by this command.
The example below illustrates how this command can be used to determine that an MTU mismatch is preventing the neighbor adjacency from reaching the FULL state:

R1#debug ip ospf adj
OSPF adjacency events debugging is on
R1#
*Mar 18 23:13:21.279: OSPF: DR/BDR election on FastEthernet0/0
*Mar 18 23:13:21.279: OSPF: Elect BDR 2.2.2.2
*Mar 18 23:13:21.279: OSPF: Elect DR 1.1.1.1
*Mar 18 23:13:21.279:        DR: 1.1.1.1 (Id)   BDR: 2.2.2.2 (Id)
*Mar 18 23:13:21.283: OSPF: Neighbor change Event on interface FastEthernet0/0
*Mar 18 23:13:21.283: OSPF: DR/BDR election on FastEthernet0/0
*Mar 18 23:13:21.283: OSPF: Elect BDR 2.2.2.2
*Mar 18 23:13:21.283: OSPF: Elect DR 1.1.1.1
*Mar 18 23:13:21.283:        DR: 1.1.1.1 (Id)   BDR: 2.2.2.2 (Id)
*Mar 18 23:13:21.283: OSPF: Rcv DBD from 2.2.2.2 on FastEthernet0/0 seq 0xA65 opt 0x52 flag 0x7 len 32 mtu 1480 state EXSTART
*Mar 18 23:13:21.283: OSPF: Nbr 2.2.2.2 has smaller interface MTU
*Mar 18 23:13:21.283: OSPF: NBR Negotiation Done. We are the SLAVE
*Mar 18 23:13:21.287: OSPF: Send DBD to 2.2.2.2 on FastEthernet0/0 seq 0xA65 opt 0x52 flag 0x2 len 192
*Mar 18 23:13:26.275: OSPF: Rcv DBD from 2.2.2.2 on FastEthernet0/0 seq 0xA65 opt 0x52 flag 0x7 len 32 mtu 1480 state EXCHANGE
*Mar 18 23:13:26.279: OSPF: Nbr 2.2.2.2 has smaller interface MTU
*Mar 18 23:13:26.279: OSPF: Send DBD to 2.2.2.2 on FastEthernet0/0 seq 0xA65 opt 0x52 flag 0x2 len 192
...
[Truncated Output]

From the output above, we can conclude that the MTU on the local router is larger than 1480 bytes because the debug output shows that the neighbor has the smaller MTU value. The recommended solution would be to adjust the smaller MTU value so that both neighbors have the same interface MTU values. This will allow the adjacency to reach the FULL state.

The debug ip ospf lsa-generation command prints information on OSPF LSAs. This command can be used to troubleshoot route advertisement when using OSPF.
Following is a sample output of the information that is printed by this command:

R1#debug ip ospf lsa-generation
OSPF summary lsa generation debugging is on
R1#
R1#
*Mar 18 23:25:59.447: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0 from FULL to DOWN, Neighbor Down: Interface down or detached
*Mar 18 23:25:59.511: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0 from LOADING to FULL, Loading Done
*Mar 18 23:26:00.491: OSPF: Start redist-scanning
*Mar 18 23:26:00.491: OSPF: Scan the RIB for both redistribution and translation
*Mar 18 23:26:00.499: OSPF: max-aged external LSA for summary 150.0.0.0 255.255.0.0, scope: Translation
*Mar 18 23:26:00.499: OSPF: End scanning, Elapsed time 8ms
*Mar 18 23:26:00.499: OSPF: Generate external LSA 192.168.4.0, mask 255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000001
*Mar 18 23:26:00.503: OSPF: Generate external LSA 192.168.5.0, mask 255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000001
*Mar 18 23:26:00.503: OSPF: Generate external LSA 192.168.1.0, mask 255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000001
*Mar 18 23:26:00.503: OSPF: Generate external LSA 192.168.2.0, mask 255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000001
*Mar 18 23:26:00.507: OSPF: Generate external LSA 192.168.3.0, mask 255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000001
*Mar 18 23:26:05.507: OSPF: Generate external LSA 192.168.4.0, mask 255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000006
*Mar 18 23:26:05.535: OSPF: Generate external LSA 192.168.5.0, mask 255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000006

The debug ip ospf spf command provides real-time information about Shortest Path First algorithm events. This command can be used in conjunction with the following keywords:

R1#debug ip ospf spf ?
  external   OSPF spf external-route
  inter      OSPF spf inter-route
  intra      OSPF spf intra-route
  statistic  OSPF spf statistics
  <cr>

As is the case with all debug commands, consideration should be given to factors such as the size of the network and the resource utilization on the router before debugging SPF events. Following is a sample of the output from the debug ip ospf spf statistic command:

R1#debug ip ospf spf statistic
OSPF spf statistic debugging is on
R1#
R1#clear ip ospf process
Reset ALL OSPF processes? [no]: y
R1#
R1#
*Mar 18 23:37:27.795: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0 from FULL to DOWN, Neighbor Down: Interface down or detached
*Mar 18 23:37:27.859: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0 from LOADING to FULL, Loading Done
*Mar 18 23:37:32.859: OSPF: Begin SPF at 28081.328ms, process time 608ms
*Mar 18 23:37:32.859:       spf_time 07:47:56.328, wait_interval 5000ms
*Mar 18 23:37:32.859: OSPF: End SPF at 28081.328ms, Total elapsed time 0ms
*Mar 18 23:37:32.859:       Schedule time 07:48:01.328, Next wait_interval 10000ms
*Mar 18 23:37:32.859:       Intra: 0ms, Inter: 0ms, External: 0ms
*Mar 18 23:37:32.859:       R: 2, N: 1, Stubs: 2
*Mar 18 23:37:32.859:       SN: 0, SA: 0, X5: 0, X7: 0
*Mar 18 23:37:32.863:       SPF suspends: 0 intra, 0 total

NOTE: Prior to enabling SPF debugs, consider using show commands, such as the show ip ospf statistics and show ip ospf commands, to troubleshoot first.

CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter.

Open Shortest Path First Protocol Overview

• OSPF data structures store operational data, configured parameters and statistics
• There are four OSPF data structures as follows:
  1. The Interface Table
  2. The Neighbor Table
  3. The Link State Database
  4. The Routing Information Base
• The Interface Table provides a list of all interfaces that have been enabled for OSPF
• The Neighbor Table tracks all active OSPF neighbors
• The Link State Database (LSDB) contains information about the network topology
• The Routing Information Base (RIB) contains the results derived from the SPF calculation
• OSPF is a hierarchical routing protocol that logically divides the network into sub-domains
• This logical segmentation is used to limit the scope of LSA flooding
• In a multi-area OSPF network, one area must be designated as the backbone area
• The backbone is the logical center of the OSPF network
• All other non-backbone areas must be physically or logically connected to the backbone
• OSPF uses the following default network types for different media:
  1. Non-Broadcast
  2. Point-to-Point
  3. Broadcast
  4. Point-to-Multipoint
• OSPF neighbors go through several states before the adjacency becomes FULL
• Valid OSPF states are Down, Attempt, Init, 2-Way, Exstart, Exchange, Loading and Full
• The following parameters must match before an adjacency will go into the Full state:
  1. The interface MTU values
  2. The Hello and Dead Timers
  3. The Area ID
  4. The Authentication Type and Password
  5. The Stub Area flag
  6. IP Subnet and Subnet Mask
• Each LSA begins with a standard 20-byte LSA header that contains the following:
  1. Link State Age
  2. Options
  3. Link State Type
  4. Link State ID
  5. Advertising Router
  6. Link State Sequence Number
  7. Link State Checksum
  8. Length
• Type 1 LSAs are generated by each router for each area it belongs to
• Type 2 LSAs advertise the routers on the multi-access segment
• Type 3 LSAs are a summary of destinations outside of the local area
• Type 4 LSAs describe information about an Autonomous System Boundary Router
• Type 5 LSAs provide the network information necessary to reach the external networks
• Type 7 LSAs are used for external routing information from the ASBR in an NSSA
• OSPF supports additional special types of areas in addition to area 0 and normal areas
• These special types of areas are as follows:
  1. Not-so-stubby Areas
  2. Totally Not-so-stubby Areas
  3. Stub Areas
  4. Totally Stubby Areas
• A virtual link is a logical extension of the OSPF backbone

Troubleshooting Neighbor Relationships

• OSPF neighbor relationship issues fall into one of the following categories:
  1. The Neighbor Table is Empty
  2. The Neighbor is stuck in the ATTEMPT state
  3. The Neighbor is stuck in the INIT state
  4. The Neighbor is stuck in the 2WAY state
  5. The Neighbor is stuck in the EXSTART/EXCHANGE state
  6. The Neighbor is stuck in the LOADING state
• The OSPF Neighbor Table may be empty due to one of the following:
  1. Basic OSPF Misconfigurations
  2. Layer 1 and Layer 2 Issues
  3. Access List Filtering
  4. Interface Misconfigurations
• The OSPF adjacency may be stuck in the ATTEMPT state due to the following:
  1. Incorrect NBMA Configuration
  2. Incorrect OSPF Configuration
  3. ACL Filtering
  4. Unidirectional Connectivity
• The OSPF adjacency may be stuck in the INIT state due to the following:
  1. ACL Filtering on One Side
  2. Layer 1 and Layer 2 Issues
  3. NBMA Misconfigurations
• The OSPF adjacency may be stuck in the 2WAY state due to the following:
  1. Neighbors all have their priority values set to zero (0)
• The OSPF adjacency may be stuck in the EXSTART/EXCHANGE state due to the following:
  1. Mismatched MTU Values
  2. Duplicate RIDs
  3. Broken Unicast Connectivity
• The OSPF adjacency may be stuck in the LOADING state due to the following:
  1. Corrupted LSR packets, which could be due to hardware or software issues

CHAPTER 7
Troubleshooting BGP

Border Gateway Protocol is first and foremost a policy control tool. Unlike traditional routing protocols that are used to exchange routing information within an autonomous system, BGP is traditionally used to exchange routing information between routing domains or autonomous systems. However, BGP can also be used to exchange routing information within a single routing domain. The TSHOOT certification exam objective that is covered in this chapter is as follows:

• Troubleshoot eBGP

While it is not possible to delve into all potential BGP problem scenarios, this chapter discusses some of the most common problem scenarios when using BGP. This chapter begins with an overview of BGP routing and then concludes with some common problem scenarios that pertain to BGP. This chapter is divided into the following sections:

• Border Gateway Protocol Overview
• Troubleshooting Neighbor Relationships
• Troubleshooting Route Advertisement
• Troubleshooting Route Redistribution Issues
• Debugging BGP Routing Issues

BORDER GATEWAY PROTOCOL OVERVIEW

Border Gateway Protocol is a Path Vector protocol that is used primarily to exchange Network Layer Reachability Information (NLRI) between routing domains or autonomous systems. In other words, BGP is used as an inter-domain or inter-autonomous system protocol. NLRI is exchanged between BGP routers, referred to as BGP speakers, using UPDATE messages. The NLRI is composed of a prefix and a length. The prefix refers to the network address for that subnet, and the length specifies the number of network bits and is simply a network mask in CIDR notation. Some NLRI examples include 10.0.0.0/8 and 150.1.1.0/24.
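The prefix-and-length composition of an NLRI described above can be illustrated with Python's standard ipaddress module. This is a minimal sketch for study purposes: the split_nlri name is our own, and it deals only with the text form of a prefix, not with how NLRI is encoded inside a BGP UPDATE message.

```python
import ipaddress


def split_nlri(nlri):
    """Split an NLRI string into (prefix, length, equivalent dotted mask)."""
    # strict=True rejects strings whose host bits are set (e.g. 10.1.1.1/8).
    network = ipaddress.ip_network(nlri, strict=True)
    return str(network.network_address), network.prefixlen, str(network.netmask)


# The two NLRI examples given in the text.
for nlri in ("10.0.0.0/8", "150.1.1.0/24"):
    prefix, length, mask = split_nlri(nlri)
    print(f"{nlri}: prefix={prefix} length={length} mask={mask}")
```

The length is the count of network bits, so a length of 8 corresponds to the mask 255.0.0.0 and a length of 24 to 255.255.255.0.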
The sections that follow describe core BGP characteristics with which you should be intimately familiar. It is important to remember, however, that the primary emphasis of this guide is troubleshooting. Therefore, core BGP principles will be described only briefly in this chapter. Additional information on BGP can be found in the current ROUTE study guide, which is available online.

Cisco IOS Software Border Gateway Protocol Processes

The following four BGP processes run when BGP is enabled in Cisco IOS-based devices:

1. The BGP Open process
2. The BGP I/O process
3. The BGP Scanner process
4. The BGP Router process

The BGP Open process is used for peer establishment. This process runs at initialization, when establishing a Transmission Control Protocol (TCP) connection with a BGP peer.

The BGP I/O process handles the reading, writing, and execution of BGP messages, such as the UPDATE and KEEPALIVE messages. This process provides the interface between TCP and BGP, reading messages from the TCP socket and placing them into the BGP input queue so that they can be processed by the BGP Router process. The I/O process also moves messages from the output queue (OutQ) to the TCP socket.

The BGP Scanner process periodically scans the BGP Routing Information Base (RIB) in order to determine whether prefixes and attributes should be deleted and whether route maps or filter caches should be flushed. Additionally, the BGP Scanner walks the BGP table and confirms reachability of the next-hops (i.e., it validates that next-hops are still valid). If the next-hop for a prefix is not reachable, all BGP entries that use that next-hop are removed from the BGP RIB. By default, the BGP Scanner runs every 60 seconds; however, this interval can be changed using CLI commands.

Finally, the BGP Router process sends and receives routes, establishes peers, and interacts with the RIB.
This process is also used to calculate the BGP best path and receives commands entered via the CLI. The BGP Router process is the main process responsible for initiating the other BGP processes. The three major components of the BGP Router process are as follows:

1. The BGP Routing Information Base (RIB)
2. The IP RIB for BGP-learned prefixes
3. The IP switching component for BGP-learned prefixes

The BGP RIB contains network entries, path entries, path attributes, and additional information, such as route map and BGP filter list cache entries. The BGP-learned prefixes are stored in the IP RIB in two types of structures, which are Network Descriptor Blocks (NDBs) and Routing Descriptor Blocks (RDBs). An NDB is a single entry in the routing table that represents a network prefix and contains information such as the network address, mask, and administrative distance. The NDB is stored in the routing table with an RDB, which is used to store the actual next-hop information. Finally, the IP switching component refers to structures such as the FIB, which is applicable when Cisco Express Forwarding (CEF) is enabled.

BGP Data Structures

In Cisco IOS software, BGP information is stored in one of two data structures or tables, which are the Neighbor Table and the BGP Table. Both are described in the following section.

The BGP Neighbor Table contains a list of all the configured neighbors of the local BGP speaker. This includes the IP address of the neighbor, the neighbor's autonomous system, the adjacency state (e.g., established, active, etc.), and other pertinent information, such as the number of prefixes received from the neighbor. The Neighbor Table can be viewed using the show ip bgp neighbors command. However, a summary of all neighbors is also included in the output of the show ip bgp summary command.
Following is a sample output of the show ip bgp neighbors command for a specific neighbor:

R4#show ip bgp neighbors 3.3.3.3
BGP neighbor is 3.3.3.3, remote AS 3, external link
  BGP version 4, remote router ID 3.3.3.3
  BGP state = Established, up for 17:21:33
  Last read 00:00:33, last write 00:00:33, hold time is 180, keepalive interval is 60 seconds
  Neighbor capabilities:
    Route refresh: advertised and received(old & new)
    Address family IPv4 Unicast: advertised and received
  Message statistics:
    InQ depth is 0
    OutQ depth is 0
                         Sent       Rcvd
    Opens:                  1          1
    Notifications:          0          0
    Updates:               43         43
    Keepalives:          1044       1044
    Route Refresh:          1          0
    Total:               1089       1088
  Default minimum time between advertisement runs is 30 seconds

 For address family: IPv4 Unicast
  BGP table version 125, neighbor version 125/0
  Output queue size : 0
  Index 1, Offset 0, Mask 0x2
  1 update-group member
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               3          0
    Prefixes Total:                64          0
    Implicit Withdraw:              7          0
    Explicit Withdraw:             54          0
    Used as bestpath:             n/a          0
    Used as multipath:            n/a          0

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    AS_PATH loop:                       n/a         64
    Suppressed due to dampening:          6        n/a
    Total:                                6         64
  Number of NLRIs in the update sent: max 18, min 0

  Connections established 1; dropped 0
  Last reset never
  External BGP neighbor may be up to 2 hops away.
Connection state is ESTAB, I/O status: 1, unread input bytes: 0
Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 2
Local host: 4.4.4.4, Local port: 14912
Foreign host: 3.3.3.3, Foreign port: 179

Enqueued packets for retransmit: 0, input: 0  mis-ordered: 0 (0 bytes)

Event Timers (current time is 0x140B8DC0):
Timer          Starts    Wakeups            Next
Retrans          1082          0             0x0
TimeWait            0          0             0x0
AckHold          1079       1038             0x0
SendWnd             0          0             0x0
KeepAlive           0          0             0x0
GiveUp              0          0             0x0
PmtuAger            0          0             0x0
DeadWait            0          0             0x0

iss:  507514249  snduna:  507536397  sndnxt:  507536397     sndwnd:  16023
irs: 1009588969  rcvnxt: 1009610970  rcvwnd:      16023  delrcvwnd:    361

SRTT: 300 ms, RTTO: 303 ms, RTV: 3 ms, KRTT: 0 ms
minRTT: 12 ms, maxRTT: 300 ms, ACK hold: 200 ms
Flags: active open, nagle
IP Precedence value : 6

Datagrams (max data segment is 536 bytes):
Rcvd: 1121 (out of order: 0), with data: 1079, total data bytes: 22000
Sent: 2160 (retransmit: 0, fastretransmit: 0, partialack: 0, Second Congestion: 0), with data: 1081, total data bytes: 22147

The BGP Table or BGP RIB contains all routes injected into BGP on the local BGP speaker, as well as those received from internal and external peers. The information stored in the BGP Table or RIB includes NEXT_HOP information, AS_PATH information, and MED information, among other parameters. Following is a sample output of the BGP RIB that can be viewed using the show ip bgp command:

R3#show ip bgp
BGP table version is 136, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 150.2.0.0/24     4.4.4.4                  0             0 4 5 6 7 ?
*> 150.3.0.0/24     4.4.4.4                  0             0 4 5 6 7 ?
*> 150.4.0.0/24     4.4.4.4                  0             0 4 8 9 1 ?
*> 150.5.0.0/24     4.4.4.4                  0             0 4 8 9 1 ?
*> 150.9.0.0/24     4.4.4.4                  0             0 4 6 7 8 ?
*> 160.0.0.0/24     4.4.4.4                  0             0 4 6 7 8 ?
Border Gateway Protocol Messages

All Border Gateway Protocol messages share a common header that is 19 bytes long. Only four BGP messages are available, as follows:
1. The OPEN message
2. The UPDATE message
3. The NOTIFICATION message
4. The KEEPALIVE message

The OPEN message, BGP message Type 1, is the first message BGP sends to a peer after the TCP connection has been established. It allows the two peers to negotiate the parameters of the peer session. These parameters include the BGP version, the autonomous system number, the BGP router ID (RID), the hold time value for the session, authentication information (MD5), if configured, refresh capabilities, and support for multiple NLRI. If the OPEN message is acceptable, a KEEPALIVE message confirming the OPEN message is sent back. Once the OPEN message is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION messages may be exchanged.

The BGP UPDATE message (Type 2) is used to send and withdraw BGP routing information, or Network Layer Reachability Information (NLRI). A single UPDATE message can carry both withdrawals of information previously advertised by the local router that is no longer valid and new information that is being advertised to the remote peer. Each UPDATE message contains a single set of BGP attributes and all of the routes using those attributes. This format reduces the total number of packets that routers must send between the BGP peers when exchanging NLRI.

The BGP NOTIFICATION message (Type 3) is sent when an error condition is detected. When a BGP peer detects an error within the session, it sends a NOTIFICATION message to the remote router and immediately closes both the BGP and TCP sessions. The minimum length of the NOTIFICATION message is 21 bytes, including the header.
Within the NOTIFICATION message, the 1-byte Error Code field specifies the type of BGP error seen by the local router. Six error codes have been defined, as follows:
• Error Code 1: Message Header Error
• Error Code 2: OPEN Message Error
• Error Code 3: UPDATE Message Error
• Error Code 4: Hold Time Expired
• Error Code 5: BGP Finite State Machine Error
• Error Code 6: Cease

The KEEPALIVE message (Type 4) is used to determine whether a host or a link has failed. This message type consists of the 19-byte header and no other data. By default, BGP peers exchange KEEPALIVE messages every 60 seconds. In addition, by default, the hold time value used by BGP is three times the value of the keepalive interval (i.e., 180 seconds). The advertisement of an UPDATE message within the keepalive period resets the timer to 0. In other words, the KEEPALIVE message is sent only in the absence of other messages for a particular session. If the local router does not receive a KEEPALIVE or UPDATE message within the hold time period, a NOTIFICATION message of 'Hold Time Expired' is generated and the session is torn down.

Establishing Border Gateway Protocol Adjacencies

Because BGP uses TCP as its underlying protocol, the process of establishing a neighbor relationship is two-fold: the first phase is the establishment of the TCP session, and the second phase is the establishment of the BGP peer session. RFC 1771 includes a section on the BGP Finite State Machine (FSM), which provides an overview of BGP operations by state. The different states BGP will go through before a neighbor relationship is established are as follows:
• The Idle state
• The Connect state
• The Active state
• The OpenSent state
• The OpenConfirm state
• The Established state

The first three states pertain to the establishment of the underlying TCP connection between the BGP speakers.
The last three states pertain to the establishment of the actual BGP session. The show ip bgp summary or show ip bgp neighbors commands can be used to view some, but not all, of these states when BGP is enabled on Cisco IOS software routers and switches.

The Idle state is the initial BGP state after BGP is enabled on a router, or when a router is reset. No BGP resources are allocated to the peer in this state. Additionally, when in this state, no incoming connections are allowed. Following is the output of the show ip bgp neighbors command immediately after BGP has been enabled and a neighbor has been defined:

R2#show ip bgp neighbors
BGP neighbor is 10.0.1.1, remote AS 1, external link
  BGP version 4, remote router ID 0.0.0.0
  BGP state = Idle
  Last read 00:00:00, last write 00:00:00, hold time is 180, keepalive interval is 60 seconds
... [Truncated Output]

If using the show ip bgp summary command, the same state would be seen as follows:

R2#show ip bgp summary
BGP router identifier 2.2.2.2, local AS number 2
BGP table version is 1, main routing table version 1

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.1.1        4     1       0       0        0    0    0 never    Idle

In the Connect state, BGP waits for a TCP connection to be completed. If successful, the local router will send an OPEN message to the peer and the BGP state machine transitions to the OpenSent state. However, if the TCP connection attempt fails, the local router resets the ConnectRetry timer and transitions to the Active state. Depending on the failure condition, the local router could also revert to the Idle state. Additionally, if the ConnectRetry timer reaches 0 while the local router is in the Connect state, the timer is reset and another connection attempt is made. In this case, the local router remains in the Connect state.
In the Active state, a TCP connection is initiated to establish a BGP neighbor relationship, also referred to as a BGP peer relationship. In plain English, the BGP routing process tries to establish a TCP session with the peer. If the session establishes successfully, an OPEN message is sent to the peer, the hold time is set to a large value, and the local router transitions to the OpenSent state. However, if the TCP session fails to establish, the local router initiates another session, sets the ConnectRetry timer to 0, and transitions back to the Connect state.

During this state, if the remote peer attempts to establish a connection to the local router using an unexpected IP address for the session, the local router will refuse the connection. The local router will remain in the Active state and reset the ConnectRetry timer. If any other failures occur, the local router releases all BGP resources associated with the connection and transitions back to the Idle state. Following is the output of the show ip bgp neighbors command following the completion of the Connect state:

R2#show ip bgp neighbors
BGP neighbor is 10.0.1.1, remote AS 1, external link
  BGP version 4, remote router ID 0.0.0.0
  BGP state = Active
  Last read 00:00:01, last write 00:00:01, hold time is 180, keepalive interval is 60 seconds
... [Truncated Output]

If using the show ip bgp summary command, the same state would be seen as follows:

R2#show ip bgp summary
BGP router identifier 2.2.2.2, local AS number 2
BGP table version is 1, main routing table version 1

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.1.1        4     1     215     215        0    0    0 00:00:01 Active

After sending the OPEN message to the peer, BGP transitions to the OpenSent state. In this state, the local router waits for a response to the sent OPEN message. When an OPEN message is received, all fields in the message are checked.
If an error is detected, the local router will send the peer a NOTIFICATION message and transition back to the Idle state. However, if a successful (i.e., error-free) response is received, the BGP state moves to OpenConfirm and BGP sends a KEEPALIVE message and sets a keepalive timer. Additionally, the large hold time value previously set in the Active state is replaced with the new negotiated hold time value as the BGP peers negotiate and agree on parameters for the session. Finally, if a TCP disconnect is received while in this state, the local router terminates the BGP session, resets the ConnectRetry timer, and transitions back to the Active state.

In the OpenConfirm state, BGP waits for a KEEPALIVE or a NOTIFICATION message. If the local router receives a KEEPALIVE message, it transitions to the Established state. However, if a KEEPALIVE message is not received before the negotiated hold time value expires, the local router will send a NOTIFICATION message to the peer with the error code 'Hold Time Expired' and transition to the Idle state. Additionally, if the local router receives a NOTIFICATION message from its peer, it also will transition immediately to the Idle state. In this state, if any other failure is detected, the local router sends a NOTIFICATION message with Error Code 5, 'BGP Finite State Machine Error,' and transitions back to the Idle state. BGP then goes through the same steps again to try to establish the connection.

The final state, the Established state, is reached when the initial KEEPALIVE message is received while BGP is in the OpenConfirm state. This is the final state of a peer relationship and designates a fully operational connection. Two BGP peers can exchange routing information only when the Established state is reached. In the Established state, BGP can exchange UPDATE, NOTIFICATION, and KEEPALIVE messages with its peer.
The Established state can be validated using the show ip bgp neighbors command, as illustrated in the output below:

R2#show ip bgp neighbors
BGP neighbor is 10.0.0.1, remote AS 1, external link
  BGP version 4, remote router ID 1.1.1.1
  BGP state = Established, up for 00:12:17
  Last read 00:00:17, last write 00:00:17, hold time is 180, keepalive interval is 60 seconds

Unlike the show ip bgp neighbors command, the show ip bgp summary command does not display the word 'Established'. Instead, the State/PfxRcd column lists the number of prefixes received from the peer. If no prefixes are received from the peer, a value of 0 will be present. This is illustrated in the following output on a BGP speaker with multiple peers or neighbors in the Established state:

R2#show ip bgp summary
BGP router identifier 2.2.2.2, local AS number 2
BGP table version is 3, main routing table version 3
2 network entries using 234 bytes of memory
3 path entries using 156 bytes of memory
3/2 BGP path/bestpath attribute entries using 372 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 786 total bytes of memory
BGP activity 3/1 prefixes, 4/1 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.1        4     1      87      88        3    0    0 00:15:05        1
10.0.1.1        4     1      90      94        3    0    0 00:15:19        1
10.0.3.3        4     3      19      23        3    0    0 00:07:25        0

Border Gateway Protocol Attributes

Border Gateway Protocol path attributes fall into the following four categories:
1. Well-known mandatory
2. Well-known discretionary
3. Optional transitive
4. Optional non-transitive

All BGP speakers must recognize all of the well-known mandatory attributes, which must be included for all prefixes. However, discretionary attributes may or may not be included for a particular prefix.
Discretionary attributes may be used based on the decision of the network administrator; however, their use is not mandatory. BGP speakers do not have to understand optional attributes but must re-advertise them based on their transitive setting. Transitive attributes are advertised to all BGP peers, while non-transitive attributes may be discarded if the local router does not recognize them. Although BGP does support multiple path attributes, this guide will discuss only the following:
• ORIGIN
• AS_PATH
• NEXT_HOP
• MED
• LOCAL_PREF
• WEIGHT

NOTE: Additional detailed information on these and other BGP attributes may be found in the current ROUTE study guide, which is presently available online.

The ORIGIN attribute is a well-known mandatory attribute. It is generated by the autonomous system that originates the routing information. This attribute is defined automatically when a route or prefix is injected into BGP but may be modified using a route-map in Cisco IOS software. There are three possible ORIGIN values, which are as follows:
1. IGP
2. EGP
3. INCOMPLETE

An ORIGIN of IGP indicates that the prefix was injected into BGP using the network command in Cisco IOS software. Prefixes with this ORIGIN code are displayed with the letter 'i'. These routes are internal to the originating AS.

An ORIGIN code of EGP indicates that the prefix originated from the Exterior Gateway Protocol (EGP). Prefixes with this ORIGIN code are displayed with the letter 'e' and are encoded with a value of 1. EGP is beyond the scope of the TSHOOT certification exam and is not described in this guide.

Finally, an ORIGIN code of INCOMPLETE indicates that the original source of the prefix is not known to the router injecting the route into BGP. This code is used for prefixes that are redistributed into BGP using the redistribute command. In Cisco IOS software, prefixes with the INCOMPLETE attribute code are displayed with the question mark (?).
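As an illustration of the ORIGIN behavior described above, the following minimal sketch redistributes OSPF routes into BGP, which would normally mark them as INCOMPLETE (?), and uses a route-map to set the ORIGIN to IGP instead. The AS number 65001, the OSPF process ID 1, and the route-map name SET-ORIGIN are assumptions chosen purely for illustration; they do not come from the topology used in this chapter.

```
router bgp 65001
 ! Assumed AS number. Redistributed prefixes would otherwise carry ORIGIN INCOMPLETE (?)
 redistribute ospf 1 route-map SET-ORIGIN
!
route-map SET-ORIGIN permit 10
 ! Overwrite the ORIGIN attribute for every prefix matched by this entry
 set origin igp
```

With a configuration along these lines, the origin code shown at the end of the Path column in the show ip bgp output should change from ? to i for the affected prefixes.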
The AS_PATH attribute is used to prevent routing information loops in inter-AS BGP (i.e., eBGP) implementations. The AS_PATH attribute contains a reverse-order-sequenced list of autonomous system numbers that represent the domains the prefix has transited. This attribute is changed only when an UPDATE message is sent to an eBGP neighbor, not when it is sent to an iBGP peer, which is why this attribute is applicable only to external BGP implementations.

When an eBGP speaker receives UPDATE messages, it looks at the AS_PATH list to determine the best (shortest) path to the destination prefix. However, if the eBGP speaker notices its own autonomous system number in the AS_PATH list, it ignores this UPDATE message and does not take it into consideration in the path selection process. This prevents the router from receiving and accepting information that it originated, or that was originated within its own autonomous system.

The NEXT_HOP attribute, Attribute Type Code 3, is a well-known mandatory attribute that is used to define the next-hop IP address to the destination prefix from the BGP perspective. Unlike with traditional IGPs, the next-hop for a BGP prefix does not have to be directly connected. In such cases, the local router performs a recursive lookup in the routing table to locate a route to the BGP next-hop. The result of this recursive lookup is the physical next-hop assigned to the BGP route in the routing and forwarding tables. There are various ways in which the NEXT_HOP for a prefix is determined and set, which include the following:
• When the prefix is first injected into BGP
• When the prefix is advertised via eBGP
• When the next-hop is manually changed

When a prefix is first injected into BGP, the BGP speaker on which the prefix is injected will be responsible for setting the NEXT_HOP attribute.
The actual value (i.e., the actual IP address specified) depends on how the prefix is injected into BGP, as follows:
• If the prefix is injected into BGP using the network command and the prefix is a directly connected subnet, the NEXT_HOP address will be 0.0.0.0 on the local BGP speaker.
• If a prefix is injected into BGP using the network command and that prefix is known via an IGP, the NEXT_HOP will contain the IP address of the IGP next-hop router.
• If the prefix is redistributed into BGP from an IGP using the redistribute command, the NEXT_HOP address will be set to the same next-hop address as that of the IGP.
• When BGP route summarization is configured using the aggregate-address router configuration command, the NEXT_HOP is set to the address of the router that is performing the summarization when an UPDATE message is sent. If summarization is performed on the local BGP speaker, then the NEXT_HOP address will be 0.0.0.0.
• When the prefix is advertised via eBGP, the NEXT_HOP will automatically be set to the IP address of the eBGP speaker that is sending the UPDATE message for the prefix.
• If more than two eBGP peers reside on the same multi-access segment, the BGP speaker that is advertising the prefix sets the NEXT_HOP address in the UPDATE message to the original BGP speaker on the same segment rather than to itself.
• By default, the NEXT_HOP is not changed when a prefix is advertised by a BGP speaker to an iBGP peer. This can be modified using the next-hop-self command.

The MULTI_EXIT_DISC (MED) attribute is a 32-bit positive integer that is defined as Attribute Type Code 4. In addition, MED is an optional non-transitive attribute. MED is typically used on inter-AS links and allows BGP to choose among multiple exit points to the same neighboring AS.
In other words, MED, which is expressed as a metric value, is used as a suggestion to the external peer AS regarding the preferred route into the local AS that is advertising the metric. The term 'suggestion' is used because it is not mandatory for the neighboring AS to adhere to the values specified using this attribute. The MED rules when using Border Gateway Protocol are as follows:
• If the prefix is for a directly connected network and the network or redistribute command is used to inject the prefix into BGP, the BGP MED value is set to 0.
• If the prefix is received from an IGP, when it is injected into BGP using the network or redistribute command, the MED value is set to the IGP metric.
• If the prefix is injected into BGP using the aggregate-address command (i.e., it is a summary prefix), the MED value is not set.
• A BGP speaker will advertise prefixes with the same metric to another iBGP peer.
• By default, if the prefix is learned from an iBGP peer, the edge router will remove the MED value before advertising the prefix to an eBGP peer.

The LOCAL_PREF attribute is a 32-bit positive integer that defines a preference for one exit point from an AS over another. This attribute is a well-known discretionary BGP attribute, defined as Attribute Type Code 5. The LOCAL_PREF attribute is used only within an AS for path selection manipulation. If an iBGP speaker receives an UPDATE for the same destination from multiple iBGP peers, it will prefer the path with the highest LOCAL_PREF value. In Cisco IOS software, the default LOCAL_PREF value is 100; however, this value can be changed to any value between 0 and 4294967295.

The WEIGHT attribute is used in a manner similar to the LOCAL_PREF attribute. That is, this attribute is used to define a preference for one exit point from an AS over another.
However, unlike the LOCAL_PREF attribute, which is propagated to other routers in the AS, the Cisco WEIGHT attribute is locally significant to the device on which it is configured, much like administrative distance values. This attribute, therefore, does not affect the routing policy of neighboring routers, nor will it be sent to other routers. The WEIGHT is a 16-bit value between 0 and 65,535. While this attribute is proprietary, it has priority over all other attributes in the BGP path selection process. By default, all prefixes injected into BGP on the local BGP speaker are assigned a WEIGHT of 32,768. Because this attribute is locally significant, this default value is not set in the UPDATE messages sent to any other neighbors. In Cisco IOS software, the higher the WEIGHT, the more preferred the path will be.

Influencing Inbound Path Selection

When using BGP attributes to influence inbound path selection, BGP attribute configuration must be implemented in the outbound direction. The attributes that are used to influence the path a neighboring autonomous system takes back into the originating autonomous system must be advertised in UPDATE messages sent to that neighboring autonomous system. The two BGP attributes that are used to influence the inbound path are as follows:
• The MULTI_EXIT_DISC attribute
• The AS_PATH attribute

Two attributes are used to influence the path that BGP speakers within the same autonomous system will take to exit the autonomous system. Unlike the attributes used to influence inbound path selection, the attributes used to influence outbound path selection are applied in the inbound direction, also using route maps.
These two attributes are as follows:
• The LOCAL_PREF attribute
• The WEIGHT attribute

Having recapped the BGP fundamentals, which are also described in additional detail in the ROUTE study guide, the following section describes some common BGP problems and provides recommended solutions for resolving them. Keep in mind that it is not possible to go through all potential BGP problems. Instead, emphasis will be placed only on the most common ones.

TROUBLESHOOTING NEIGHBOR RELATIONSHIPS

The most common reason for failing to establish either eBGP or iBGP neighbor relationships is misconfiguration. However, in addition to misconfigurations, the following factors can also prevent BGP adjacencies from establishing:
• Access Control Lists filtering BGP packets
• Layer 1 and Layer 2 issues
• Device resource consumption

As with other routing protocols, ACLs can prevent BGP neighbor adjacencies from being established. Given that BGP allows both internal and external peers to be more than one hop away, when configuring BGP between devices that are more than a single hop apart, be sure that no ACLs on any intermediate devices are blocking the TCP packets exchanged between the peers. If any such ACLs exist, they should be modified to permit TCP port 179 so that the BGP session can be established between the desired peers. You can verify applied ACLs using the show ip access-lists or show running-config commands on local and intermediate devices.

Layer 1 and Layer 2 issues can also prevent BGP adjacencies from being established. You can check interface errors using the show interfaces and show counters interface commands. Layer 2 issues, such as VLANs and STP, can be validated following the sequence of steps described in the previous chapters. Commands that can be used to troubleshoot such issues include the show vlan and show spanning-tree suite of commands.
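Returning to the ACL check described above, the following is a minimal sketch of extended ACL entries that permit a BGP session between two peers, whichever side initiates the TCP connection. The ACL name ALLOW-BGP and the peer addresses 3.3.3.3 and 4.4.4.4 are assumptions used for illustration only; on a real device, equivalent entries would need to be merged into whatever ACL is actually applied to the interface in question.

```
ip access-list extended ALLOW-BGP
 ! Traffic from 3.3.3.3 to TCP port 179 on 4.4.4.4 (session opened by 3.3.3.3)
 permit tcp host 3.3.3.3 host 4.4.4.4 eq bgp
 ! Traffic from TCP port 179 on 3.3.3.3 to 4.4.4.4 (session opened by 4.4.4.4)
 permit tcp host 3.3.3.3 eq bgp host 4.4.4.4
```

These two entries cover only packets travelling from 3.3.3.3 towards 4.4.4.4. An ACL filtering the opposite direction would need the mirrored pair, with the source and destination hosts swapped. The match counters for the entries can then be checked with the show ip access-lists command.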
Finally, it is important to remember that BGP itself is designed to support a large number of prefixes, which in turn equates to greater resource (e.g., memory) consumption than traditional routing protocols. Additionally, the more peers or neighbors that are configured, the more memory BGP will consume. If device resources are already over-taxed, it is possible that some BGP adjacencies may not be established. Use the show processes suite of commands to troubleshoot device resource utilization if you suspect that this may be preventing adjacencies.

In addition to the potential causes described above, common misconfigurations include the following:
• Using the incorrect autonomous system number for the peer
• Using the incorrect IP address for the peer
• Incorrect authentication parameters
• No IP connectivity between indirectly connected peers

Using the Incorrect Autonomous System Number for the Peer

A common BGP misconfiguration is specifying the incorrect autonomous system number for the peer. If the autonomous system number specified in the neighbor [address] remote-as [autonomous system] command does not match the autonomous system number configured on the remote peer using the router bgp [autonomous system] command, the adjacency will not be established. Instead, Cisco IOS software will print the following error message on the console:

*Mar 12 16:58:29.991: %BGP-3-NOTIFICATION: sent to neighbor 10.0.0.1 2/2 (peer in wrong AS) 2 bytes 0001 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 002D 0104 0001 00B4 0101 0101 1002 0601 0400 0100 0102 0280 0002 0202 00

On the remote router, the following error message will be printed on the console:

*Mar 19 04:58:09.535: %BGP-3-NOTIFICATION: received from neighbor 10.0.0.3 2/2 (peer in wrong AS) 2 bytes 0001

In such cases, the BGP speakers will remain in the Active state, indicating that they are both actively trying to establish a TCP session with the specified peer.
This issue can be resolved by specifying the correct autonomous system number.

Using the Incorrect IP Address for the Peer

While specifying an incorrect autonomous system number for the peer will result in error messages being printed on the console, no error messages will be printed if you specify the incorrect IP address. Always double-check that the IP address that has been specified for the peer is indeed the IP address of that device. If the peers are indirectly connected (i.e., will be peering using Loopback or other interfaces), the address specified for the peer should match the address of the interface that the peer uses in its update-source [interface] configuration command. In the event that these parameters are mismatched, the TCP session will never be established.

Incorrect Authentication Parameters

Border Gateway Protocol supports Message Digest 5 (MD5) authentication, which is used to verify the authenticity of the TCP segments exchanged between two BGP peers. When the BGP TCP MD5 shared password is configured between two peers, the Cisco IOS software checks the MD5 digest of every segment sent on the TCP connection. If MD5 authentication is invoked and a segment fails authentication, then an error message will be displayed. The type of message printed varies depending on whether authentication is enabled on only one of the peers or whether both peers are configured for authentication but are using different passwords.
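For reference, here is a minimal sketch of how MD5 authentication might be enabled consistently on both peers. The peer addresses 1.0.0.1 and 1.0.0.3 are borrowed from the log messages in this section; the AS numbers 65001 and 65002 and the password string SHARED-SECRET are assumptions made purely for illustration.

```
! On the router that owns 1.0.0.3 (AS number assumed)
router bgp 65001
 neighbor 1.0.0.1 remote-as 65002
 neighbor 1.0.0.1 password SHARED-SECRET
!
! On the router that owns 1.0.0.1 (AS number assumed)
router bgp 65002
 neighbor 1.0.0.3 remote-as 65001
 ! The password string must be identical on both peers
 neighbor 1.0.0.3 password SHARED-SECRET
```

Omitting the password line on one side, or entering a different string on each side, are the two misconfigurations that produce the distinct error messages discussed in this section.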
In the event that authentication is configured on the local BGP speaker but not on the peer, the following error message will be printed on the console of the local BGP speaker:

*Mar 12 17:06:27.235: %TCP-6-BADAUTH: No MD5 digest from 1.0.0.1(179) to 1.0.0.3(46132)
*Mar 12 17:06:27.239: %TCP-6-BADAUTH: No MD5 digest from 1.0.0.1(179) to 1.0.0.3(46132)

However, if authentication is configured on both peers but the passwords are mismatched, the following error message will be printed:

*Mar 12 17:08:56.243: %TCP-6-BADAUTH: Invalid MD5 digest from 1.0.0.1(52991) to 1.0.0.3(179)
*Mar 12 17:09:04.243: %TCP-6-BADAUTH: Invalid MD5 digest from 1.0.0.1(52991) to 1.0.0.3(179)

To resolve this issue, the same password must be used when configuring authentication between the BGP peers. This is applicable to both internal and external BGP peers.

No IP Connectivity between Indirectly Connected Peers

While configuring BGP between directly connected peers is a straightforward task, additional configuration is required to ensure that BGP adjacencies between indirectly connected peers are established. Consider, for example, the basic network topology illustrated in Figure 7-1 below:

[Figure: R3 (AS 3, Lo0: 3.3.3.3/32) and R4 (AS 4, Lo0: 4.4.4.4/32); interface labels shown: Se1/3, Se0/0, Se1/3, Se0/1]

Fig. 7-1. Understanding BGP Multihop Implementation

Referencing Figure 7-1, external BGP is to be configured between R3 and R4 using the Loopback interface of each router for peering, allowing for load balancing across the physical links. When implementing BGP in such situations, several additional configuration commands are required.

The first requirement is the use of the ebgp-multihop command. By default, external BGP packets are sent out with an IP TTL of 1. The ebgp-multihop command allows administrators to modify this default behavior and specify the packet TTL value. If a value is not specified, the default TTL of 255 will be used instead.
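A minimal sketch of this first requirement on R4, using the AS numbers and Loopback addresses from Figure 7-1, might look as follows. The TTL value of 2 is an assumption for illustration. This fragment on its own is not expected to bring the session up, because the remaining requirements still apply.

```
! R4 (AS 4), peering with the Loopback0 address of R3 (AS 3)
router bgp 4
 neighbor 3.3.3.3 remote-as 3
 ! Send eBGP packets to this neighbor with a TTL of 2 instead of the default of 1
 neighbor 3.3.3.3 ebgp-multihop 2
```

A matching pair of statements, pointing at 4.4.4.4 in AS 4, would be needed on R3.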
The second requirement is ensuring that there is IP connectivity between the Loopback interfaces. This may be achieved using either dynamic or static routes, with the latter being the most commonly used method. Referencing the topology illustrated in Figure 7-1, two static routes can be configured between the routers across the Serial0/0 interfaces using the ip route [remote loopback address] [mask] serial [name/number] global configuration command.

The third requirement is specifying the update source. By default, BGP expects that the update source will be an IP address on a directly connected subnet. If the specified neighbor address is indirectly connected (e.g., a Loopback), then the update-source [interface] BGP configuration command must also be specified on both speakers. If any of these parameters are not specified, the adjacency will not be established. This can be validated using the show ip bgp neighbors [address] command as follows:

R4#show ip bgp neighbors 3.3.3.3
BGP neighbor is 3.3.3.3, remote AS 3, external link
  BGP version 4, remote router ID 0.0.0.0
  BGP state = Active
  Last read 00:00:14, last write 00:00:14, hold time is 180, keepalive interval is 60 seconds
  Message statistics:
    InQ depth is 0
    OutQ depth is 0
                         Sent       Rcvd
    Opens:                  1          1
    Notifications:          0          0
    Updates:                0          0
    Keepalives:            16         16
    Route Refresh:          0          0
    Total:                 17         17
  Default minimum time between advertisement runs is 30 seconds

 For address family: IPv4 Unicast
  BGP table version 1, neighbor version 0/0
  Output queue size : 0
  Index 1, Offset 0, Mask 0x2
  1 update-group member
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               0          0
    Prefixes Total:                 0          0
    Implicit Withdraw:              0          0
    Explicit Withdraw:              0          0
    Used as bestpath:             n/a          0
    Used as multipath:            n/a          0

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    Total:                                0          0
  Number of NLRIs in the update sent: max 0, min 0

  Connections established 1; dropped 1
  Last reset 00:00:20, due to User reset
  External BGP neighbor may be up to 2 hops away.
  No active TCP connection

From the output printed above, we can determine that the ebgp-multihop 2 command has been issued on the local router; however, the BGP session is not established. This leaves two possibilities: either the update-source command is missing or there is no IP connectivity between the two Loopback addresses. You can troubleshoot this issue simply by verifying the device configurations on both routers.

While Cisco IOS software assumes that external BGP peers are directly connected, the same is not assumed for internal BGP peers. This negates the need for the multihop command when configuring internal BGP peers. However, the peer IP addresses still must have connectivity and the update-source command still must be used. If this command is not specified, the BGP session will never be established, as can be seen in the following output:

R4#show ip bgp neighbors 3.3.3.3
BGP neighbor is 3.3.3.3, remote AS 3, internal link
  BGP version 4, remote router ID 0.0.0.0
  BGP state = Active
  Last read 00:02:50, last write 00:02:50, hold time is 180, keepalive interval is 60 seconds
  Message statistics:
    InQ depth is 0
    OutQ depth is 0
                         Sent       Rcvd
    Opens:                  0          0
    Notifications:          0          0
    Updates:                0          0
    Keepalives:             0          0
    Route Refresh:          0          0
    Total:                  0          0
  Default minimum time between advertisement runs is 0 seconds

 For address family: IPv4 Unicast
  BGP table version 1, neighbor version 0/0
  Output queue size : 0
  Index 1, Offset 0, Mask 0x2
  1 update-group member
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               0          0
    Prefixes Total:                 0          0
    Implicit Withdraw:              0          0
    Explicit Withdraw:              0          0
    Used as bestpath:             n/a          0
    Used as multipath:            n/a          0

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    Total:                                0          0
  Number of NLRIs in the update sent: max 0, min 0

  Connections established 0; dropped 0
  Last reset never
  No active TCP connection

TROUBLESHOOTING ROUTE ADVERTISEMENT

Similar to neighbor adjacency establishment, route advertisement issues are commonly due to simple device misconfigurations. In Cisco IOS software, three methods can be used to advertise networks when using Border Gateway Protocol. These methods include using the network [network] mask [mask] command, using the aggregate-address command, and using the redistribute [protocol] command. The first two methods are discussed in this main section; however, redistribution will be discussed in the following main section. In addition, this section will also discuss route advertisement following BGP policy changes.

Route Advertisement with the ‘network’ Command

The network [network] mask [mask] command is the recommended method for advertising networks when using BGP. When specified with BGP, the behavior of this command differs from when it is used with an IGP, such as OSPF or EIGRP. With an IGP, this command configures the router to install the network into the Link State Database or Topology Table and send out Hello packets to discover neighbor routers. With BGP, however, this command is used to flag the network as being local to the autonomous system, as well as to instruct BGP to advertise the specified network. It does not configure the router to send out Hello packets out of any interfaces that fall within the specified range. The specified network must be present in the routing table before it will be advertised by BGP.

The mask <mask> keyword is optional and is required only when BGP is required to advertise either subnets or supernets. For example, to advertise the 10.0.0.0/30 subnet, the mask keyword would be required. Excluding the mask keyword and simply entering the network 10.0.0.0 command would not result in this subnet being advertised. However, the mask keyword would not be required to advertise the 10.0.0.0/8 network, as long as there was a matching route for this prefix in the routing table.
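The two cases described above can be sketched as a short configuration fragment. This is only an illustration: the AS number 65001 is hypothetical, the prefixes are the ones used in the preceding paragraph, and the sketch assumes that auto-summary is disabled.

```
router bgp 65001
 no auto-summary
 ! Classful form: advertised only if an exact 10.0.0.0/8 route is in the routing table
 network 10.0.0.0
 ! Subnet form: the mask keyword is needed to match the 10.0.0.0/30 route exactly
 network 10.0.0.0 mask 255.255.255.252
```

In both cases BGP looks for a route in the routing table whose prefix and mask match the statement exactly; a statement that merely covers a longer prefix does not cause that prefix to be advertised.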
To clarify this concept further, consider the topology illustrated in Figure 7-2 below:

[Figure 7-2: R3 (AS 3, Lo0: 3.3.3.3/32) and R4 (AS 4, Lo0: 4.4.4.4/32, LAN 10.4.4.4/24) connected over serial links; interface labels shown are Se1/3, Se0/0, Se1/3, and Se0/1]

Fig. 7-2. Understanding the BGP ‘network’ Command

Referencing Figure 7-2, an external BGP session has been configured between R3 and R4. We will assume that the correct configuration is in place on both routers, allowing the session to be established successfully. R4 has a directly connected 10.4.4.0/24 subnet. This is to be advertised via BGP to R3. At this point, the relevant BGP configuration on R4 is as follows:

R4#show running-config | section bgp
router bgp 4
 no synchronization
 bgp router-id 4.4.4.4
 bgp log-neighbor-changes
 network 10.0.0.0
 neighbor 3.3.3.3 remote-as 3
 neighbor 3.3.3.3 ebgp-multihop 2
 neighbor 3.3.3.3 update-source Loopback0
 no auto-summary

At first glance, the configuration appears to be correct because the 10.4.4.0/24 subnet is encompassed by the 10.0.0.0/8 Classful network, as can be seen in the output of the routing table below:

R4#show ip route 10.0.0.0 longer-prefixes
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     10.0.0.0/24 is subnetted, 1 subnets
C       10.4.4.0 is directly connected, FastEthernet0/0

However, because the operation of the network command differs for BGP, the 10.4.4.0/24 subnet will not be added to the BGP RIB as illustrated in the following output:

R4#show ip bgp 10.0.0.0
% Network not in table
R4#
R4#show ip bgp
R4#

In order to have BGP advertise the 10.4.4.0/24 subnet, the correct mask must be specified as follows:

R4(config)#router bgp 4
R4(config-router)#no network 10.0.0.0
R4(config-router)#network 10.4.4.0 mask 255.255.255.0
R4(config-router)#exit

Following this re-configuration, the 10.4.4.0/24 subnet appears in the BGP RIB as follows:

R4#show ip bgp
BGP table version is 2, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.4.4.0/24      0.0.0.0                  0         32768 i

We can further validate that the prefix is advertised to neighbor 3.3.3.3 using either the show ip bgp [network] [mask] or the show ip bgp neighbors [address] advertised-routes commands. Following is the output of the show ip bgp [network] [mask] command on R4:

R4#show ip bgp 10.4.4.0 255.255.255.0
BGP routing table entry for 10.4.4.0/24, version 2
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Advertised to update-groups:
     1
  Local
    0.0.0.0 from 0.0.0.0 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, sourced, local, best

NOTE: To verify neighbors in update-group 1, use the show ip bgp update-group command as follows:

R4#show ip bgp update-group 1
BGP version 4 update-group 1, external, Address Family: IPv4 Unicast
  BGP Update version : 2/0, messages 0
  Update messages formatted 1, replicated 0
  Number of NLRIs in the update sent: max 1, min 1
  Minimum time between advertisement runs is 30 seconds
  Has 1 member (* indicates the members currently being sent updates):
   3.3.3.3

Alternatively, simply use the show ip bgp neighbors [address] advertised-routes command as previously stated.
Following is the output of this command on R4:

R4#show ip bgp neighbors 3.3.3.3 advertised-routes
BGP table version is 2, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.4.4.0/24      0.0.0.0                  0         32768 i

Total number of prefixes 1

Route Advertisement with the ‘aggregate-address’ Command

Two common problem scenarios are often encountered when using the aggregate-address command with BGP. The first is that BGP does not advertise the aggregate at all and the second is that BGP advertises the summary and the more specific routes as well. In order to troubleshoot either issue, you must have a solid understanding of how route aggregation with BGP works.

By default, when the aggregate-address command is used, BGP advertises the summary only if a more specific route encompassed by the summary is present in the BGP table. Continuing from the topology illustrated in Figure 7-2, assume the following configuration was implemented on R4:

R4(config)#router bgp 4
R4(config-router)#aggregate-address 10.0.0.0 255.0.0.0
R4(config-router)#end

With this configuration, and assuming that the 10.4.4.0/24 subnet is still present in the BGP RIB, BGP will generate and advertise the 10.0.0.0/8 summary address. This can be validated by checking the BGP RIB using the show ip bgp command as follows:

R4#show ip bgp
BGP table version is 3, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.0.0         0.0.0.0                            32768 i
*> 10.4.4.0/24      0.0.0.0                  0         32768 i

NOTE: You can view detailed information on the aggregate by appending the network address and mask to the end of this command as follows:

R4#show ip bgp 10.0.0.0 255.0.0.0
BGP routing table entry for 10.0.0.0/8, version 7
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Advertised to update-groups:
     1
  Local, (aggregated by 4 4.4.4.4)
    0.0.0.0 from 0.0.0.0 (4.4.4.4)
      Origin IGP, localpref 100, weight 32768, valid, aggregated, local, atomic-aggregate, best

In the event that all specific routes included in the aggregate are removed or withdrawn, the aggregate will no longer be advertised. For example, if the 10.4.4.0/24 subnet was removed from the routing table by shutting down the FastEthernet0/0 interface on R4, BGP would no longer originate that subnet, and the aggregate would also be withdrawn and would not be advertised. This is illustrated in the output below:

R4#debug ip bgp
BGP debugging is on for address family: IPv4 Unicast
R4#config t
Enter configuration commands, one per line. End with CNTL/Z.
R4(config)#interface FastEthernet0/0
R4(config-if)#shutdown
R4(config-if)#
*Mar 12 20:45:16.455: BGP(0): Aggregate processing for IPv4 Unicast
*Mar 12 20:45:16.455: BGP(0): For aggregate 10.0.0.0/8
*Mar 12 20:45:16.455: BGP(0): 10.0.0.0/8 subtree has an entry 10.0.0.0/8
*Mar 12 20:45:16.455: BGP(0): 10.0.0.0/8 subtree has another entry 10.4.4.0/24
*Mar 12 20:45:16.455: BGP(0): sub-prefix : 10.4.4.0/24Needs to be re-aggregated
*Mar 12 20:45:16.455: BGP(0): 10.0.0.0/8 subtree has an entry 10.0.0.0/8
*Mar 12 20:45:16.455: BGP(0): 10.0.0.0/8 subtree has another entry 10.4.4.0/24
*Mar 12 20:45:16.459: BGP(0): 10.0.0.0/8 aggregate is removed
*Mar 12 20:45:16.459: BGP(0): Aggregate 10.0.0.0/8 does not have more-specifics

As previously stated, by default when summarization is configured, BGP will advertise both the summary and the more specific prefixes. Continuing with the previous aggregation configuration example, the 10.0.0.0/8 aggregate as well as the 10.4.4.0/24 subnet would be advertised by R4 to peer router R3. Again, this can be validated using the show ip bgp neighbors [address] advertised-routes command as follows:

R4#show ip bgp neighbors 3.3.3.3 advertised-routes
BGP table version is 15, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.0.0         0.0.0.0                            32768 i
*> 10.4.4.0/24      0.0.0.0                  0         32768 i

Total number of prefixes 2

In order to suppress specific prefixes from being advertised, you must configure BGP manually to do so by appending the summary-only keyword to the aggregate-address command as illustrated in the following output:

R4(config)#router bgp 4
R4(config-router)#aggregate-address 10.0.0.0 255.0.0.0 summary-only
R4(config-router)#end

Following this configuration, any specific prefixes encompassed by the aggregate are preceded by an ‘s’, indicating that they are being explicitly suppressed as illustrated in the following output:

R4#show ip bgp
BGP table version is 16, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.0.0         0.0.0.0                            32768 i
s> 10.4.4.0/24      0.0.0.0                  0         32768 i

Given this configuration, only the 10.0.0.0/8 aggregate is now advertised to R3 as follows:

R4#show ip bgp neighbors 3.3.3.3 advertised-routes
BGP table version is 16, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.0.0         0.0.0.0                            32768 i

Total number of prefixes 1

Route Advertisement Following BGP Policy Reconfiguration

Another common issue with BGP is route advertisement following policy reconfiguration or changes. BGP policies include filtering, using tools such as route maps, ACLs, and AS_PATH filters. As was stated earlier in this chapter, the BGP Scanner process periodically scans the BGP Routing Information Base (RIB) in order to determine whether prefixes and attributes should be deleted and whether route maps or filter caches should be flushed.
Additionally, the BGP Scanner walks the BGP table and confirms reachability of the next-hops (i.e., it validates that next-hops are still valid). If the next-hop for a prefix is not reachable, all BGP entries that use that next-hop are removed from the BGP RIB.

By default, the BGP Scanner runs every 60 seconds, which means that it could take up to a minute, or more depending on the number of prefixes and other factors, such as resource utilization, for BGP policy changes to be detected and, ultimately, for networks to be advertised (or withdrawn).

If you have made changes to BGP policy configuration (e.g., route maps) and you notice that the network or networks are not being advertised by BGP (assuming the configuration is correct), keep in mind that it simply may be that the BGP Scanner process has yet to incorporate the changes. Therefore, instead of waiting for the BGP Scanner process to run, use the clear ip bgp command to apply the configuration changes immediately. The complete syntax of this command is as follows:

clear ip bgp [* | all | <autonomous-system-number> | <address> | peer-group <name>] [in [prefix-filter] | out | slow | soft [in [prefix-filter] | out | slow]]

Table 7-1 below lists and describes the keywords that can be used with this command:

Table 7-1. Cisco IOS ‘clear ip bgp’ Command Keywords

Keyword                     Function
*                           The asterisk (*) resets all BGP peers (i.e., it tears down and resets all BGP sessions). This should be used with extreme caution.
all                         This optional keyword specifies the reset of all address family (AF) sessions (e.g., the ipv4 [IPv4 AF] and the ipv6 [IPv6 AF]).
autonomous-system-number    This specifies the number of the autonomous system in which all BGP peer sessions will be reset.
address                     This specifies that only the identified BGP neighbor will be reset. The value for this argument can be either an IPv4 address or an IPv6 address.
peer-group <name>           This specifies that only the identified BGP peer group will be reset.
in                          This optional keyword initiates inbound reconfiguration. If neither the in nor out keywords are specified, both inbound and outbound sessions are reset.
prefix-filter               This optional keyword clears the existing outbound route filter (ORF) prefix list to trigger a new route refresh or soft reconfiguration, which updates the ORF prefix list.
out                         This optional keyword initiates outbound reconfiguration. If neither the in nor out keywords are specified, both inbound and outbound sessions are reset.
slow                        This optional keyword clears slow-peer status forcefully and moves it to the original update group.
soft                        This optional keyword initiates a soft reset. In other words, using this keyword does not tear down the BGP session.

TROUBLESHOOTING ROUTE REDISTRIBUTION ISSUES

For the most part, the redistribution of routes into BGP is a straightforward process. As is the case with other routing protocols, this is performed using the redistribute [protocol] command. By default, BGP will simply use the IGP metric when external routing information is injected into BGP. This negates the need to specify a seed metric or assign a different route metric during redistribution. There are, however, some caveats that you should be familiar with when troubleshooting route advertisement following redistribution.

One common issue is the advertisement of OSPF routes. By default, when OSPF is redistributed into BGP, only internal (intra-area and inter-area) OSPF routes will be redistributed and advertised; external OSPF routes are not. This is often a point of confusion because routes that have been redistributed into OSPF, which in Cisco IOS software are external Type 2 by default, do not appear in BGP unless they are explicitly matched.
You can verify which OSPF route types are being redistributed into BGP either by looking at the device configuration or by using the show ip route command, a sample output of which is provided below:

R4#show ip route 192.168.1.0 255.255.255.0
Routing entry for 192.168.1.0/24
  Known via "ospf 4", distance 110, metric 20, type extern 2, forward metric 1
  Redistributing via bgp 4
  Last update from 10.0.0.1 on FastEthernet0/0, 00:10:23 ago
  Routing Descriptor Blocks:
  * 10.0.0.1, from 1.1.1.1, 00:10:23 ago, via FastEthernet0/0
      Route metric is 20, traffic share count is 1

NOTE: Although the output above shows that the external Type 2 route is being redistributed via BGP, the route is not imported into the BGP RIB because external OSPF routes are not redistributed by default. This can be validated using the show ip bgp command as follows:

R4#show ip bgp 192.168.1.0 255.255.255.0
% Network not in table

When redistributing OSPF routes into BGP, you must specify the OSPF route types that will be redistributed into BGP using the redistribute ospf [process ID] match <external 1|2> <internal> <nssa-external 1|2> router configuration command. Following this, you can again validate the implementation by looking at the router configuration or using the show ip protocols command.
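A minimal sketch of such a configuration, assuming the OSPF process ID 4 and BGP autonomous system 4 used on R4 in this chapter, might look as follows. The selection of route types here is only an illustration; match only the types that actually need to reach BGP.

```
router bgp 4
 ! Redistribute internal, external Type 2, and NSSA external Type 2 OSPF routes
 redistribute ospf 4 match internal external 2 nssa-external 2
```

Any route type left out of the match list, such as external Type 1 in this sketch, is not imported into the BGP RIB.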
For example, if BGP were configured to redistribute all internal routes and Type 2 externals (those learned from both Type 5 and Type 7 LSAs), the show ip route command would display the following:

R4#show ip route 192.168.1.0 255.255.255.0
Routing entry for 192.168.1.0/24
  Known via "ospf 4", distance 110, metric 20, type extern 2, forward metric 1
  Redistributing via bgp 4
  Advertised by bgp 4 match internal external 2 nssa-external 2
  Last update from 10.0.0.1 on FastEthernet0/0, 00:13:14 ago
  Routing Descriptor Blocks:
  * 10.0.0.1, from 1.1.1.1, 00:13:14 ago, via FastEthernet0/0
      Route metric is 20, traffic share count is 1

Because the route type is included in the redistribution configuration, the route is installed into the BGP RIB and, assuming no filtering configuration, will be advertised to neighbors:

R4#show ip bgp 192.168.1.0 255.255.255.0
BGP routing table entry for 192.168.1.0/24, version 63
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Advertised to update-groups:
     1
  Local
    10.0.0.1 from 0.0.0.0 (4.4.4.4)
      Origin incomplete, metric 20, localpref 100, weight 32768, valid, sourced, best

NOTE: In the output above, notice that the BGP next-hop is set to 10.0.0.1, which is the OSPF next-hop address. In addition, the BGP metric for this prefix is the same as the OSPF metric.

Another common redistribution problem pertains to the redistribution of BGP routes into an IGP, such as EIGRP or OSPF. By default, only external BGP routes are redistributed. This default behavior is used to avoid routing loops within the interior network. To redistribute iBGP prefixes into the IGP as well, you must use the bgp redistribute-internal command.

NOTE: The redistribution of BGP into IGPs is not recommended. However, if it must be performed, be sure to use route filters to allow only the explicit prefixes that should be redistributed into the IGP to be imported. Do not blindly redistribute BGP into any IGP.
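The filtering advice above can be sketched as follows for redistribution of BGP into OSPF. This is an illustration under stated assumptions rather than a recommended design: the prefix list and route map name BGP-TO-OSPF and the prefix 150.1.0.0/24 are hypothetical, and the process and AS numbers simply reuse those of R4.

```
! Hypothetical prefix list naming the only prefix allowed into the IGP
ip prefix-list BGP-TO-OSPF seq 5 permit 150.1.0.0/24
!
route-map BGP-TO-OSPF permit 10
 match ip address prefix-list BGP-TO-OSPF
!
router bgp 4
 ! Needed only if iBGP-learned prefixes must also be redistributed
 bgp redistribute-internal
!
router ospf 4
 ! The subnets keyword allows subnetted prefixes to be redistributed
 redistribute bgp 4 subnets route-map BGP-TO-OSPF
```

Because the route map ends with an implicit deny, any BGP prefix not explicitly permitted by the prefix list stays out of OSPF.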
DEBUGGING BGP ROUTING ISSUES

Given that BGP is a resource-intensive protocol in itself, careful consideration should be given before enabling any kind of BGP debugging in production environments. BGP debugging is enabled using the debug ip bgp command. The keywords that can be used in conjunction with this command are illustrated below:

R3#debug ip bgp ?
  A.B.C.D     BGP neighbor address
  all         All address families
  dampening   BGP dampening
  events      BGP events
  groups      BGP Config (peer-groups, templates) and Update groups
  import      BGP import routes to a vrf across address-family
  in          BGP Inbound information
  ipv4        Address family
  ipv6        Address family
  keepalives  BGP keepalives
  mpls        BGP MPLS label distribution
  nsap        Address family
  out         BGP Outbound information
  rib-filter  Next hop route watch filter events
  updates     BGP updates
  vpnv4       Address family
  <cr>

NOTE: The majority of these options are beyond the scope of the TSHOOT certification exam. Only those options that are relevant to this course are described below.

The debug ip bgp [address] updates command will print detailed information about updates exchanged with the specified BGP neighbor. If you wanted to see real-time information about updates from all BGP peers, you would use the debug ip bgp updates command instead. Following is a sample output of the debug ip bgp [address] updates command, which illustrates detailed information about updates sent to and received from the specified neighbor.
This information includes NLRI, NEXT_HOP information, MED information, and AS_PATH information, among other things, as can be seen below:

R3#debug ip bgp 4.4.4.4 updates
BGP updates debugging is on for neighbor 4.4.4.4 for address family: IPv4 Unicast
R3#
R3#
R3#
*Mar 13 12:53:01.854: BGP(0): 4.4.4.4 send UPDATE (format) 150.4.0.0/24, next 3.3.3.3, metric 0, path 4 8 9 1
*Mar 13 12:53:01.854: BGP(0): 4.4.4.4 send UPDATE (prepend, chgflags: 0x0) 150.5.0.0/24, next 3.3.3.3, metric 0, path 4 8 9 1
*Mar 13 12:53:01.854: BGP(0): 4.4.4.4 send UPDATE (format) 150.2.0.0/24, next 3.3.3.3, metric 0, path 4 5 6 7
*Mar 13 12:53:01.858: BGP(0): 4.4.4.4 send UPDATE (prepend, chgflags: 0x0) 150.3.0.0/24, next 3.3.3.3, metric 0, path 4 5 6 7
*Mar 13 12:53:01.858: BGP(0): 4.4.4.4 send UPDATE (format) 160.0.0.0/24, next 3.3.3.3, metric 0, path 4 6 7 8
*Mar 13 12:53:01.858: BGP(0): 4.4.4.4 send UPDATE (prepend, chgflags: 0x0) 150.9.0.0/24, next 3.3.3.3, metric 0, path 4 6 7 8
*Mar 13 12:53:01.878: BGP(0): 4.4.4.4 rcvd UPDATE w/ attr: nexthop 4.4.4.4, origin ?, metric 0, path 4 6 7 8
*Mar 13 12:53:01.882: BGP(0): 4.4.4.4 rcvd 150.9.0.0/24...duplicate ignored
*Mar 13 12:53:01.882: BGP(0): 4.4.4.4 rcvd 160.0.0.0/24...duplicate ignored
*Mar 13 12:53:01.882: BGP(0): 4.4.4.4 rcv UPDATE w/ attr: nexthop 4.4.4.4, origin ?, metric 0, originator 0.0.0.0, path 4 2 3 5, community , extended community
*Mar 13 12:53:01.882: BGP(0): 4.4.4.4 rcv UPDATE about 150.7.0.0/24 -- DENIED due to: AS-PATH contains our own AS;
*Mar 13 12:53:01.882: BGP(0): 4.4.4.4 rcv UPDATE about 150.6.0.0/24 -- DENIED due to: AS-PATH contains our own AS;
*Mar 13 12:53:01.886: BGP(0): 4.4.4.4 rcvd UPDATE w/ attr: nexthop 4.4.4.4, origin ?, metric 0, path 4 8 9 1
*Mar 13 12:53:01.886: BGP(0): 4.4.4.4 rcvd 150.5.0.0/24...duplicate ignored
*Mar 13 12:53:01.886: BGP(0): 4.4.4.4 rcvd 150.4.0.0/24...duplicate ignored
*Mar 13 12:53:01.886: BGP(0): 4.4.4.4 rcvd UPDATE w/ attr: nexthop 4.4.4.4, origin ?, metric 0, path 4 5 6 7
*Mar 13 12:53:01.890: BGP(0): 4.4.4.4 rcvd 150.3.0.0/24...duplicate ignored
*Mar 13 12:53:01.890: BGP(0): 4.4.4.4 rcvd 150.2.0.0/24...duplicate ignored
*Mar 13 12:53:01.890: BGP(0): 4.4.4.4 rcv UPDATE w/ attr: nexthop 4.4.4.4, origin ?, metric 0, originator 0.0.0.0, path 4 1 2 3, community , extended community
*Mar 13 12:53:01.890: BGP(0): 4.4.4.4 rcv UPDATE about 150.1.0.0/24 -- DENIED due to: AS-PATH contains our own AS;
*Mar 13 12:53:01.894: BGP(0): 4.4.4.4 rcv UPDATE about 150.0.0.0/24 -- DENIED due to: AS-PATH contains our own AS;
*Mar 13 12:53:01.894: BGP(0): updgrp 1 - 4.4.4.4 updates replicated for neighbors:

The debug ip bgp events command will provide real-time information on internal BGP events, such as the BGP Scanner walking the RIB. This command will also provide information about soft and hard peer session resets, for example. Following is a sample of the information that is provided by this command:

R3#debug ip bgp events
BGP events debugging is on
R3#
*Mar 13 13:25:50.218: BGP: Regular scanner event timer
*Mar 13 13:25:50.218: BGP: Import timer expired. Walking from 1 to 1
*Mar 13 13:25:56.846: BGP: 4.4.4.4 start outbound soft reconfig for afi/safi: 1/1
*Mar 13 13:26:05.218: BGP: Regular scanner event timer
*Mar 13 13:26:05.218: BGP: Import timer expired. Walking from 1 to 1
*Mar 13 13:26:20.218: BGP: Regular scanner event timer
*Mar 13 13:26:20.218: BGP: Import timer expired. Walking from 1 to 1
*Mar 13 13:26:26.978: BGP: 4.4.4.4 refresh timer expired, no pending refresh for afi: 0
*Mar 13 13:26:35.218: BGP: Regular scanner event timer
*Mar 13 13:26:35.218: BGP: Performing BGP general scanning
*Mar 13 13:26:35.218: BGP(0): scanning IPv4 Unicast routing tables
*Mar 13 13:26:35.218: BGP(IPv4 Unicast): Performing BGP Nexthop scanning for general scan
*Mar 13 13:26:35.218: BGP(0): Future scanner version: 1295, current scanner version: 1294
*Mar 13 13:26:35.218: BGP(1): scanning IPv6 Unicast routing tables
*Mar 13 13:26:35.218: BGP(IPv6 Unicast): Performing BGP Next hop scanning for general scan
*Mar 13 13:26:35.218: BGP(1): Future scanner version: 1296, current scanner version: 1295
*Mar 13 13:26:35.218: BGP(2): scanning VPNv4 Unicast routing tables
*Mar 13 13:26:35.218: BGP(VPNv4 Unicast): Performing BGP Next hop scanning for general scan

Finally, as was stated at the beginning of this section, the debug ip bgp updates command provides the same information as the debug ip bgp [address] updates command but for all configured BGP peers instead of specific ones. For granularity, the command supports additional keywords that can be used to filter the output. These keywords are illustrated below:

R3#debug ip bgp updates ?
  <1-199>      Access list
  <1300-2699>  Access list (expanded range)
  events       Update events
  in           Inbound updates
  out          Outbound updates
  <cr>

ACLs can be used in conjunction with this command to restrict the output to the prefixes that are included in the ACLs. For example, to restrict the debug output to activities pertaining to the 150.2.0.0 and 150.3.0.0 prefixes, you would perform the following sequence of steps on the router on which you wanted to see the debugging output:

R3#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R3(config)#access-list 1 permit host 150.2.0.0
R3(config)#access-list 1 permit host 150.3.0.0
R3(config)#exit
R3#
R3#debug ip bgp updates 1
BGP updates debugging is on for access list 1 for address family: IPv4 Unicast
R3#
R3#
R3#
*Mar 13 13:37:33.154: BGP(0): 4.4.4.4 send UPDATE (format) 150.2.0.0/24, next 3.3.3.3, metric 0, path 4 5 6 7
*Mar 13 13:37:33.158: BGP(0): 4.4.4.4 send UPDATE (prepend, chgflags: 0x0) 150.3.0.0/24, next 3.3.3.3, metric 0, path 4 5 6 7
*Mar 13 13:37:33.182: BGP(0): 4.4.4.4 rcvd UPDATE w/ attr: next hop 4.4.4.4, origin ?, metric 0, path 4 5 6 7
*Mar 13 13:37:33.182: BGP(0): 4.4.4.4 rcvd 150.3.0.0/24...duplicate ignored
*Mar 13 13:37:33.186: BGP(0): 4.4.4.4 rcvd 150.2.0.0/24...duplicate ignored
*Mar 13 13:37:33.186: BGP(0): updgrp 1 - 4.4.4.4 updates replicated for neighbors:
R3#
R3#
R3#show ip access-lists 1
Standard IP access list 1
    10 permit 150.2.0.0 (6 matches)
    20 permit 150.3.0.0 (6 matches)

NOTE: The same could also be performed using an extended ACL. Such an ACL, for example, might be configured as follows:

access-list 100 permit ip host 150.2.0.0 host 255.255.255.0
access-list 100 permit ip host 150.3.0.0 host 255.255.255.0

Following this, you would then use the debug ip bgp updates 100 command to restrict or filter the debug output to events pertaining to these two networks only.

The events keyword provides real-time information on update events, which includes received updates or updates sent by the local BGP speaker.
The following is a sample of the information that is printed by this command:

R3#debug ip bgp updates events
BGP updates debugging is on events for address family: IPv4 Unicast
R3#
*Mar 13 13:44:21.154: BGP(0): Begin update run for versions 1->148 for 1 update groups for attrs 0->2
*Mar 13 13:44:21.154: BGP(0): pre format update-group 1 leader is 4.4.4.4 with 0/1000 msgs, versions 0/148
*Mar 13 13:44:21.154: BGP(0): post format update-group 1 leader is 4.4.4.4 with 3/1000 msgs, versions 148/148, formatting was not aborted
*Mar 13 13:44:21.154: BGP(0): End update run for versions 1->148 (0ms), TX has not completed, 3 updates formatted, formatting was not aborted, 3 attrs - 6 nets visited
*Mar 13 13:45:30.142: BGP(0): Begin update run for versions 1->148 for 1 update groups for attrs 0->2
*Mar 13 13:45:30.146: BGP(0): pre format update-group 1 leader is 10.0.0.2 with 0/1000 msgs, versions 0/148
*Mar 13 13:45:30.146: BGP(0): post format update-group 1 leader is 10.0.0.2 with 3/1000 msgs, versions 148/148, formatting was not aborted
*Mar 13 13:45:30.146: BGP(0): End update run for versions 1->148 (4ms), TX has not completed, 3 updates formatted, formatting was not aborted, 3 attrs - 6 nets visited

Finally, the in and out keywords can be used to restrict or filter the debug output to inbound or outbound updates. The default is to print information on both.

CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter.

Border Gateway Protocol Overview

• Border Gateway Protocol is a Path Vector protocol
• BGP is primarily used to exchange NLRI between routing domains or autonomous systems
• The following four BGP processes run when BGP is enabled in Cisco IOS-based devices:
  1. The BGP Open process
  2. The BGP I/O process
  3. The BGP Scanner process
  4. The BGP Router process
• The BGP Open process is used for peer establishment
• The BGP I/O process handles the reading, writing, and execution of BGP messages
• The BGP Scanner process periodically scans the BGP RIB
• The BGP Router process sends/receives routes, establishes peers, and interacts with the RIB
• The three major components of the BGP Router process are as follows:
  1. The BGP Routing Information Base (RIB)
  2. The IP RIB for BGP-learned Prefixes
  3. The IP Switching Component for BGP-learned Prefixes
• In Cisco IOS software, BGP information is stored in one of two data structures or tables
• These two data structures or tables are the Neighbor Table and the BGP Table
• The Neighbor Table contains a list of all the configured neighbors of the local BGP speaker
• The BGP Table or BGP Routing Information Base (RIB) contains all BGP routes
• All Border Gateway Protocol messages share a common header, which is 19 bytes long
• Only four BGP messages are available, which are as follows:
  1. The OPEN Message
  2. The UPDATE Message
  3. The NOTIFICATION Message
  4. The KEEPALIVE Message
• The different states BGP will go through before a neighbor relationship is established are as follows:
  1. The Idle State
  2. The Connect State
  3. The Active State
  4. The OpenSent State
  5. The OpenConfirm State
  6. The Established State
• Border Gateway Protocol path attributes fall into the following four separate categories:
  1. Well-known mandatory
  2. Well-known discretionary
  3. Optional transitive
  4. Optional non-transitive
• The two BGP attributes that are used to influence the inbound path are as follows:
  1. The MULTI_EXIT_DISC attribute
  2. The AS_PATH attribute
• The two BGP attributes that are used to influence the outbound path are as follows:
  1. The LOCAL_PREF Attribute
  2. The WEIGHT Attribute

Troubleshooting Neighbor Relationships

• The most common reason for failing to establish peer relationships is misconfiguration
• Common misconfigurations include the following:
  1. Using the Incorrect Autonomous System Number for the Peer
  2. Using the Incorrect IP Address for the Peer
  3. Incorrect Authentication Parameters
  4. No IP Connectivity between Indirectly Connected Peers
• In addition to misconfigurations, the following can prevent adjacencies from establishing:
  1. Access Control Lists Filtering BGP Packets
  2. Layer 1 and Layer 2 Issues
  3. Device Resource Consumption

Troubleshooting Route Advertisement

• Route advertisement issues with BGP are commonly due to device misconfigurations
• The network command must match the exact route BGP is to advertise
• The aggregate-address command requires a more specific route before the aggregate is advertised
• By default, the aggregate-address command does not suppress specific prefixes
• The BGP Scanner process checks the BGP Table for updates every 60 seconds
• Following policy modification, clear the BGP session for changes to apply immediately

Troubleshooting Route Redistribution Issues

• By default, BGP will use the IGP metric when routes are redistributed into BGP
• When redistributing OSPF into BGP, only internal routes are redistributed by default
• By default, only external BGP routes are redistributed into IGPs
• The bgp redistribute-internal command is required to redistribute iBGP routes

CHAPTER 8

Troubleshooting Cisco IOS Security Features

Cisco IOS Catalyst switches and routers support several security features that are designed to protect not only the switches and routers themselves but also users connected to those switches. In addition to demonstrating a solid understanding of these features, it is sometimes also necessary to troubleshoot security-feature-related network problems.
The TSHOOT certification exam objectives that are covered in this chapter are as follows:

• Troubleshoot private VLANs
• Troubleshoot port security
• Troubleshoot general switch security
• Troubleshoot VACL and PACL
• Troubleshoot configuration issues related to accessing the AAA server for authentication
• Troubleshoot Layer 3 security
• Troubleshoot issues related to ACLs used to secure access to Cisco routers
• Troubleshoot security issues related to IOS services (i.e., finger, NTP, HTTP, FTP, RCP)

From a conceptual perspective, switches and routers have a communications architecture that is composed of three different planes, all of which are vulnerable to security attacks. It is therefore important to understand the functionality of each of these planes, as well as the tools that are available in Cisco IOS software that can be used to secure each individually. In addition, it is also important to understand common problems associated with these technologies and the ways in which to troubleshoot and resolve them. This chapter is divided into the following sections:

• Cisco IOS Security Fundamentals
• Management Plane Security and Troubleshooting
• Control Plane Security and Troubleshooting
• Forwarding Plane Security and Troubleshooting
• Cisco IOS Firewall Fundamentals

NOTE: While this is not a security exam, as a network engineer you are expected to have some basic understanding of general security principles, configuration, and troubleshooting. While this guide will not be going into detail on all Cisco IOS software security features, you can find additional information in the current CCNA Security study guide that is available online.

CISCO IOS SECURITY FUNDAMENTALS

As was stated in the introduction, the communications architecture of all switches and routers is segmented into three different planes, which are vulnerable to security attacks.
Understanding not only how to secure these planes but also how to troubleshoot and resolve potential problems based on implemented solutions is a core requirement of any network engineer. The communications architecture planes of network devices are as follows:

• The management plane
• The control plane
• The forwarding plane

The Management Plane

The management plane is responsible for management functions. The management plane is used to manage a device through its connection to the network. This plane also coordinates functions among all other planes (i.e., the control and the forwarding planes). Management protocols such as SNMP, Telnet, HTTP, HTTPS, and SSH are used for device monitoring and CLI access at the management plane. In addition to management protocols, console access (i.e., via the console port) is also used to manage devices. Some security considerations for the management plane include the following:

• Secure access to the device console: Use logins and passwords to ensure that the console is secured and that no unauthorized parties are able to gain access to the device. Consider using security authentication protocols such as RADIUS and TACACS+ to centralize the console authentication process. In addition to passwords and security protocols, physical security should also be taken into consideration, ensuring that only authorized personnel can gain physical access to the device(s).
• Avoid using management protocols, such as Telnet, that send the username/password information in clear text. Instead, consider using SSH to access and manage network devices remotely. In addition, consider using IP access control lists (ACLs) to restrict the range of addresses or networks that can gain remote access to the device(s).
• Consider implementing only the management protocols that are required. For example, if HTTP will not be used, disable this service and enable HTTPS access only. When implementing monitoring (e.g., via SNMP), consider using SNMPv1 and v2c for read-only access to devices, while using SNMPv3, which offers greater security than versions 1 and 2c, for read-write access to the device(s).
• Disable the password-recovery service using the no service password-recovery global configuration command. This prevents anyone with console access from insecurely accessing the device configuration and clearing the password. It also prevents malicious users from changing the configuration register value and accessing NVRAM.
• Disable any unused services that can be used to launch Denial of Service (DoS) attacks. These services include Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) small services, which include Echo (port number 7), Discard (port number 9), Daytime (port number 13), and Chargen (port number 19). By default, these services are disabled in Cisco IOS 12.0 and later. Another service that should be disabled is the finger service. This service provides information on who is logged into the system and provides extensive user information, which is extremely valuable to an attacker. By default, finger is disabled in Cisco IOS 12.1 and later. Additional services that should be disabled if not used include HTTP and HTTPS, CDP, and the configuration service that allows a Cisco IOS device to attempt to locate a configuration file on the network using TFTP. This is disabled using the no service config global configuration command.
• Also consider reducing the EXEC timeout, which specifies the interval that the EXEC command interpreter waits for user input before it terminates a session. By default, sessions are disconnected after 10 minutes of inactivity; however, this can be modified using the exec-timeout line configuration command.
• Finally, enable logging, preferably to a central location. Logging provides you with visibility into the operation of a device and the network into which it is deployed.

The Control Plane

A control plane is a collection of processes that run at the process level on a route processor and collectively provide high-level control for most Cisco IOS software functions. All traffic directly or indirectly destined to a router or switch is handled by the control plane. Control plane protocols include routing protocols, such as EIGRP and OSPF, as well as Layer 2 protocols, such as Spanning Tree Protocol (STP).

Most control plane protocols (e.g., EIGRP, OSPF, and HSRP) have their own built-in security and authentication schemes. For example, all three of these protocols support MD5 hashing as a means to protect protocol messages. For protocols such as STP, consider integrating Cisco IOS enhancements, such as Root Guard and BPDU Guard.

The BPDU Guard feature is used to protect the Spanning Tree domain from external influence by preventing false information from being injected into the Spanning Tree domain on ports where BPDUs are not expected, typically access ports with PortFast enabled. BPDU Guard is disabled by default but is recommended for all ports on which the PortFast feature has been enabled. When a port that is configured with the BPDU Guard feature receives a BPDU, it immediately transitions to the errdisable state.

On the other hand, the Root Guard feature prevents a designated port from becoming a root port. If a port on which the Root Guard feature is enabled receives a superior BPDU, the port is moved into a root-inconsistent state, thus maintaining the current root bridge status quo. Both BPDU Guard and Root Guard are described in greater detail in the SWITCH study guide available online.
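
The following is a minimal sketch of how the two Spanning Tree protections described above might be applied on a Catalyst switch. The interface numbers are hypothetical and were chosen only for illustration; the first port is assumed to face an end host and the second to face a downstream switch that should never become the root bridge.

```
! Hypothetical access port facing an end host
interface FastEthernet0/1
 switchport mode access
 spanning-tree portfast
 spanning-tree bpduguard enable
!
! Hypothetical designated port facing a downstream switch
interface FastEthernet0/24
 spanning-tree guard root
```

When troubleshooting, a port that has been shut down by BPDU Guard can usually be found with the show interfaces status err-disabled command, while ports blocked by Root Guard are typically listed by the show spanning-tree inconsistentports command.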
At the control plane, additional features such as Dynamic ARP Inspection and DHCP Snooping can be used to protect against vulnerabilities in protocols such as Address Resolution Protocol (ARP) and Dynamic Host Configuration Protocol (DHCP), respectively. Dynamic ARP Inspection (DAI) is used to protect against ARP spoofing attacks, while DHCP Snooping is used to protect against DHCP spoofing and starvation attacks. DHCP starvation attacks work by using MAC address spoofing and entail flooding a large number of DHCP requests with randomly generated spoofed MAC addresses to the target DHCP server, thereby exhausting the address space available for a period of time. This prevents legitimate DHCP clients from being serviced by the DHCP server.

Cisco IOS software also supports Control Plane Policing (CoPP) and Control Plane Protection (CPP), which allow administrators to secure the control plane further. Control Plane Policing allows administrators to configure a Quality of Service (QoS) filter that manages the traffic flow of control plane packets to protect the control plane of Cisco IOS routers and switches against reconnaissance and Denial of Service (DoS) attacks. Implementing this feature allows the control plane to maintain packet forwarding and protocol states despite an attack or heavy traffic load on the router or the switch.

Control Plane Protection (CPP) extends CoPP by providing additional granularity. CPP allows for the classification of the control plane traffic based on packet destination and information provided by the forwarding plane, allowing appropriate throttling for each category of packet. Unlike CoPP, CPP is dependent on CEF for IP packet redirection. The configuration of CoPP and CPP is beyond the scope of the current TSHOOT certification exam.

Some additional security considerations for the control plane include the following:

• Disable Internet Control Message Protocol (ICMP) redirects using the no ip redirects interface configuration command. There are two types of ICMP redirect messages: redirect for a host address and redirect for an entire subnet. A malicious user can exploit the ability of the router to send ICMP redirects by continually sending packets to the router, forcing the router to respond with ICMP redirect messages, resulting in an adverse impact on the CPU and performance of the router.
• Unless absolutely required, disable ICMP unreachables. ICMP destination unreachable messages are generated by a router to inform the source host that the destination Unicast address is unreachable. While typically a good thing, generating many of these messages can increase the CPU utilization on a device and facilitate DoS attacks. ICMP unreachables are disabled using the no ip unreachables interface configuration command in Cisco IOS software.
• Another control plane service that should be disabled unless absolutely needed is Proxy ARP. Proxy ARP allows the router to answer ARP requests intended for another machine. In some networks, this is useful because it negates the need for hosts to have a default gateway or routing intelligence. However, from a security perspective, Proxy ARP can allow attackers to spoof or pretend to be another machine, facilitating Man-in-the-Middle (MITM) attacks. Additionally, using Proxy ARP can result in an increase in the amount of ARP traffic on the network segment and resource exhaustion. This feature can be disabled using the no ip proxy-arp interface configuration command.
• Secure routing protocols and First Hop Redundancy Protocols (FHRPs) using Message Digest 5 (MD5) authentication in the domain. This prevents the injection of false routing information into the domain. Also consider additional protocol functions, such as limiting the size of the Link State Database (LSDB) when using OSPF, for example, to protect the routing protocol implemented in the network.
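
The interface-level recommendations in the list above can be sketched as follows. This is an illustrative fragment only: the interface name, OSPF process ID, key ID, key string, and LSA limit are all hypothetical values, and OSPF is assumed to be the routing protocol in use.

```
! Hypothetical routed interface
interface FastEthernet0/0
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ! MD5 authentication for OSPF on this link (key ID and string are examples)
 ip ospf message-digest-key 1 md5 Example-Key1
 ip ospf authentication message-digest
!
router ospf 1
 ! Limit the number of non-self-generated LSAs held in the LSDB (example value)
 max-lsa 10000
```

Keep in mind that both ends of a link must use the same key ID and key string; an authentication mismatch is a common reason for an OSPF adjacency failing to form after hardening has been applied.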
The Forwarding Plane

The forwarding or data plane is responsible for the actual forwarding of data. The data plane is typically populated using information derived from the control plane. This plane is used to determine the physical next-hop egress interface for received packets or frames and then forwards the packets or frames using the correct egress interface.

The forwarding or data plane can be secured by implementing ACLs, which can take the form of Routed ACLs (RACLs), Port ACLs (PACLs), or VLAN ACLs (VACLs) in Cisco IOS Catalyst switches. Additional security considerations for the forwarding or data plane include the following:

• Depending on your network, consider dropping packets that have IP options if there is no legitimate reason for such packets on the network. As stated earlier in this guide, packets with IP options are punted to the CPU and are processed in software. A large number of these packets can greatly increase CPU utilization on the device. The device can be configured to drop such packets using the ip options drop global configuration command.
• Disable source routing, which allows the source of the IP packet to specify the network path a packet takes. This functionality can be used in attempts to route traffic around security controls in the network. IP source routing is enabled by default; however, this can be disabled using the no ip source-route global configuration command.
• Disable IP-directed Broadcasts, which make it possible to send an IP Broadcast packet to a remote IP subnet. This functionality has been used to facilitate smurf attacks. A smurf attack is a form of ICMP flood in which ICMP echo requests are sent to a directed Broadcast address with the victim's address spoofed as the source, so that every host on the target subnet replies to the victim. The resulting volume of ICMP traffic can overwhelm the victim and cause it to stop responding to legitimate requests. By default, directed Broadcasts are disabled, but if enabled, use the no ip directed-broadcast interface configuration command to disable this feature.
• Implement anti-spoofing techniques such as Unicast Reverse Path Forwarding (uRPF) and IP Source Guard. Unicast RPF enables a device to verify that the source address of a forwarded packet can be reached through the interface that received the packet. This feature is enabled using the ip verify unicast source reachable-via {rx | any} interface configuration command, where rx selects strict mode and any selects loose mode. IP Source Guard, commonly used with DHCP Snooping, restricts IP traffic on untrusted Layer 2 ports by filtering the traffic based on the DHCP snooping binding database or manually configured IP source bindings. The IP Source Guard feature is enabled by issuing the ip verify source interface configuration command on Layer 2 interfaces.

MANAGEMENT PLANE SECURITY AND TROUBLESHOOTING

When implementing Cisco IOS security solutions, you should secure both the management plane and the control plane of a device because operations of the control plane directly affect operations of the management plane. The following protocols are used at the management plane:

• Simple Network Management Protocol
• Telnet
• Secure Shell Protocol
• File Transfer Protocol
• Trivial File Transfer Protocol
• Secure Copy Protocol
• RADIUS
• TACACS+
• NetFlow
• Network Time Protocol
• Syslog

When implementing security, it is important to understand that once the management plane is breached using any of the management protocols described above, the control and data planes are also compromised. While delving into detail on all possible management plane protocol troubleshooting scenarios is neither feasible nor within the scope of the TSHOOT certification exam, the following section describes some common problems and ways to resolve them.

Telnet and Secure Shell (SSH) are two commonly used management protocols.
SSH provides a more secure and reliable method for device access and administration than Telnet. SSH secures the sessions using standard cryptographic mechanisms and uses TCP port 22 by default. Unlike Telnet, SSH ensures that data is encrypted and is therefore unreadable by network sniffers, for example.

There are two versions of SSH available: SSH version 1 and SSH version 2. While SSHv1 is an improvement over Telnet, which sends the username and password information in clear text, some fundamental design flaws exist in SSHv1. For example, there are several tools readily available on the Internet that can decrypt SSHv1 traffic on the fly, thus removing most of the security gained from encrypting the traffic with SSHv1. Therefore, when implementing SSH, it is highly recommended that SSHv2 be used.

A commonly encountered error message when attempting to access a device remotely via SSH or Telnet is the 'Password required, but none set' connection error. This error indicates that login is enabled on the VTY lines, which is the default, but that no password has been set using the password <secret> line configuration command. While it is common practice simply to use the line vty 0 4 command to configure the Virtual Teletype Terminal (VTY) password, keep in mind that certain devices support more than five VTY lines. Use the show line command to verify the number of lines that are supported by the device and then ensure that the password is configured for the correct range of supported lines.
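
As a minimal sketch of the point above, the following fragment sets a password on every VTY line of a device that is assumed to support lines 0 through 15. The line range and the password string are hypothetical and must be adjusted to match the device in question.

```
! Assumes the device supports VTY lines 0 through 15
line vty 0 15
 ! Example password only
 password Example-Pass1
 login
```

After applying a configuration such as this, the range of supported lines can be confirmed with the show line command.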
The following shows the output of the show line command on a switch that supports 16 VTY lines:

Switch>show line
   Tty Typ   Tx/Rx   A Modem  Roty AccO AccI   Uses   Noise  Overruns   Int
*    0 CTY           -    -      -    -    -      0       0     0/0       -
     1 VTY           -    -      -    -    -      0       0     0/0       -
     2 VTY           -    -      -    -    -      0       0     0/0       -
     3 VTY           -    -      -    -    -      0       0     0/0       -
     4 VTY           -    -      -    -    -      0       0     0/0       -
     5 VTY           -    -      -    -    -      0       0     0/0       -
     6 VTY           -    -      -    -    -      0       0     0/0       -
     7 VTY           -    -      -    -    -      0       0     0/0       -
     8 VTY           -    -      -    -    -      0       0     0/0       -
     9 VTY           -    -      -    -    -      0       0     0/0       -
    10 VTY           -    -      -    -    -      0       0     0/0       -
    11 VTY           -    -      -    -    -      0       0     0/0       -
    12 VTY           -    -      -    -    -      0       0     0/0       -
    13 VTY           -    -      -    -    -      0       0     0/0       -
    14 VTY           -    -      -    -    -      0       0     0/0       -
    15 VTY           -    -      -    -    -      0       0     0/0       -
    16 VTY           -    -      -    -    -      0       0     0/0       -

If you were to use the line vty 0 4 command to configure login and password security for this switch, remote access would be available only via the first five lines.

Both Telnet and SSH are allowed by default in current Cisco IOS software versions, as can be validated in the output of the show line command in the following output:

Router#show line vty 0
   Tty Typ   Tx/Rx   A Modem  Roty AccO AccI   Uses   Noise  Overruns   Int
    66 VTY           -    -      -    -    -      2       0     0/0       -

Line 66, Location: "", Type: ""
Length: 24 lines, Width: 80 columns
Baud rate (TX/RX) is 9600/9600
Status: Ready, No Exit Banner
Capabilities: none
Modem state: Ready
Group codes: 0
Special Chars: Escape Hold Stop Start Disconnect Activation
               ^^x none none
Timeouts: Idle EXEC   Idle Session   Modem Answer   Session   Dispatch
          00:10:00    never                         none      not set
          Idle Session Disconnect Warning
            never
          Login-sequence User Response
            00:00:30
          Autoselect Initial Wait
            not set
Modem type is unknown.
Session limit is not set.
Time since activation: never
Editing is enabled.
History is enabled, history size is 20.
DNS resolution in show commands is enabled
Full user help is disabled
Allowed input transports are lat pad telnet rlogin mop v120 ssh.
Allowed output transports are lat pad telnet rlogin mop v120 ssh.
Preferred transport is lat.
No output characters are padded
No special data dispatching characters

However, while SSH is allowed, unlike Telnet, additional configuration is required to allow a device to be managed via SSH.
This additional configuration includes the following:

1. Configure a domain name on the router. This is performed via the ip domain-name [name] global configuration command.
2. Generate the security keys that will be used by SSH using the crypto key generate rsa global configuration command and specifying the desired key size; or, alternatively, with the crypto key generate rsa general-keys global configuration command and specifying the desired key size. Both of these commands automatically enable SSH when executed, and no further configuration is necessary. The key (modulus) size that is used for SSH can be up to 2048 bits in length. The larger the key size, the more secure the implementation; however, larger keys also take a longer time to generate. When generating a public key, Cisco recommends a minimum key size of 1024 bits.
3. Specify the time that the router waits on the SSH client, in seconds, to input username and password information before disconnecting the session via the ip ssh timeout global configuration command. This is an optional step because, by default, the router will wait for 120 seconds (2 minutes).
4. Specify the number of SSH authentication retries before a session is reset via the ip ssh authentication-retries global configuration command. By default, the Cisco IOS router will allow up to three failed logins before resetting the SSH connection. As is the case with the ip ssh timeout global configuration command, this is an optional step.

If the mandatory steps (steps 1 and 2) are not implemented, you will not be able to manage the device remotely using SSH. Telnet access, however, requires no additional configuration other than specifying and configuring relevant passwords (e.g., VTY line passwords).

Simple Network Management Protocol (SNMP), which was described earlier in this guide, is a commonly used management protocol. SNMPv1 and v2c use a community-based form of authentication, while SNMPv3 uses a user and group security model. When troubleshooting SNMP access issues, verify that the correct configuration has been implemented. This entails checking the configured community strings (SNMPv1 and SNMPv2c) or user and group configurations (SNMPv3) using commands such as the show snmp community command, as illustrated in the following output:

R1#show snmp community

Community name: tsh00t!
Community Index: cisco5
Community SecurityName: tsh00t!
storage-type: nonvolatile active access-list: 1

Community name: ccnp!
Community Index: cisco6
Community SecurityName: ccnp!
storage-type: nonvolatile active access-list: 2

Additionally, if ACLs have been implemented to restrict SNMP access to the device, as is illustrated above, verify that the IP address of the NMS is included in the valid network range or list of hosts allowed to manage the device by using the show ip access-lists command.

File Transfer Protocol (FTP), Trivial File Transfer Protocol (TFTP), and Secure Copy Protocol (SCP) are commonly used methods for copying router and switch configuration or image files. While TFTP requires no additional configuration in Cisco IOS software, FTP and SCP both do. When using FTP, a username and password pair is required for access. This is configured using the ip ftp username <name> and ip ftp password <secret> global configuration commands. In addition to configuring a username and password pair, it is also important to ensure that the specified account has the correct privileges on the server.

SCP provides even greater security than the two previously discussed methods by using SSH for secure transport. This means that SSH must be enabled on the device before SCP can be used. In addition, Cisco IOS software also requires that authentication, authorization, and accounting (AAA) be configured on the router.
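
The SSH and SCP prerequisites described above can be sketched together as follows. This is an illustrative fragment, not a complete configuration: the hostname, domain name, username, and password are hypothetical, the local database is assumed as the AAA source, and a non-default hostname is assumed because the RSA key pair is named after the hostname and domain name.

```
! Hypothetical names and credentials throughout
hostname R1
ip domain-name example.com
! Generating the RSA keys enables SSH
crypto key generate rsa general-keys modulus 2048
ip ssh version 2
!
! SCP relies on AAA to authenticate the user and authorize the EXEC session
aaa new-model
aaa authentication login default local
aaa authorization exec default local
username admin privilege 15 secret Example-Pass1
!
! Allow the router to act as an SCP server
ip scp server enable
```

If an SCP transfer fails, it is usually worth confirming with the show ip ssh command that SSH is enabled before looking at the AAA configuration.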
Given that understanding and troubleshooting AAA are core exam requirements, this service is described in additional detail in the following section.

Understanding Authentication, Authorization, and Accounting

Authentication, authorization, and accounting, referred to as AAA and pronounced as 'Triple-A,' provides the framework that controls and monitors network access. Authentication is used to validate identity (i.e., who the user is); authorization is used to determine what that particular user can do (i.e., the services available to the user); and, finally, accounting is used to allow for an audit trail (i.e., what that user did during the period he/she was logged in).

AAA services can be used to control administrative device access, such as Telnet and console login, which is referred to as character mode access. In addition, AAA can also be used to manage network access (e.g., via dial-up or Virtual Private Network (VPN) clients), which is referred to as packet mode access.

AAA relies on Attribute-Value (AV) pairs, which are simply secured network objects. An AV pair consists of an attribute, such as the username or password, and a value for that particular attribute. Another example of an AV pair would be an attribute, such as a command, with a value of 'configure terminal.'

The AAA services can be administered by using local username and password databases that are stored on the network devices or a centralized security server, with the latter being the most common. On the local device, the username and password pairs are configured using the username <name> secret <password> global configuration command. The configuration of the AAA server is a little more complex, but is beyond the scope of the TSHOOT certification exam and will not be illustrated in this chapter or in the remainder of this guide.

AAA allows devices to point to multiple security servers, which are often referred to as server groups.
User, device, and services information can be replicated between multiple servers, which provides redundancy in large networks. In order for AAA to work, the Network Access Server (NAS), which is any device, such as a router, a switch, or a firewall, that provides AAA services, must be able to access security information for a specific user. To reinforce the concept of AV pairs, Figure 8-1 below illustrates their use in AAA services when the security information is stored locally on the NAS:

Fig. 8-1. Understanding AAA Local Authentication Operation

Referencing Figure 8-1, in step 1, the remote user attempts to connect to R1 (NAS) via Telnet. Assuming that the NAS has been configured for AAA services, using its local database for authentication, the NAS presents the remote user with the username and password prompt, as illustrated in step 2.

The remote user then enters his/her credentials, providing the username administrator, which is the ATTRIBUTE, and password t5h00t!2010, which is the VALUE for that ATTRIBUTE, as illustrated in step 3. The NAS then checks the information against its local database. Assuming that the NAS has been configured with the username administrator secret t5h00t!2010 global configuration command, each AV is on file and the AV pair is found. The request is accepted and a pass message is returned, as illustrated in step 4, which enables the connection from the remote user to be made.

Taking this example a step further, Figure 8-2 below illustrates the use of AV pairs for authorization, this time using an external AAA server:

Fig. 8-2. Understanding AAA Remote Authorization Operation

In Figure 8-2, assume that the remote user has been successfully authenticated. Once logged into R1 (NAS), the remote user attempts to issue the configure terminal command, as illustrated in step 1.
The NAS has been configured to use AAA services for authorization, and so the request is sent to the AAA server, as illustrated in step 2. The AAA server checks its database for the relevant AV pair.

In step 3, the server finds that the attribute and value are on file, and the AV pair is found. The request is therefore accepted and the configure terminal command is successfully authorized on R1, as illustrated in step 4. The remote user successfully enters configuration mode. Again, the same concept would be applicable if authorization were being performed using the local database.

NOTE: Of the three AAA services, only accounting does not require AV pairs to be checked against the database. Instead, accounting information is simply received as AV pairs and stored in the database.

When implementing AAA services, two main security protocols are used: RADIUS and TACACS+. Both are described in the following sections.

RADIUS

Remote Authentication Dial-In User Service (RADIUS) is a client/server protocol that is used to secure networks against intruders. RADIUS was created by Livingston Enterprises and was originally defined in RFC 2138 and RFC 2139. The RADIUS authentication and accounting services are now documented separately in RFC 2865 and RFC 2866, respectively, which replace RFC 2138 and RFC 2139.

A RADIUS server is a device that has the RADIUS daemon or application installed. RADIUS is an open-standard protocol that is distributed in C source code format. This allows for interoperability and flexibility between RADIUS-based products from different vendors.

RADIUS uses UDP as the Transport Layer protocol for communications between the client and the server, using UDP port 1812 for authentication and authorization, and UDP port 1813 for accounting. However, it should be noted that earlier deployments of RADIUS use UDP port 1645 for authentication and authorization, and UDP port 1646 for accounting.
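
Because both sets of port numbers are still found in practice, the ports can be stated explicitly when pointing a device at a RADIUS server. The following is a minimal sketch using the radius-server host command form found in the IOS releases of this era; the server address and shared key are hypothetical, and the local database is assumed as a fallback method.

```
aaa new-model
! Hypothetical server address and shared key; ports stated explicitly
radius-server host 192.0.2.10 auth-port 1812 acct-port 1813 key Example-Key1
! Try RADIUS first, then the local database if the server does not respond
aaa authentication login default group radius local
```

Note that when the port keywords are omitted, this command form typically defaults to the legacy ports 1645 and 1646, so a port mismatch with a server listening on 1812 and 1813 is worth checking when authentication requests time out.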
Because RADIUS uses UDP as a transport protocol, there is no guaranteed delivery of RADIUS packets. Therefore, any issues related to server availability, the retransmission of packets, and timeouts, for example, are handled by the RADIUS-enabled devices.

The RADIUS accounting function is designed as a way to transmit data at the beginning and at the end of a session. This data can indicate resource utilization, such as bandwidth and time used, and may be used for billing and/or security purposes.

TACACS+

TACACS+ stands for Terminal Access Controller Access Control System Plus. Unlike RADIUS, which is an open-standard protocol, TACACS+ is a Cisco-proprietary protocol that is used in the AAA framework to provide centralized authentication of users who are attempting to gain access to network resources.

There are several notable differences between TACACS+ and RADIUS. One of the most notable differences is that TACACS+ uses TCP as a Transport Layer protocol, using TCP port 49. In addition, TACACS+ separates the three AAA functions, unlike RADIUS, which groups authentication and authorization together and separates accounting. TACACS+ also encrypts the entire body of the packets exchanged between the NAS and the server, unlike RADIUS, which encrypts only the password. Finally, TACACS+ supports multiple protocols, such as IP, IPX, AppleTalk, and X.25, whereas RADIUS has limited protocol support.

TACACS+ authentication is initiated when a user attempts an ASCII login by authenticating to a server running the TACACS+ daemon. The TACACS+ authorization process is performed using two distinct message types: REQUEST and RESPONSE. The authorization process is then performed using a session that consists of this pair of messages.
TACACS+ REQUEST messages are sent by clients and they contain information pertaining to the authenticity of the user or service (authentication information), as well as a list of the services or options for which authorization is being requested. When the TACACS+ server receives the REQUEST message, it replies with a RESPONSE message.
Finally, TACACS+ accounting occurs by sending a record to the AAA server. Each record that is sent includes an AV pair that is used for accounting.
NOTE: Going into detail on the types of messages is beyond the scope of the current TSHOOT certification exam and will not be included in this chapter or in the remainder of this guide.
Implementing and Configuring AAA Services
AAA services can be implemented in one of three ways as follows:
1. AAA can be implemented as a self-contained local security database that contains the usernames and passwords required for authentication.
2. AAA can be implemented as a Cisco Access Control Server (ACS) application server (this can be an external server). Cisco ACS can be installed onto both Windows and Unix-based platforms. This implementation is suitable for medium to large networks. Cisco ACS configuration is beyond the scope of the current TSHOOT certification exam.
3. Finally, AAA can also be implemented using the Cisco Secure ACS Solutions Engine appliance, which is a dedicated external platform offered by Cisco that scales very well and is suitable for very large networks. As with ACS configuration, Cisco Secure ACS Solutions Engine appliance configuration is beyond the scope of the current TSHOOT certification exam.
AAA services are based on method lists. Method lists contain sequenced AAA entries and are configured to define which of the three AAA services will be performed and the sequence in which they will be performed. The method argument refers to the actual method the authentication algorithm tries.
Therefore, when a user attempts to authenticate, the NAS contacts each of the entries in sequence to validate the user. Method lists allow control of one or more security protocols and security servers to be used to offer fault tolerance and backup of authentication databases.
The AAA engine will use the first method listed in the method list, and if that is unavailable, it will fall back to the next method on the list. However, it is important to note that this fallback occurs only if the message received from the first method listed is not a FAIL message of any kind. In other words, even though multiple methods may be listed, if a FAIL (i.e., deny) message is received from the first method tried, the authentication process stops and no further authentication methods are attempted in the list. In addition, it is also important to know that if all entries are processed without receiving a PASS message, access is denied.
In AAA implementation, there are two basic types of method lists, named and default, as described below:
• A named method list can be configured for any AAA service, such as authentication or authorization, for example. These methods are applied to specific interfaces or even terminal lines (e.g., console and VTY), as required by the administrator.
• The default method list is configured globally and applied to all interfaces and VTY lines on the device if no other method list is defined. However, if a defined (named) method list is configured, it will take precedence over the default method list.
In order to configure AAA services, the following general sequence of steps should be taken:
1. First, globally enable AAA services using the aaa new-model global configuration command. This is a mandatory requirement when configuring AAA.
2. Next, configure the security protocol parameters, such as the IP address and shared key of the TACACS+ and RADIUS server, via the aaa group server [radius|tacacs+] [group_name] global configuration command for TACACS+ or RADIUS server groups. Alternatively, you can configure individual TACACS+ or RADIUS servers by using the tacacs-server host [address|hostname] key [shared_key] or the radius-server host [address|hostname] key [shared_key] global configuration commands, respectively.
3. Next, define the authentication service and method list via the aaa authentication global configuration command and then apply the specified authentication named method list to device interfaces or VTY lines by using the login authentication interface or line configuration commands.
4. Next, define authorization method list(s) by using the aaa authorization global configuration command, and then apply the authorization method list(s) to device VTY lines via the authorization line configuration command. Authorization can also be enabled for WAN interfaces using protocols such as PPP by using the ppp authorization interface configuration command.
5. Finally, define the accounting service and method lists by using the aaa accounting global configuration command and apply the accounting method list(s) to VTY lines via the accounting line configuration command. Accounting can also be enabled for WAN interfaces using protocols such as PPP by using the ppp accounting interface configuration command.
NOTE: Method lists, whether default or named, are configured on the NAS itself. The security server holds the user credentials and attributes against which the methods in the list are checked; it does not store the method list.
While delving into advanced AAA configuration is beyond the scope of the current TSHOOT certification exam, you are still expected to be familiar with basic AAA configuration.
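The method list fallback rules described earlier in this section can be sketched in a few lines of Python. This is a simplified model and not Cisco IOS code: the Result values, the function names, and the sample credentials are hypothetical and exist only for illustration. The sketch shows that only an ERROR (no answer from the method) moves processing to the next entry, while a FAIL stops processing immediately.

```python
from enum import Enum


class Result(Enum):
    PASS = "pass"    # the method answered and accepted the credentials
    FAIL = "fail"    # the method answered and rejected the credentials
    ERROR = "error"  # the method did not answer (e.g., server unreachable)


def run_method_list(methods, username, password):
    """Try each method in order; only an ERROR moves on to the next entry."""
    for method in methods:
        result = method(username, password)
        if result is Result.PASS:
            return True
        if result is Result.FAIL:
            return False  # a FAIL stops processing; later methods are not tried
    return False  # no method returned PASS, so access is denied


# Hypothetical stand-ins for methods such as 'group tacacs+' and 'local'.
LOCAL_USERS = {"admin": "s3cret"}  # illustrative credentials only


def unreachable_server(username, password):
    return Result.ERROR


def rejecting_server(username, password):
    return Result.FAIL


def local_database(username, password):
    if LOCAL_USERS.get(username) == password:
        return Result.PASS
    return Result.FAIL


fallback_ok = run_method_list([unreachable_server, local_database], "admin", "s3cret")
no_fallback = run_method_list([rejecting_server, local_database], "admin", "s3cret")
all_down = run_method_list([unreachable_server], "admin", "s3cret")
print(fallback_ok, no_fallback, all_down)
```

In this model, the second call is denied even though the local database holds valid credentials, because the first method returned a FAIL rather than an ERROR.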
The sections that follow provide some basic and common AAA configuration examples, along with the explanations of those configurations.
In the first example, authentication will be configured on the device for all logins. This will use the default method list. Authentication will be performed using a TACACS+ server with the IP address 10.1.1.254 and a secret or password of t5sh00t!2010. The default method list will be used to authenticate remote access to the device via VTY lines. The configuration is implemented as follows:
R1(config)#aaa new-model
R1(config)#aaa authentication login default group tacacs+
R1(config)#tacacs-server host 10.1.1.254 key t5sh00t!2010
R1(config)#line vty 0 4
R1(config-line)#login authentication default
Referencing the configuration example above, the aaa new-model configuration command enables the AAA service on the device. Without this command, you cannot enable AAA. The aaa authentication login default group tacacs+ configuration command specifies that the default method list be used for login authentication and that this list uses a TACACS+ server. It assumes that the TACACS+ server is reachable and holds the relevant user accounts.
The tacacs-server host 10.1.1.254 key t5sh00t!2010 configuration command specifies the IP address of the TACACS+ server to be used for authentication as well as the shared key that is configured on the server. Finally, the login authentication default configuration command specifies that VTY authentication will be performed using the default method list, which sends authentication requests to the TACACS+ server with the IP address 10.1.1.254.
In the following example, AAA is enabled for the console and VTY lines using method lists named CONSOLE_AUTH and VTY_AUTH, respectively. The IP address of the RADIUS server is 10.1.1.254. This server will use a secret of t5h00t!2010.
This configuration is implemented on the device as follows:
R1(config)#aaa new-model
R1(config)#aaa authentication login CONSOLE_AUTH group radius
R1(config)#aaa authentication login VTY_AUTH group radius
R1(config)#radius-server host 10.1.1.254 key t5h00t!2010
R1(config)#line con 0
R1(config-line)#login authentication CONSOLE_AUTH
R1(config-line)#exit
R1(config)#line vty 0 4
R1(config-line)#login authentication VTY_AUTH
R1(config-line)#exit
The third, and final, example illustrates how to configure login authentication using the local device database. This will be used to authenticate console and AUX port access against the locally configured username and password pair. This configuration is illustrated as follows:
R1(config)#aaa new-model
R1(config)#aaa authentication login default local
R1(config)#username admin secret t5h00t!2010
R1(config)#line con 0
R1(config-line)#login authentication default
R1(config-line)#exit
R1(config)#line aux 0
R1(config-line)#login authentication default
R1(config-line)#exit
The configuration of both authorization and accounting follows the same logic as that used to configure authentication. In Cisco IOS software, authorization is configured using the aaa authorization global configuration command. This configuration can then be applied to any lines (e.g., VTY and console) using the authorization line configuration command. In order for authorization to work, authentication must be configured and the AAA client must have successfully authenticated. The options available for authorization in Cisco IOS software are as follows:
R1(config)#aaa authorization ?
  auth-proxy       For Authentication Proxy Services
  cache            For AAA cache configuration
  commands         For exec (shell) commands
  config-commands  For configuration mode commands
  configuration    For downloading configurations from AAA server
  console          For enabling console authorization
  exec             For starting an exec (shell)
  multicast        For downloading Multicast configurations from AAA server
  network          For network services (PPP, SLIP, ARAP)
  reverse-access   For reverse access connections
  template         Enable template authorization
The following configuration example illustrates how to configure AAA authorization on a device against a method list named PRIV-15-ONLY. This configuration is applied to the console and VTY ports on the device.
R1(config)#aaa new-model
R1(config)#aaa authorization commands 15 PRIV-15-ONLY
R1(config)#line con 0
R1(config-line)#authorization commands 15 PRIV-15-ONLY
R1(config-line)#exit
R1(config)#line vty 0 4
R1(config-line)#authorization commands 15 PRIV-15-ONLY
R1(config-line)#exit
In the authorization configuration example above, the aaa authorization commands 15 PRIV-15-ONLY configuration command configures authorization for level 15 configuration commands against a method list named PRIV-15-ONLY. This authorization method list is applied to the console and VTY ports using the authorization commands 15 PRIV-15-ONLY line configuration command.
Finally, accounting is configured using the Cisco IOS software aaa accounting global configuration command. Accounting is enabled for VTY lines via the accounting line configuration command. As is the case with authentication and authorization, you can use the default method list or specify a named method list when configuring accounting.
The configuration example that follows illustrates how to enable accounting to send start and stop records for EXEC sessions using a method list named EXEC-ACNTG:
R1(config)#aaa accounting exec EXEC-ACNTG start-stop group radius
R1(config)#radius-server host 192.168.1.254 auth-port 1812 acct-port 1813
R1(config)#line con 0
R1(config-line)#accounting exec EXEC-ACNTG
R1(config-line)#exit
R1(config)#line vty 0 4
R1(config-line)#accounting exec EXEC-ACNTG
R1(config-line)#exit
Troubleshooting AAA Services
Unlike other Cisco IOS features, AAA troubleshooting relies primarily on device configuration verification using the show running-config command and debugging using the debug aaa suite of commands. Commonly experienced issues include incorrect username and password configuration on the local device or security server, missing username and password pairs on the local device or security server, and misconfigured authentication lists.
For the most part, you can troubleshoot these issues by verifying the device AAA configuration. However, it is sometimes necessary to debug the AAA implementation in order to identify and isolate the root cause. An example of when you might need to debug is when you do not have access to the AAA server to verify username and password configuration. Another example is when the password in the configuration is encrypted (i.e., the username <name> secret <password> command has been issued) and AAA has been configured to authenticate against the local database.
The following displays the list of supported AAA debugging subcommands in Cisco IOS 12.4:
R1#debug aaa ?
  accounting           Accounting
  administrative       Administrative
  api                  AAA api events
  attr                 AAA Attr manager
  authentication       Authentication
  authorization        Authorization
  cache                Cache activities
  db                   AAA DB manager
  dead-criteria        AAA Dead-Criteria info
  id                   AAA Unique Id
  ipc                  AAA IPC
  mlist-ref-count      Method list reference counts
  per-user             Per-user attributes
  pod                  AAA POD processing
  protocol             AAA protocol processing
  server-ref-count     Server handle reference counts
  sg-ref-count         Server group handle reference counts
  sg-server-selection  Server group server selection
  subsys               AAA subsystem
Following is a sample output of the debug aaa authentication command. The output shows that the default method list is being used for login authentication against the local database for a user named 'administrator'. The debug output also shows that authentication for enable access is also using the default method list. However, this is being performed by a TACACS+ server and, unlike the local authentication, this authentication fails. The Status=ERROR result, which is logged five seconds after the AUTHEN/START packet is sent, suggests that the server did not respond rather than that it rejected the credentials. You can use this information to troubleshoot issues with the TACACS+ server further:
R1#debug aaa authentication
AAA Authentication debugging is on
*Mar 23 00:25:03.738: AAA/BIND(0000000A): Bind i/f
*Mar 23 00:25:03.738: AAA/AUTHEN/LOGIN (0000000A): Pick method list 'default'
*Mar 23 00:25:05.974: AAA: parse name=tty66 idb type=-1 tty=-1
*Mar 23 00:25:05.978: AAA: name=tty66 flags=0x11 type=5 shelf=0 slot=0 adapter=0 port=66 channel=0
*Mar 23 00:25:05.978: AAA/MEMORY: create_user (0x84BD69A8) user='administrator' ruser='NULL' ds0=0 port='tty66' rem_addr='10.0.0.2' authen_type=ASCII service=ENABLE priv=15 initial_task_id='0', vrf= (id=0)
*Mar 23 00:25:05.978: AAA/AUTHEN/START (3856188155): port='tty66' list='' action=LOGIN service=ENABLE
*Mar 23 00:25:05.978: AAA/AUTHEN/START (3856188155): using "default" list
*Mar 23 00:25:05.978: AAA/AUTHEN/START (3856188155): Method=tacacs+ (tacacs+)
*Mar 23 00:25:05.978: TAC+: send AUTHEN/START packet ver=192 id=-438779141
*Mar 23 00:25:10.982: AAA/AUTHEN(3856188155): Status=ERROR
*Mar 23 00:25:10.982: AAA/AUTHEN/START (3856188155): no methods left to try
*Mar 23 00:25:10.982: AAA/AUTHEN(3856188155): Status=ERROR
CONTROL PLANE SECURITY AND TROUBLESHOOTING
Troubleshooting at the control plane is dependent on the specific protocol or technology that is experiencing problems. While generic commands can be used to troubleshoot control plane issues, such as using the debug ip routing command to troubleshoot routing protocol issues, for the most part, protocol-specific commands should be used to troubleshoot control plane problems.
Because the troubleshooting of routing protocols, such as EIGRP, OSPF, and BGP, has already been described earlier in this guide, for brevity and to avoid being repetitive, routing protocol troubleshooting will not be included in this chapter. The same is also applicable for Spanning Tree Protocol troubleshooting, which was also described in detail earlier in this guide.
Troubleshooting for additional control plane features, such as Dynamic ARP Inspection and DHCP Snooping, primarily centers on configuration. If the features are not implemented correctly, they will not work as expected. Therefore, from a troubleshooting perspective, you should ensure that you understand how these features work. You can then determine whether the feature is not working correctly because of a misconfiguration or other issue, such as software or hardware errors or bugs, for example.
FORWARDING PLANE SECURITY AND TROUBLESHOOTING
While the management and control planes are concerned with data that is destined for the local router or switch, the forwarding or data plane is concerned with data that is traversing the router or switch. Securing the data plane entails securing the actual flows through the router or switch.
In Cisco IOS software, a plethora of tools can be used to secure the data plane. Troubleshooting data plane issues depends on the technology that has been implemented to secure the data plane. As always, a solid understanding of these technologies makes the troubleshooting process that much simpler. The following sections will describe common data plane security mechanisms, which include the following:
• Router and switch Access Control Lists
• Catalyst switch port security
• Private VLANs
• IEEE 802.1X Port-based authentication
• Trunking
• The Cisco IOS Firewall
Router and Switch Access Control Lists
The forwarding or data plane can be secured by implementing ACLs, which can take the form of Router ACLs (RACLs), Port ACLs (PACLs), or VLAN ACLs (VACLs) in Cisco IOS Catalyst switches.
Many types of Router ACLs can be configured in Cisco IOS software. These include named and numbered standard and extended ACLs. RACLs are commonly applied on interfaces in either the inbound or the outbound direction using the ip access-group <name|number> interface configuration command. These ACLs are used to filter packets transiting the interfaces on which they are applied.
When a packet enters a router or a switch, the destination address of the packet is checked against the entries in the routing table to identify the egress interface. The packet is also checked against any configured RACLs assigned to the interface, and will be either permitted or denied accordingly.
When an inbound RACL is applied to an interface, the router or switch checks the received packets against the statements in the RACL looking for a match. If a match is found, and the RACL action is to permit, then the device continues to process the packet. However, if a match is found, and the action is to deny, then the device discards the packet and typically, unless otherwise configured, sends an ICMP Destination Unreachable message back to the source.
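The matching logic that a RACL applies to each packet can be sketched in Python as follows. This is a deliberately simplified model that matches only on the source address, in the manner of a standard ACL. The entry format, the function name, and the example networks are assumptions made for illustration. The sketch shows the two rules that matter most when troubleshooting: the first matching statement decides the outcome, and a packet that matches no statement is dropped by the implicit deny at the end of the list.

```python
import ipaddress

IMPLICIT_DENY = "deny"  # applied when no statement in the list matches


def evaluate_acl(entries, source_ip):
    """Return the action of the first entry whose network contains source_ip."""
    address = ipaddress.ip_address(source_ip)
    for action, network in entries:
        if address in ipaddress.ip_network(network):
            return action  # first match wins; later entries are not checked
    return IMPLICIT_DENY


# Hypothetical two-statement list, ordered from most to least specific.
EXAMPLE_ACL = [
    ("deny", "10.1.1.0/24"),
    ("permit", "10.0.0.0/8"),
]

results = {
    ip: evaluate_acl(EXAMPLE_ACL, ip)
    for ip in ("10.1.1.5", "10.2.0.1", "192.168.1.1")
}
print(results)
```

Reversing the order of the two statements in this model would permit 10.1.1.5, because the broader permit would match first. Statement order is therefore one of the first things to check when an ACL does not behave as expected.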
When an outbound RACL is applied on an interface, the device first performs a route lookup for the destination address in the routing table to determine the egress interface via which the packet should be forwarded. If a valid path is found in the routing table, and a match is found for the RACL, and the action of the RACL is to permit, then the device continues to process the packet. But, if the RACL action is to deny the packet, then the packet is discarded by the device, which then sends an ICMP Destination Unreachable message back to the source host(s).
However, if a match is not found, the implied 'deny all' statement at the end of the RACL is applied and the router discards the packet, sending the source an ICMP Destination Unreachable message. Finally, if a valid path to the intended destination is not found in the routing table, then the device simply discards the packet.
When troubleshooting data forwarding issues on routed interfaces, check configured or applied RACLs using the show ip access-lists command. If the RACL is long, you can enable logging by appending the log or log-input keyword to the relevant RACL entries. The log keyword allows you to see matches against the IP ACL, while the log-input keyword provides additional information, such as the Layer 2 address of the hosts that match the RACL entry.
Port ACLs (PACLs) are similar to Router ACLs but are supported and configured on Layer 2 interfaces on a switch. Port ACLs are supported on physical interfaces as well as on EtherChannel interfaces. PACLs are not supported on private VLANs. In addition, keep in mind that PACLs do not support the log keyword that is available for RACLs. Port ACLs perform access control on all traffic entering the specified Layer 2 port and apply only to ingress traffic on the port.
When implementing PACLs, it is important to remember that they do not affect Layer 2 control packets, such as CDP packets, that are received on the port. Additionally, keep in mind that PACLs are supported only in hardware and do not apply to packets that are processed in software. When you create a Port ACL, an entry is created in the ACL TCAM.
PACLs can be configured as either standard or extended IP ACLs or MAC ACLs. This allows you to filter IP traffic by using IP access lists and non-IP traffic by using MAC addresses. As is the case with RACLs, you can use the show ip access-lists command to troubleshoot PACL Layer 3 filtering problems. In the event that MAC filtering has been implemented, use the show mac access-group command to view applied MAC ACLs on a per-interface basis as follows:
Switch#show mac access-group interface FastEthernet0/1
Interface FastEthernet0/1:
   Inbound access-list is TSHOOT
   Outbound access-list is not set
To view the configured MAC ACL, use the show access-lists command as follows:
Switch#show access-lists
Extended MAC access list TSHOOT
    deny   host 0000.0c92.04b6 any
    permit any any
VLAN Access Control Lists (VACLs) operate in a similar manner to Router ACLs but are a means to apply access control to packets bridged within a VLAN or routed between VLANs. Unlike RACLs, which are applied on an inbound or outbound basis, VACLs have no sense of direction and therefore apply to traffic at both ingress and egress. Within a VLAN, packets arriving on the Layer 2 interface have the VACL processed on ingress and egress.
Used alone, VACLs can be configured to filter both bridged and routed packets. Additionally, they can also be used in conjunction with RACLs to filter both bridged and routed traffic. Troubleshoot VACL issues using the show vlan filter <access-map|vlan> suite of commands.
The show vlan filter access-map <name> command prints information showing the VLANs to which the specified VLAN access map has been applied, as illustrated below:
Switch#show vlan filter access-map MY-VACL-MAP
VLAN Map MY-VACL-MAP is filtering VLANs:
  2-3
The show vlan filter vlan <ID> command prints information on the VACL that is applied to the specified VLAN as illustrated in the following output:
Switch#show vlan filter vlan 2
Vlan 2 has filter MY-VACL-MAP.
In addition to the show vlan filter commands, you can also use the show vlan access-map command to view the configured parameters for all VACLs or for the specified VACL:
Switch#show vlan access-map
Vlan access-map "MY-VACL-MAP"  10
  Match clauses:
    ip address: ALLOW-UDP
  Action:
    forward
Vlan access-map "MY-VACL-MAP"  20
  Match clauses:
  Action:
    forward
A common issue when implementing RACLs, PACLs, and VACLs is attempting to apply conflicting ACLs. Care should therefore be taken when attempting to implement different types of ACLs on the switch. If a conflicting VACL and PACL configuration is implemented, the switch will log an error message similar to the following:
*Mar 2 10:27:10.411: %FM-3-CONFLICT: VLAN Map MY-VACL-MAP conflicts with port ACLs
Likewise, if a conflicting RACL and PACL configuration is implemented, the switch will log an error message similar to the following:
*Mar 2 10:41:01.399: %FM-3-CONFLICT: Port ACL 100 conflicts with input router ACLs
Finally, if a conflicting RACL and VACL configuration is implemented, the switch will log an error message similar to the following:
*Mar 2 10:41:01.399: %FM-3-CONFLICT: Port ACL 100 conflicts with VLAN filters
In summation, when implementing RACLs, PACLs, and VACLs, keep the following in mind:
• RACLs are applied to Layer 3 interfaces, which includes switch SVIs
• RACLs are applied in the inbound and outbound directions
• PACLs are applied only on Layer 2 ports
• PACLs are applied in the inbound direction
• PACLs do not affect management protocols, such as CDP
• PACLs can filter based on Layer 2 or Layer 3 addresses
• VACLs can be applied to VLANs; they cannot be applied to interfaces
• VACLs can be used to filter both bridged and routed traffic
• VACLs have no sense of direction and filter inbound and outbound traffic
• VACLs can be used with RACLs to filter routed and bridged traffic
Catalyst Switch Port Security
Port security is another Cisco IOS software tool that can be used to protect the data plane. This feature secures the CAM table by limiting the number of MAC addresses that can be learned on a particular port or interface. With the port security feature, the switch maintains a table that is used to identify which MAC address (or addresses) can access which local switch port.
The primary purpose of the port security feature is to protect against CAM table overflow or MAC address flooding attacks. However, the same feature can also be used to protect against MAC spoofing attacks, which were described earlier in this chapter in the section on DHCP Snooping.
CAM table overflow or MAC address flooding attacks work by flooding the switch with a large number of randomly generated invalid source and destination MAC addresses until the CAM table fills up and the switch is no longer able to accept new entries. In such situations, the switch effectively turns into a hub and simply begins to flood all newly received frames out of all ports on the switch, essentially turning the VLAN into one big Broadcast domain.
The primary purpose of CAM table overflow or MAC address flooding attacks is to get the switch to go into a 'fail-open' state, which essentially means that all traffic is flooded or transmitted out of all ports.
In such cases, the attacker is able to capture all data transiting the switch, as they can see all packets that are being sent by the switch.
The port security feature can be used to specify which specific MAC address is permitted access to a switch port, as well as to limit the number of MAC addresses that can be supported on a single switch port. The methods of port security implementation described in this section are as follows:
• Static secure MAC addresses
• Dynamic secure MAC addresses
• Sticky secure MAC addresses
Static secure MAC addresses are statically configured by network administrators and are stored in the MAC address table, as well as in the switch configuration. When static secure MAC addresses are assigned to a secure port, the switch will not forward frames that do not have a source MAC address that matches the configured static secure MAC address or addresses.
Dynamic secure MAC addresses are dynamically learned by the switch and are stored in the MAC address table. However, unlike static secure MAC addresses, dynamic secure MAC address entries are removed from the switch when the switch is reloaded or powered down. These addresses must then be re-learned by the switch when it boots up again.
Sticky secure MAC addresses are a mix of static secure MAC addresses and dynamic secure MAC addresses. These addresses can be learned dynamically or configured statically and are stored in the MAC address table, as well as in the switch configuration. This means that when the switch is powered down or rebooted, it will not need to discover the MAC address dynamically again because it will already be in the configuration file, provided that the running configuration has been saved to NVRAM.
Once port security has been enabled, administrators can define the actions the switch will take in the event of a port security violation.
Cisco IOS software allows administrators to specify the following four actions to take when a violation occurs:
• Protect
• Shutdown (default)
• Restrict
• Shutdown VLAN
The protect option forces the port into a protected port mode. In this mode, all Unicast or Multicast frames with unknown source MAC addresses will simply be discarded by the switch. When the switch is configured to protect a port, it will not send out a notification when operating in protected port mode, meaning that administrators would never know when any traffic was prevented by the switch port operating in this mode.
The shutdown option places a port in an errdisabled state when a port security violation occurs. The corresponding port LED on the switch is also turned off when a port security violation occurs and this configured action mode is used. In shutdown mode, the switch sends out an SNMP trap and a syslog message, and the violation counter is incremented. This is the default action taken when port security is enabled on an interface.
The restrict option is used to drop packets with unknown MAC addresses when the number of secure MAC addresses reaches the administrator-defined maximum limit for the port. In this mode, the switch will continue to restrict additional MAC addresses from sending frames until a sufficient number of secure MAC addresses is removed or the number of maximum allowable addresses is increased. As is the case with the shutdown option, the switch sends out an SNMP trap and a syslog message, and the violation counter is incremented.
The shutdown VLAN option is similar to the shutdown option; however, this option shuts down a VLAN instead of the entire switch port. This configuration could be applied to ports that have more than a single VLAN assigned to them, such as a voice VLAN and a data VLAN, for example, as well as to trunk links on the switches.
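The differences between the protect, restrict, and shutdown violation modes can be compared side by side with a small Python model. This is a sketch of the behavior described above and not a representation of how the switch implements the feature. The SecurePort class, its attribute names, and the MAC addresses are hypothetical, and the shutdown VLAN mode is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class SecurePort:
    maximum: int = 1                  # default maximum of one secure MAC address
    violation_mode: str = "shutdown"  # "protect", "restrict", or "shutdown"
    secure_macs: set = field(default_factory=set)
    violation_count: int = 0
    err_disabled: bool = False
    notifications: list = field(default_factory=list)  # stands in for trap/syslog

    def receive(self, source_mac):
        """Return True if a frame from source_mac would be forwarded."""
        if self.err_disabled:
            return False  # an errdisabled port forwards nothing
        if source_mac in self.secure_macs:
            return True
        if len(self.secure_macs) < self.maximum:
            self.secure_macs.add(source_mac)  # learned as a dynamic secure address
            return True
        # Unknown source MAC and the limit has been reached: a violation.
        if self.violation_mode == "protect":
            return False  # silent drop: no notification and no counter change
        self.violation_count += 1
        self.notifications.append(f"security violation caused by {source_mac}")
        if self.violation_mode == "shutdown":
            self.err_disabled = True
        return False


def simulate(mode):
    """Send a known, an unknown, and then the known MAC address again."""
    port = SecurePort(maximum=1, violation_mode=mode)
    frames = ("aaaa.aaaa.aaaa", "bbbb.bbbb.bbbb", "aaaa.aaaa.aaaa")
    forwarded = [port.receive(mac) for mac in frames]
    return forwarded, port.violation_count, port.err_disabled


for mode in ("protect", "restrict", "shutdown"):
    print(mode, simulate(mode))
```

In this model, only the shutdown mode stops the legitimate host from sending after the violation, and only the protect mode leaves the violation counter at zero. This is why a port in protect mode can drop traffic without leaving any trace for the administrator.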
When troubleshooting port security, it is important to check the configuration that has been implemented by first using the show running-config interface <name> command. As stated earlier in this guide, default port security configuration parameters can cause operational issues with other features, such as FHRPs (i.e., HSRP, VRRP, and GLBP) because only a single MAC address is allowed per port. This can be validated via the show port-security interface <name> command as illustrated below:
Switch#show port-security interface FastEthernet0/2
Port Security              : Enabled
Port Status                : Secure-down
Violation Mode             : Shutdown
Aging Time                 : 0 mins
Aging Type                 : Absolute
SecureStatic Address Aging : Disabled
Maximum MAC Addresses      : 1
Total MAC Addresses        : 0
Configured MAC Addresses   : 0
Sticky MAC Addresses       : 0
Last Source Address:Vlan   : 0000.0000.0000:0
Security Violation Count   : 0
When looking at the output of this command, it is important to understand the information that is printed by the switch. The Port Status field indicates the operational state of the port (i.e., whether the port is up or down). In the example above, the port is down, which could be due to Layer 1 issues or because the shutdown command was issued under the port. A Secure-down status is also displayed when the switchport port-security command has not been issued under the interface or port, although the Port Security field would then show Disabled rather than Enabled.
The Violation Mode field indicates the configured violation mode. The default mode is shutdown. The Aging Time and Aging Type fields specify the aging time and type parameters. By default, secure MAC addresses will not be aged out and will remain in the switch MAC table until the switch is powered off. However, this default behavior may be adjusted by configuring aging values for dynamic and secure static MAC addresses. The valid aging time range is 0 to 1440 minutes. The aging type specifies how secure addresses are aged.
This can be either absolute or based on a configured period of inactivity. The absolute mechanism causes the secured MAC addresses on the port to age out after a fixed specified time. All references are flushed from the secure address list after the specified time and the address must then be relearned on the switch port. Once relearned, the timer begins again and the process is repeated as often as has been defined in the configured timer values. This is the default aging type for secure MAC addresses. The inactivity time, also referred to as the idle time, causes secured MAC addresses on the port to age out if no activity (i.e., frames or data) is received from the secure addresses learned on the port for the specified time period.

The Maximum MAC Addresses field specifies the number of allowed secure MAC addresses per port. The default is one and the maximum value depends on the switch platform.

The Total MAC Addresses field indicates the current total number of MAC addresses learned on the port. The Configured MAC Addresses field specifies the number of statically configured secure addresses on the port. The Sticky MAC Addresses field specifies the number of sticky secure MAC addresses configured on the port.

The Last Source Address:Vlan field specifies the last secure MAC address learned on the port. This is applicable only when port security is configured on a trunk link. Finally, the Security Violation Count field specifies the number of security violations on the port.
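The fields described above map to interface-level commands. The following is a minimal sketch, assuming an access port named FastEthernet0/2 and an illustrative static MAC address (0000.1111.2222); neither value comes from a real device.

```
! Illustrative sketch; interface name and MAC address are assumptions
interface FastEthernet0/2
 switchport mode access
 switchport port-security
 ! Maximum MAC Addresses field (default is 1)
 switchport port-security maximum 10
 ! Aging Time and Aging Type fields (defaults: 0 mins, absolute)
 switchport port-security aging time 10
 switchport port-security aging type inactivity
 ! Learned addresses are counted under Sticky MAC Addresses
 switchport port-security mac-address sticky
 ! Static entries are counted under Configured MAC Addresses
 switchport port-security mac-address 0000.1111.2222
```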
To reinforce what has been discussed in this section further, consider the following output:

Switch#show port-security interface FastEthernet0/2
Port Security              : Enabled
Port Status                : Secure-up
Violation Mode             : Restrict
Aging Time                 : 10 mins
Aging Type                 : Inactivity
SecureStatic Address Aging : Disabled
Maximum MAC Addresses      : 10
Total MAC Addresses        : 6
Configured MAC Addresses   : 1
Sticky MAC Addresses       : 5
Last Source Address:Vlan   : 0000.0000.0000:0
Security Violation Count   : 0

From the port security interface output that is printed above, we can determine the following:

• The interface is up, and the switchport port-security command was issued under the interface. This is reflected in the Secure-up port status.
• The switchport port-security violation restrict command was issued under the interface because the default violation mode is Shutdown.
• The switchport port-security aging time 10 and switchport port-security aging type inactivity commands were issued under the interface because the aging time default is 0 minutes and the aging type default is absolute.
• The switchport port-security maximum 10 command was issued under the interface since by default only one MAC address is permitted when port security is enabled.
• Referencing the MAC address counters, we can determine that the switchport port-security mac-address sticky command was issued and five sticky secure addresses are present, while the switchport port-security mac-address command was issued with one static secure address, because, by default, no such addresses are defined.
• Finally, we can determine that no security violations have been detected on the interface or port as the counter still has a value of 0.

Private VLANs

Private VLANs (PVLANs) prevent inter-host communication by providing port-specific security between adjacent ports within a VLAN across one or more switches.
Access ports within PVLANs are allowed to communicate only with certain designated router ports, which are typically those connected to the default gateway for the VLAN. Both normal VLANs and private VLANs can coexist on the same switch; however, unlike normal VLANs, private VLANs allow for the segregation of traffic at Layer 2. This effectively transforms a traditional Broadcast segment into a non-Broadcast multi-access segment.

The private VLAN feature uses three different types of ports: community, isolated, and promiscuous.

Community PVLAN ports are logically combined groups of ports in a common community that can pass traffic among themselves and with promiscuous ports. Community ports are separated at Layer 2 from all interfaces in other communities and from isolated ports within their PVLAN.

Isolated PVLAN ports cannot communicate with any other ports within the PVLAN. However, isolated ports can communicate with promiscuous ports. Traffic from an isolated port can be forwarded only to a promiscuous port and no other port.

Promiscuous PVLAN ports can communicate with any other ports, including community and isolated PVLAN ports. The function of the promiscuous port is to allow traffic between ports in community or isolated VLANs. Promiscuous ports can be configured with switch ACLs to define what traffic can pass between these VLANs. It is important to know that only one promiscuous port is allowed per PVLAN, and that port serves the community and isolated VLANs within that PVLAN. Because promiscuous ports can communicate with all other ports, this is the recommended location to place switch ACLs to control traffic between the different types of ports and VLANs.

Isolated and community port traffic can enter or leave switches via trunk links, because trunks support VLANs carrying traffic among isolated, community, and promiscuous ports.
Hence, PVLANs are associated with a separate set of VLANs that are used to enable PVLAN functionality in Cisco Catalyst switches. The three types of VLANs used in PVLANs are as follows:

1. Primary VLAN
2. Isolated VLAN
3. Community VLAN

Primary VLANs carry traffic from a promiscuous port to isolated, community, and other promiscuous ports within the same primary VLAN.

Isolated VLANs carry traffic from isolated ports to a promiscuous port. Ports in isolated VLANs cannot communicate with any other port in the private VLAN without going through the promiscuous port.

Community VLANs carry traffic between community ports within the same PVLAN, as well as to promiscuous ports. Ports within the same community VLAN can communicate with each other at Layer 2; however, they cannot communicate with ports in other community or isolated VLANs without going through a promiscuous port.

Isolated and community VLANs are typically referred to as secondary VLANs. A private VLAN, therefore, actually contains three elements, which are as follows:

1. The PVLAN itself
2. The secondary VLANs (community and isolated)
3. The promiscuous port

The community VLAN defines a set of ports that can communicate with each other at Layer 2, as long as they belong to the same community VLAN, but cannot communicate with ports in other community VLANs or isolated VLANs without first going through the promiscuous port.

The isolated VLAN defines a set of ports that cannot communicate at Layer 2 with any other port within the PVLAN, whether that is a community VLAN port or even a port in the same isolated VLAN. In order to communicate with ports in either of these VLANs, isolated ports must go through the promiscuous port. Only a single isolated VLAN per PVLAN is allowed.

The promiscuous port forwards traffic between ports in community and/or isolated VLANs.
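The relationship between the primary VLAN, the secondary VLANs, and the port types can be sketched in configuration form. This is a minimal example under stated assumptions: the VLAN IDs (100, 101, 102) and interface names are hypothetical, and the need for VTP transparent mode applies to platforms running VTP versions 1 and 2.

```
! Illustrative sketch; VLAN IDs and interface names are assumptions
vtp mode transparent
!
vlan 101
 private-vlan community
vlan 102
 private-vlan isolated
vlan 100
 private-vlan primary
 ! Tie both secondary VLANs to the primary VLAN
 private-vlan association 101,102
!
! Host port in the community VLAN
interface FastEthernet0/1
 switchport mode private-vlan host
 switchport private-vlan host-association 100 101
!
! Host port in the isolated VLAN
interface FastEthernet0/2
 switchport mode private-vlan host
 switchport private-vlan host-association 100 102
!
! Promiscuous port facing the default gateway
interface FastEthernet0/24
 switchport mode private-vlan promiscuous
 switchport private-vlan mapping 100 101,102
```

The show vlan private-vlan command can then be used to confirm the primary-to-secondary associations and the member ports.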
Only one promiscuous port can exist within a single PVLAN; however, this port can serve all the community and isolated VLANs in the PVLAN. ACLs may be applied to the promiscuous port to define the traffic that is allowed to pass between these different VLANs.

When troubleshooting PVLANs, first verify that the implemented configuration is correct. For example, if users are unable to access other devices in their own or other VLANs, ensure that they are using the correct default gateway and that the default gateway has been assigned to the promiscuous port, because this port allows for both intra- and inter-PVLAN communication. In addition to verifying PVLAN configuration, perform basic checks, such as verifying that the ports are configured and operational.

IEEE 802.1x Port-Based Authentication

Identity Based Networking Services (IBNS) provides identity-based network access control and policy enforcement at the switch port level. The IBNS solution extends network access security based on the 802.1x technology, Extensible Authentication Protocol (EAP) technologies, and the Remote Authentication Dial-In User Service (RADIUS) security server service.

IEEE 802.1x is a protocol standard framework for both wired and wireless Local Area Networks that authenticates users or network devices and provides policy enforcement services at the port level to provide secure network access control.

802.1x is an IEEE standard for access control and authentication that provides a means for authenticating users who want to gain access to the network and placing them into a pre-determined VLAN, effectively granting them certain access rights to the network. In simpler terms, 802.1x prevents rogue or unknown devices from gaining unauthorized access to either the wired or the wireless network.
The 802.1x protocol defines how EAP messages are encapsulated and transported at the Data Link Layer over any PPP or IEEE 802 media (e.g., Ethernet, FDDI, or Token Ring) by implementing port-based network access control on a network device. EAP messages are communicated between an end device, referred to as a supplicant, and an authenticator, which can be either a switch or a wireless access point. The authenticator relays the EAP messages to the authentication server (e.g., a Cisco ACS server) via the RADIUS protocol. The three primary components (or roles) in the 802.1x authentication process are as follows:

• Supplicant or client
• Authenticator
• Authentication server

An IEEE 802.1x supplicant or client is simply an 802.1x-compliant device, such as a workstation, a laptop, or even an IP phone, with software that supports the 802.1x and EAP protocols. The supplicant sends an authentication request to access the LAN via the connected authenticator device (e.g., the access switch) using EAP.

An 802.1x authenticator is a device that enforces physical access control to the network based on the authentication status (i.e., permit or deny) of the supplicant. Examples of an authenticator would be a switch or a router. The authenticator acts as a proxy and relays information between the supplicant and the authentication server.

The authenticator receives the identity information from the supplicant via EAP over LAN (EAPOL) frames, which are verified and then encapsulated into RADIUS protocol format before being forwarded to the authentication server. It is important to remember that the EAP frames are not modified or examined during the encapsulation process, which means that the authentication server must support EAP within the native frame format.
When the authenticator receives frames from the authentication server, the RADIUS header is removed, leaving only the EAP frame, which is then encapsulated in the 802.1x format. These frames are then sent back to the supplicant or client.

The authentication server is the database policy software, such as Cisco Secure ACS, that supports the RADIUS protocol and performs authentication of the supplicant that is relayed by the authenticator via the RADIUS client-server model.

The authentication server validates the identity of the client and then notifies the authenticator whether the client is allowed or denied access to the network. Based on the response from the authentication server, the authenticator relays this information back to the supplicant. It is important to remember that during the entire authentication process, the authentication server remains transparent to the client because the supplicant is communicating only with the authenticator. A RADIUS server with EAP extensions is the only supported, truly compliant authentication server when configuring 802.1x port-based authentication.

When configuring 802.1x authentication, you must enable AAA on the switch or router using the aaa new-model global configuration command. Next, create or use the default 802.1X authentication method list and specify RADIUS server information by issuing the aaa authentication dot1x [method-list|default] group [name|radius] global configuration command.

NOTE: As per Cisco online documentation, when configuring 802.1x authentication, “The only method that is truly 802.1X-compliant is the group radius method, in which the client data is validated against a RADIUS authentication server.” Cisco IOS software also allows you to use the local database, the enable password, or line passwords for authentication.
Following that, configure RADIUS server parameters (e.g., keys and ports) via the radius-server host global configuration command for an individual server or the aaa group server radius global configuration command for a RADIUS server group.

Next, globally enable IEEE 802.1x authentication on the switch using the dot1x system-auth-control global configuration command.

Finally, enable 802.1x port-based authentication on desired switch ports using the dot1x port-control {auto|force-authorized|force-unauthorized} interface configuration command. The following configuration example illustrates how to configure 802.1x port-based authentication on a switch interface:

Switch(config)#aaa new-model
Switch(config)#aaa authentication dot1x default group radius
Switch(config)#radius-server host 10.1.1.254 auth-port 1812 key t5h00t!2010
Switch(config)#dot1x system-auth-control
Switch(config)#interface range FastEthernet0/23 - 24
Switch(config-if-range)#switchport mode access
Switch(config-if-range)#dot1x port-control auto
Switch(config-if-range)#exit

When troubleshooting 802.1x port-based authentication, troubleshooting targets include the client (supplicant), authenticator, and authentication server. As a network engineer, your primary focus would be on the authenticator, as this is the device under your administration on which the port-based authentication configuration is implemented.
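The aaa group server radius alternative mentioned in the configuration steps above can be sketched as follows. The group name DOT1X-SERVERS is a hypothetical label, the server address and key are reused from the example above, and the accounting port shown is an assumption.

```
! Illustrative sketch; DOT1X-SERVERS is a hypothetical group name
aaa new-model
! Define the server globally, then reference it from a named group
radius-server host 10.1.1.254 auth-port 1812 acct-port 1813 key t5h00t!2010
aaa group server radius DOT1X-SERVERS
 server 10.1.1.254 auth-port 1812 acct-port 1813
! Point the default 802.1x method list at the named group
aaa authentication dot1x default group DOT1X-SERVERS
dot1x system-auth-control
```

A named group is mainly useful when different services should use different sets of RADIUS servers; with a single server, the group radius form shown earlier is simpler.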
When troubleshooting 802.1x authentication issues, verify that 802.1x is enabled on the switch port by checking the device configuration or by using the show dot1x interface <name> command, the output of which is shown below:

Switch#show dot1x interface FastEthernet0/24
Dot1x Info for FastEthernet0/24
-----------------------------------
PAE                 = AUTHENTICATOR
PortControl         = AUTO
ControlDirection    = Both
HostMode            = SINGLE_HOST
ReAuthentication    = Disabled
QuietPeriod         = 60
ServerTimeout       = 30
SuppTimeout         = 30
ReAuthPeriod        = 3600 (Locally configured)
ReAuthMax           = 2
MaxReq              = 2
TxPeriod            = 30
RateLimitPeriod     = 0

In addition, check that the port authorization state is set to auto. If the port authorization state is set to force-unauthorized, the port will remain in the unauthorized state, ignoring all attempts by the client to authenticate. In other words, the switch or router cannot provide authentication services to the client through the interface or port.

Another thing to check is that 802.1x port-based authentication has been enabled globally on the switch. Again, you can check the configuration, parsing it for the dot1x system-auth-control configuration statement, or simply use the show dot1x command to validate this configuration as shown below:

Switch#show dot1x
Sysauthcontrol              Enabled
Dot1x Protocol Version      2
Critical Recovery Delay     100
Critical EAPOL              Disabled

In the event that the configuration on the switch or router appears to be correct, verify that there is connectivity between the switch or router and the RADIUS server. You can perform a simple IP connectivity test using the ping utility and additionally use the debug aaa authentication command to perform additional verification, as well as AAA troubleshooting.

Trunking

Trunk links are used to carry traffic from multiple VLANs.
By default, the users residing on one VLAN cannot directly communicate with users in another VLAN without going through a router or Layer 3 interface (e.g., an SVI). Despite this default operation, attackers can use VLAN hopping attacks to bypass a Layer 3 device in order to communicate directly between VLANs. For this reason, it is important to ensure that trunk links are secured.

The main objective of VLAN hopping is to compromise a device residing on another VLAN. The two primary methods used to perform VLAN hopping attacks are as follows:

1. Switch spoofing
2. Double-tagging

In switch spoofing, the attacker impersonates a switch by emulating ISL or 802.1Q signaling, as well as Dynamic Trunking Protocol (DTP) signaling. DTP provides switches with the ability to negotiate the trunking method for the trunk link they will establish between themselves.

Double-tagging or double-encapsulated VLAN attacks involve tagging frames with two 802.1Q tags in order to forward the frames to a different VLAN. The embedded hidden 802.1Q tag inside the frame allows the frame to traverse a VLAN that the outer 802.1Q tag did not specify. This is a particularly dangerous attack because it will work even if the trunk port is set to off.

While there are no structured troubleshooting steps for VLAN hopping attacks per se, there are some techniques that you should be familiar with and implement in order to prevent such attacks. These techniques and mitigation methods include the following:

• Ensure that the native VLAN used on all the trunk ports is different from the VLAN ID of user access ports. It is best to use a dedicated or isolated VLAN that is specific for each pair of trunk ports, not the default VLAN.
• Configure the native VLAN to tag all traffic to prevent the vulnerability of double-tagged 802.1Q frames hopping VLANs. This functionality can be enabled by issuing the vlan dot1q tag native global configuration command.
• Disable Dynamic Trunking Protocol on all untrusted ports using the switchport nonegotiate command, effectively preventing automatic trunk configuration.
• Alternatively, configure the untrusted ports as access ports using the switchport mode access command.
• Place all unused ports in a common unrouted VLAN that is local to the switch. Use a VLAN number that is easily recognizable, such as 666, for example.

Cisco IOS Firewall

The Cisco IOS Firewall suite can be used to secure the data plane; however, it has been included in its own section, as this topic is not included in any of the other current CCNP study guides.

CISCO IOS FIREWALL FUNDAMENTALS

The final section of this chapter describes the Cisco IOS Firewall suite, which can also be used to protect the data plane. While most network environments typically employ dedicated appliances, such as the Cisco Adaptive Security Appliance (ASA) Firewall, and a Network-based Intrusion Prevention System (NIPS), such as the Cisco IPS 4200 Sensors, Cisco IOS software also provides built-in firewall and intrusion prevention capabilities with which, as a network engineer, you should be familiar.

The Cisco IOS Firewall suite provides a single point of protection at the network perimeter, making security policy enforcement an inherent component of the network. The Cisco IOS Firewall consists of the following functions and technologies:

• Cisco IOS Stateful Packet Inspection
• Context-Based Access Control
• Intrusion Prevention System
• Authentication Proxy
• Port-to-Application Mapping
• Network Address Translation
• Zone-Based Policy Firewall

NOTE: Keep in mind that this is not a security exam.
While you are expected to be familiar with the basic Cisco IOS Firewall fundamentals, we will not be going into advanced detail on this feature or other related security features.

Cisco IOS Stateful Packet Inspection

Cisco IOS Stateful Packet Inspection (SPI) provides firewall capabilities designed to protect networks against unauthorized traffic and to control legitimate business-critical data. Cisco IOS SPI maintains state information and counters of connections, as well as the total connection rate, through the firewall and intrusion prevention software.

Stateful firewalls perform SPI or Stateful Inspection and keep track of the state of network connections, such as TCP and UDP streams traveling across them. The Cisco IOS Firewall is a Stateful firewall that uses the inherent Stateful inspection engine of Cisco IOS software for maintaining the detailed session database, which is referred to as the state table.

Stateful firewalls are able to hold a significant number of attributes in memory for each connection, from start to finish. These attributes, which are known as connection states, may include such details as the IP addresses and port numbers involved in the connection, as well as the sequence numbers of the packets traversing the connection. When running Cisco IOS Classic Firewall, which is described later in this chapter, you can use the show ip inspect sessions command to view the state table. Following is a sample output of this command:

R1#show ip inspect sessions
Established Sessions
 Session 84362E94 (10.1.1.1:3624)=>(172.1.1.1:80) http SIS_OPEN

Context-Based Access Control

Context-Based Access Control (CBAC) is a Stateful inspection firewall engine that provides dynamic traffic filtering capabilities.
CBAC, which is also known as the Classic IOS Firewall, provides advanced traffic-filtering functionality to Cisco IOS routers. The main features of Context-Based Access Control are as follows:

• It protects the internal network from external intrusion or other threats
• It provides Denial of Service (DoS) protection
• It provides per-application control mechanisms
• It examines Layer 3 and Layer 4, as well as Application Layer, information
• It maintains state information for every connection
• It generates real-time event alerts and log messages
• It provides enhanced audit trail features

Context-Based Access Control inspects all traffic that traverses the firewall and maintains state information for all TCP and UDP sessions. This state information is then used to create temporary (dynamic) ACL openings through the firewall to allow return traffic for sessions that were originated internally. These temporary openings are maintained for the duration of the session.

Packets that enter the firewall are subject to inspection only if they first pass the inbound ACL at the input interface and outbound ACL at the output interface. If a packet is denied by the ACL, the router will simply drop it without CBAC inspection. Figure 8-3 below illustrates basic Cisco IOS CBAC operation:

[Figure: a Cisco IOS Stateful firewall between the internal (trusted) host 10.1.1.1 and the external (untrusted) host 172.1.1.1, with the following state table]

State Table
Direction                              SRC IP     SRC Port  DST IP     DST Port  Protocol
OUTBOUND (based on static ACL entry)   10.1.1.1   1500      172.1.1.1  80        TCP
INBOUND (based on dynamic ACL entry)   172.1.1.1  80        10.1.1.1   1500      TCP

Fig. 8-3. Understanding CBAC Operation

Referencing Figure 8-3, traffic from the internal (trusted) network is permitted by an ACL that is configured statically on the router, allowing host 10.1.1.1 to communicate with host 172.1.1.1 on the external (untrusted) network. This information is stored in the state table. CBAC then creates a dynamic ACL entry, allowing return traffic from external host 172.1.1.1 to internal host 10.1.1.1. This dynamic ACL is maintained only for the duration of the session.

Because a dynamic ACL is created for return traffic, the firewall allows only traffic originating from the internal network. In other words, if host 172.1.1.1 simply attempted to initiate a connection to host 10.1.1.1, then the connection would be dropped.

You can use the show ip inspect sessions detail command to view detailed session inspection when troubleshooting CBAC problems. Following is a sample output of this command, which illustrates an ICMP Echo request sent by host 10.1.1.1 to 172.1.1.1:

R1#show ip inspect sessions detail
Established Sessions
 Session 84362BCC (10.1.1.1:8)=>(172.1.1.1:0) icmp SIS_OPEN
  Created 00:00:09, Last heard 00:00:06
  ECHO request
  Bytes sent (initiator:responder) [128:128]
  Out SID 172.1.1.1[0:0]=>10.1.1.1[0:0] on ACL CBAC-ACL
  In SID 172.1.1.1[0:0]=>10.1.1.1[0:0] on ACL CBAC-ACL (4 matches)
  Out SID 0.0.0.0[0:0]=>10.1.1.1[3:3] on ACL CBAC-ACL
  In SID 0.0.0.0[0:0]=>10.1.1.1[3:3] on ACL CBAC-ACL
  Out SID 0.0.0.0[0:0]=>10.1.1.1[11:11] on ACL CBAC-ACL
  In SID 0.0.0.0[0:0]=>10.1.1.1[11:11] on ACL CBAC-ACL

Referencing the output above, CBAC creates a dynamic entry in the ACL named CBAC-ACL, which allows return traffic (i.e., the ICMP Echo response) from host 172.1.1.1 to host 10.1.1.1. This dynamic entry is based on the static ACL applied to the internal or trusted router interface.
Another useful command when troubleshooting CBAC issues is the show ip inspect interfaces command. This command shows the internal (trusted) and external (untrusted) interfaces. The internal interface should have an inbound CBAC inspection rule applied to it, while the external interface should have an outbound CBAC inspection rule applied to it. This allows the router to inspect traffic ingressing the trusted interface and destined out of the untrusted interface, which in turn allows it to create a dynamic entry for the return traffic, as it was originated behind the trusted interface. Following is a sample output of this command:

R1#show ip inspect interfaces
Interface Configuration
 Interface Serial0/0
  Inbound inspection rule is not set
  Outgoing inspection rule is TSHOOT-CBAC
    icmp alert is on audit-trail is off timeout 10
    smtp max-data 20000000 alert is on audit-trail is off timeout 3600
    udp alert is on audit-trail is off timeout 30
  Inbound access list is CBAC-ACL
  Outgoing access list is not set
 Interface FastEthernet0/0
  Inbound inspection rule is TSHOOT-CBAC
    icmp alert is on audit-trail is off timeout 10
    smtp max-data 20000000 alert is on audit-trail is off timeout 3600
    udp alert is on audit-trail is off timeout 30
  Outgoing inspection rule is not set
  Inbound access list is not set
  Outgoing access list is CBAC-ACL

Referencing the output above, the inspection rule named TSHOOT-CBAC has been applied in the inbound direction to the FastEthernet0/0 interface (trusted), while the same rule has been applied in the outbound direction to the Serial0/0 interface (untrusted). An ACL named CBAC-ACL is used to permit or deny traffic.

NOTE: You are not expected to perform any CBAC configuration in the TSHOOT certification exam. In addition, you are not expected to perform any advanced CBAC troubleshooting.
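For reference only, a configuration along the following lines could produce output similar to that shown above. This is a minimal sketch rather than the configuration used to generate the sample output: the ACL contents (a single deny statement) are an assumption, and only the rule name, ACL name, inspected protocols, and interface directions are taken from the output.

```
! Illustrative sketch; the ACL entry below is an assumption
ip inspect name TSHOOT-CBAC icmp
ip inspect name TSHOOT-CBAC smtp
ip inspect name TSHOOT-CBAC udp
!
! Static ACL that blocks unsolicited traffic; CBAC adds temporary
! openings to it for return traffic of inspected sessions
ip access-list extended CBAC-ACL
 deny ip any any
!
! Trusted (internal) interface
interface FastEthernet0/0
 ip inspect TSHOOT-CBAC in
 ip access-group CBAC-ACL out
!
! Untrusted (external) interface
interface Serial0/0
 ip inspect TSHOOT-CBAC out
 ip access-group CBAC-ACL in
```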
Intrusion Prevention System

The Cisco IOS Intrusion Prevention System (IPS) is an inline intrusion detection and prevention sensor that scans packets and sessions flowing through the router to identify any of the Cisco IPS signatures that protect the network from internal and external threats. Some key features of the Cisco IOS Intrusion Prevention System are as follows:

• It protects the network from viruses, worms, and a large variety of threats and exploits
• It eliminates the need for a standalone IPS device
• It provides integrated inline deep-packet inspection
• It complements the Cisco IOS Firewall and VPN solutions for superior threat protection
• It supports approximately 2000 attack signatures
• It uses Cisco IOS routing capabilities to deliver integrated functionality
• It enables distributed network-wide threat mitigation
• It sends a syslog message or an alarm in SDEE format when a threat is detected

NOTE: SDEE (Security Device Event Exchange) specifies the format of messages and protocols used to communicate events generated by security devices. SDEE specifies that events can be transported using HTTP or HTTPS over SSL and TLS protocols. SDEE is the default protocol used by Cisco IPS Sensor software, as well as by the Cisco IOS IPS feature set used on Cisco IOS routers. SDEE can also be used by tools such as Cisco Router and Security Device Manager (SDM) to pull event logs from Cisco IOS software routers.

When a Cisco IOS router acts as an IPS device, it needs a place to store the signature files, referred to as Signature Definition Files (SDFs), that it will use to identify malicious traffic. An SDF is a file, usually in XML format, that contains signature definitions that can be used to load signatures on the Cisco IOS router.
In most cases, the SDF is located in the router’s Flash Memory; however, Cisco IOS routers can also reference multiple Signature Definition Files located on network servers, such as TFTP servers, for increased signature coverage.

NOTE: IOS IPS troubleshooting is beyond the scope of the current TSHOOT certification exam and will not be included in this guide.

Authentication Proxy

The Authentication Proxy feature, also known as Proxy Authentication, allows administrators to enforce security policy on a per-user basis. With this feature, administrators can authenticate and authorize users on a per-user policy with access control customized to an individual level.

The Authentication Proxy feature intercepts HTTP or HTTPS sessions and prompts the user for a username and password if the user has not been previously authenticated. Authentication Proxy configuration and detailed knowledge are beyond the scope of the TSHOOT requirements and will not be described in detail in this guide.

Port-to-Application Mapping

Port-to-Application Mapping (PAM) allows administrators to customize TCP port numbers and UDP port numbers for network services or applications to non-standard ports. For example, administrators could use PAM to configure standard HTTP traffic, which uses TCP port 80 by default, to use TCP port 8080. PAM is also used by CBAC, which uses this information to examine non-standard Application Layer protocols. PAM configuration and troubleshooting are beyond the scope of the TSHOOT certification exam and will not be described in detail in this guide.

Network Address Translation

Network Address Translation (NAT) is used to hide internal addresses, which are typically private addresses (i.e., RFC 1918 addresses), from networks that are external to the firewall. The primary purpose of NAT is address conservation for networks that use RFC 1918 addressing due to the shortage of globally routable (i.e., public) IP address space.
NAT also provides a basic level of security by hiding the internal network from the outside world. NAT configuration is described in additional detail later in this guide.

Zone-Based Policy Firewall

Zone-Based Policy Firewall (ZPF) is a new Cisco IOS Firewall feature designed to replace and address some of the limitations of CBAC, the Classic Firewall. ZPF allows Stateful inspection to be applied on a zone-based model, which provides greater granularity, flexibility, scalability, and ease of use over the Classic Firewall.

With a zone-based inspection model, varying policies can be applied to multiple groups of hosts connected to the same interface. The security zones used in ZPF establish the security boundaries of the network where traffic is subjected to policy restrictions as it crosses to another zone within the network.

As is the case with CBAC, the Cisco IOS ZPF configuration is beyond the scope of the TSHOOT certification exam. For this reason, detailed troubleshooting steps are also not included in this guide. Instead, when troubleshooting ZPF issues, consider the following configuration guidelines and caveats:

• A zone must be configured before interfaces can be assigned to the zone. In other words, an interface cannot be assigned to a zone that does not exist.
• An interface can be assigned to only one security zone; it cannot be a member of multiple zones. This same concept is applicable in CBAC.
• All traffic to and from a given interface is implicitly blocked when the interface is assigned to a zone, except traffic to and from other interfaces in the same zone, and traffic to any interface on the router (e.g., Loopback interfaces).
• Traffic is implicitly allowed to flow, by default, among interfaces that are members of the same zone. In other words, if two or more interfaces are in the same zone, all hosts connected to those interfaces can communicate with each other by default.
• In order to permit traffic to and from a zone-member interface, a policy allowing or inspecting traffic must be configured between that zone and any other zone.
• The self zone is the only exception to the default ‘deny all’ policy. The self zone controls traffic sent to the router itself or originated by the router. Therefore, all traffic to any router interface or traffic originated by the router is allowed until explicitly denied.
• Traffic cannot flow between a zone-member interface and any interface that is not a zone member, by default. Pass, inspect, and drop actions can be applied only between two configured zones. For example, if interface FastEthernet0/0 is a member of Zone A and interface FastEthernet0/1 is not affiliated with any zone, traffic from FastEthernet0/0 cannot flow to FastEthernet0/1, and vice-versa.
• Interfaces that have not been assigned to a zone function as classic router ports and can still use classic Stateful inspection/CBAC configuration. However, interfaces that have been configured for zones cannot be configured for CBAC.
• If it is required that an interface on the router not be part of the ZPF, it might still be necessary to put that interface in a zone and configure a ‘pass all’ policy, which is effectively a dummy policy, between that zone and any other zone to which traffic flow is desired. Otherwise, that interface will not be able to communicate with other interfaces that have been assigned to zones, and vice-versa, as described earlier.
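The default behavior described in the guidelines above can be expressed as a small decision function. The following Python sketch is for illustration only: it is not Cisco IOS code, and the function name zpf_default_action, the zone labels, and the zone_pairs set are hypothetical names introduced here. It also assumes that a policy between two zones is simply present or absent, whereas a real policy applies pass, inspect, or drop actions per class of traffic.

```python
from typing import Optional, Set, Tuple

SELF_ZONE = "self"  # label used here for the router's own (self) zone


def zpf_default_action(src_zone: Optional[str],
                       dst_zone: Optional[str],
                       zone_pairs: Set[Tuple[str, str]]) -> str:
    """Return 'allow' or 'drop' for traffic between two interfaces.

    A zone of None means the interface has not been assigned to a zone.
    zone_pairs holds (source zone, destination zone) pairs that have a
    policy permitting or inspecting traffic (a simplifying assumption).
    """
    # Traffic to or from the router itself is allowed until explicitly denied.
    if SELF_ZONE in (src_zone, dst_zone):
        return "allow"
    # Two interfaces outside any zone behave as classic router ports.
    if src_zone is None and dst_zone is None:
        return "allow"
    # A zone member cannot exchange traffic with a non-member.
    if src_zone is None or dst_zone is None:
        return "drop"
    # Interfaces in the same zone can communicate by default.
    if src_zone == dst_zone:
        return "allow"
    # Between different zones, traffic needs a policy in that direction.
    return "allow" if (src_zone, dst_zone) in zone_pairs else "drop"


if __name__ == "__main__":
    pairs = {("A", "B")}
    print(zpf_default_action("A", "A", pairs))   # same zone
    print(zpf_default_action("A", None, pairs))  # member to non-member
    print(zpf_default_action("A", "B", pairs))   # policy configured
    print(zpf_default_action("B", "A", pairs))   # no policy in this direction
```

Walking a failing traffic flow through these checks in order is a quick way to decide whether the problem is a missing zone assignment or a missing policy between zones.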
CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter:

Cisco IOS Security Fundamentals

• The communications architecture of all switches and routers is segmented into three planes
• The communications architecture planes of network devices are as follows:
  1. The Management Plane
  2. The Control Plane
  3. The Forwarding Plane
• The management plane is used to manage a device through its connection to the network
• The management plane is responsible for management functions
• The management plane also coordinates functions among all the other planes
• Management protocols are used for device monitoring and CLI access
• Management protocols include SNMP, Telnet, SSH, HTTP, HTTPS, NetFlow, and Syslog
• Console access, i.e., via the Console port, is also used to manage devices
• A control plane is a collection of processes that run at the process level on a route processor
• All traffic directly or indirectly destined to a router or switch is handled by the control plane
• Control plane protocols include routing protocols, as well as Layer 2 protocols
• The forwarding or data plane is responsible for the actual forwarding of data
• The data plane is typically populated using information derived from the control plane

Management Plane Security and Troubleshooting

• Management plane troubleshooting involves troubleshooting the management protocols
• SSH provides a more secure method for access and administration than Telnet
• Cisco IOS software requires a VTY password and login enabled for remote management
• SSH is enabled by default; however, the following must be configured manually:
  1. A valid domain name must be configured on the local device
  2. Security keys must be generated on the local device
• FTP and SCP require additional configuration to allow file copies and transfers from devices
• AAA provides the framework that controls and also monitors network access
• AAA services can be performed using the local database or an external security server
• Two commonly used AAA security protocols are RADIUS and TACACS+
• RADIUS stands for Remote Authentication Dial-In User Service
• RADIUS is an open-standard protocol that is distributed in C source code format
• RADIUS uses UDP as the Transport Layer protocol for client and server communications
• RADIUS uses port 1812 for authentication and authorization, and 1813 for accounting
• Earlier deployments of RADIUS use port 1645 for authentication and authorization
• Earlier deployments of RADIUS use port 1646 for accounting
• TACACS+ stands for Terminal Access Controller Access Control System Plus
• TACACS+ is a Cisco-proprietary protocol that is used in the AAA framework
• TACACS+ uses TCP as a Transport Layer protocol, using TCP port 49
• Unlike RADIUS, TACACS+ separates the three AAA functions
• TACACS+ encrypts the entire body of the packet; RADIUS encrypts only the password
• TACACS+ supports multiple protocols; RADIUS has limited protocol support
• AAA services are based on method lists
• In AAA, there are two basic types of method lists: named and default method lists

Control Plane Security and Troubleshooting

• Troubleshooting at the control plane is dependent on the specific protocol or technology
• Troubleshooting for additional control plane features centers on configuration

Forwarding Plane Security and Troubleshooting

• Troubleshooting data plane issues depends on the technology that has been implemented
• Tools that are used to secure the data plane include the following:
  1. Router and Switch Access Control Lists
  2. Catalyst Switch Port Security
  3. Private VLANs
  4. IEEE 802.1x Port-based authentication
  5. Trunking
  6. The Cisco IOS Firewall
• Cisco IOS software supports RACLs, PACLs and VACLs on router and switch platforms
• RACLs are applied to routed interfaces or SVIs
• RACLs can be applied in either the inbound or outbound direction
• PACLs are similar to RACLs but are applied to Layer 2 ports
• PACLs do not affect Layer 2 control packets such as Cisco Discovery Protocol
• PACLs can only be applied in the inbound direction
• PACLs can take the form of either IP ACLs or MAC ACLs
• VACLs are similar to RACLs but are applied to VLANs
• VACLs have no sense of direction
• VACLs can filter on both routed and bridged traffic
• Port security mitigates CAM table overflow and MAC address flooding attacks
• Private VLANs prevent inter-host communication
• The three types of VLANs used in PVLANs are as follows:
  1. The Primary VLAN
  2. Isolated VLAN
  3. Community VLAN
• 802.1x prevents rogue devices from gaining unauthorized access to the network
• The three primary components (or roles) in the 802.1x authentication process are as follows:
  1. Supplicant or Client
  2. Authenticator
  3. Authentication Server
• Trunk links are used to carry traffic from multiple VLANs
• The main objective of VLAN hopping is to compromise a device residing on another VLAN
• The two primary methods used to perform VLAN hopping attacks are as follows:
  1. Switch spoofing
  2. Double-tagging

Cisco IOS Firewall Fundamentals

• The Cisco IOS Firewall is comprised of the following functions and technologies:
  1. Cisco IOS Stateful Packet Inspection
  2. Context-Based Access Control
  3. Intrusion Prevention System
  4. Authentication Proxy
  5. Port-to-Application Mapping
  6. Network Address Translation
  7. Zone-Based Policy Firewall
• Cisco IOS SPI maintains state information and counters of connections
• CBAC provides dynamic traffic filtering capabilities
• The main features of Context-Based Access Control (CBAC) are as follows:
  1. It protects the internal network from external intrusion or other threats
  2. It provides Denial of Service (DoS) protection
  3. It provides per-application control mechanisms
  4. It examines Layer 3 and Layer 4, as well as Application Layer information
  5. It maintains state information for every connection
  6. It generates real-time event alerts and log messages
  7. It provides enhanced audit trail features
• The Cisco IOS IPS is an inline intrusion detection and prevention sensor
• Some key features of the Cisco IOS Intrusion Prevention System are as follows:
  1. It protects the network from viruses, worms and a large variety of threats and exploits
  2. It eliminates the need for a standalone IPS device
  3. It provides integrated inline deep-packet inspection
  4. It complements the Cisco IOS Firewall and VPN solutions for superior threat protection
  5. It supports about 2000 attack signatures
  6. It uses Cisco IOS routing capabilities to deliver integrated functionality
  7. It enables distributed network-wide threat mitigation
  8. It sends a Syslog message or an alarm in SDEE format when a threat is detected
• The Authentication Proxy feature intercepts HTTP or HTTPS sessions
• PAM allows administrators to customize TCP or UDP port numbers for network services
• NAT is used to hide internal addresses from networks that are external to the firewall
• ZPF allows Stateful inspection to be applied on a zone-based model
• ZPF provides greater granularity, flexibility, and scalability than CBAC

CHAPTER 9
Troubleshooting Cisco IOS DHCP and NAT

Dynamic Host Configuration Protocol (DHCP) is used by hosts to get initial configuration information, which includes parameters such as IP address, subnet mask, and default gateway, upon boot up. Since each host needs an IP address to communicate in an IP network, DHCP eases the administrative burden of manually configuring each host with an IP address.

Network Address Translation (NAT) enables computers on private networks to access resources on the Internet or other public networks. NAT is an IETF standard that enables a LAN to use one set of IP addresses for internal traffic, typically private address space as defined in RFC 1918, and another set of addresses for external traffic, typically publicly registered IP address space. The TSHOOT certification exam objectives that are covered in this chapter are as follows:

• Troubleshoot a DHCP client and server solution
• Troubleshoot NAT

While DHCP and NAT are core CCNA requirements, the current TSHOOT certification exam requires that you demonstrate not only a basic understanding of these mechanisms but also that you understand how to troubleshoot them.
This chapter will be divided into the following sections:

• Understanding DHCP
• Troubleshooting DHCP
• Understanding NAT
• Troubleshooting NAT

UNDERSTANDING DHCP

As previously stated, the Dynamic Host Configuration Protocol (DHCP) is used to assign hosts IP addressing information dynamically, which includes IP address, subnet mask, default gateway, and additional optional parameters, such as Domain Name System (DNS) servers, Windows Internet Name Service (WINS) servers, and Network Time Protocol (NTP) server information. DHCP uses UDP port 67 for messages sent to the server and UDP port 68 for messages sent to the client. Cisco IOS routers and some switches can be configured as both DHCP clients and DHCP servers.

Client States and Message Exchanges

DHCP is a client/server protocol wherein the server provides the client dynamic addressing information. The server can be a standalone server or a Cisco IOS router or switch that can provide DHCP server functionality. While clients are typically network hosts, such as workstations, Cisco IOS routers and switches can also be configured as DHCP clients, allowing them to receive addressing information dynamically from the DHCP server.

DHCP clients transition through a series of states upon initialization. During these phases, the clients and servers exchange different messages. Clients transition through the following states:

• Initializing
• Selecting
• Requesting
• Bound
• Renewing
• Rebinding

When a client first boots up, it is in the Initializing state. In this state, the client sends out the DHCPDISCOVER message to UDP port 67 (BOOTP server), addressed to the Broadcast address FFFF.FFFF.FFFF (Layer 2)/255.255.255.255 (Layer 3). Because at this point the client has no IP address, the source IP address of this Broadcast will be 0.0.0.0. If a DHCP server exists on the local subnet and is configured and operating correctly, the DHCP server will hear the Broadcast and respond with a DHCPOFFER message using UDP port 68 (BOOTP client).
However, if no DHCP server resides on the local subnet, then a DHCP or BOOTP Relay is required to forward the DHCPDISCOVER message to a remote DHCP server. If this functionality is not enabled, the client will not be able to communicate with the server.

After the client receives the DHCPOFFER message from the DHCP server, the client then transitions into the Selecting state. In the event that multiple DHCP servers have responded with DHCPOFFER messages, the client effectively selects which DHCPOFFER to accept during this state. Most commonly, the client will accept the message from the first server to respond.

This DHCPOFFER message contains the initial configuration information for the DHCP client. This information includes parameters such as the IP address, subnet mask, and default gateway, as well as additional parameters such as lease duration, renewal time, domain name, DNS server, and WINS server information. The server will send the DHCPOFFER to the Broadcast address but will include the hardware address of the client in the offer, so the client knows that it is the intended destination.

In the event that the DHCP server is not on the local subnet, the DHCP server will send the DHCPOFFER as a Unicast packet, on UDP port 67, back to the DHCP or BOOTP Relay Agent from which the DHCPDISCOVER came. The DHCP or BOOTP Relay Agent will then either Broadcast or Unicast the DHCPOFFER on the local subnet on UDP port 68, depending on the Broadcast flag set by the client.

After receiving the DHCPOFFER, the client moves into the Requesting state. In this state, the client responds to the selected DHCP server (typically the first one it heard from) with a DHCPREQUEST message, which indicates that it is willing to accept the parameters in the DHCPOFFER message.
The client does not respond to the other DHCPOFFER messages; instead, the DHCP client simply ignores them, implicitly declining the information received from those servers. The client identifies the selected server by populating the Server Identifier option field with the DHCP server’s IP address. The DHCPREQUEST is also a Broadcast, so all DHCP servers that sent a DHCPOFFER will see the DHCPREQUEST, and each will know whether its DHCPOFFER was accepted or declined. Any additional configuration options that the client requires will be included in the options field of the DHCPREQUEST message. Even though the client has been offered an IP address, it will send the DHCPREQUEST message with a source IP address of 0.0.0.0 because it has not yet received verification that it is clear to use the address. When the DHCP server receives the DHCPREQUEST from the client, it acknowledges it by sending the client a DHCPACK. When the client receives this message from the server, it transitions to the Bound state. The DHCPACK message has a source IP address of the DHCP server, and the destination address is a Broadcast. This message contains all the parameters that the client requested in the DHCPREQUEST message. Before the DHCP client begins using the new address, the DHCP client must calculate the time parameters associated with a leased address, which are Lease Time (LT), Renewal Time (T1), and Rebind Time (T2). The DHCPACK tells the client that it is free to use the provided address to access the network. After this message has been sent, the DHCP server then stores the lease in the database and uniquely identifies it using the client identifier and the associated IP address. Both the client and the server will use this combination of identifiers to refer to the lease. The client identifier is the MAC address of the device plus the media type. 
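The four-message exchange described above can be summarized by listing the addressing of each message. The following Python sketch is an illustration only, not a DHCP implementation: the DhcpMessage class and the dora_exchange function are hypothetical names introduced here, the server address 150.1.1.2 is just an example value, and the sketch assumes the client and server share a subnet (no relay agent) and that the server replies to the Broadcast address as described in the text.

```python
from dataclasses import dataclass
from typing import List

BROADCAST_IP = "255.255.255.255"
UNSPECIFIED_IP = "0.0.0.0"   # source used before the client owns an address
SERVER_PORT = 67             # BOOTP server port
CLIENT_PORT = 68             # BOOTP client port


@dataclass(frozen=True)
class DhcpMessage:
    name: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def dora_exchange(server_ip: str) -> List[DhcpMessage]:
    """Return the initial four-message exchange in the order it occurs."""
    return [
        # Client has no address yet, so it Broadcasts from 0.0.0.0.
        DhcpMessage("DHCPDISCOVER", UNSPECIFIED_IP, BROADCAST_IP,
                    CLIENT_PORT, SERVER_PORT),
        DhcpMessage("DHCPOFFER", server_ip, BROADCAST_IP,
                    SERVER_PORT, CLIENT_PORT),
        # Still sourced from 0.0.0.0: the offer has not been acknowledged.
        DhcpMessage("DHCPREQUEST", UNSPECIFIED_IP, BROADCAST_IP,
                    CLIENT_PORT, SERVER_PORT),
        DhcpMessage("DHCPACK", server_ip, BROADCAST_IP,
                    SERVER_PORT, CLIENT_PORT),
    ]


if __name__ == "__main__":
    for msg in dora_exchange("150.1.1.2"):
        print(f"{msg.name:<13} {msg.src_ip}:{msg.src_port} -> "
              f"{msg.dst_ip}:{msg.dst_port}")
```

When comparing a packet capture or debug output against this list, the first message that is missing usually points to where the exchange is failing.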
The sequence of messages exchanged between server and client during this phase is illustrated in Figure 9-1 below:

Fig. 9-1. DHCP Client and Server Message Exchanges (DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, and DHCPACK between the DHCP client and the DHCP server)

After getting a lease from the DHCP server, the client must renew that lease when one-half of the lease time has expired. To do so, the client transitions to the Renewing state and sends a DHCPREQUEST message to the server that holds the current lease. Upon receiving this message, the server responds to the client with a DHCPACK message that contains the new lease and any configuration parameters that may have changed since the previous lease. For example, this might include an updated DNS server IP address.

If the original DHCP server has not responded to the renewal requests by the time the Rebind Time (T2) expires, the client transitions into the Rebinding state and attempts to renew the address from any DHCP server.

A client that is restarted after it has been allocated addressing information goes through a similar exchange (RFC 2131 refers to this as the INIT-REBOOT state). The client will specifically request the previously leased IP address using a DHCPREQUEST packet, which will still have the source IP address 0.0.0.0 and the destination Broadcast address 255.255.255.255. If the DHCP server determines that the client can still use the requested IP address, it will either remain silent or send a DHCPACK for the DHCPREQUEST. If the server determines that the client cannot use the requested IP address, it will send a DHCPNACK back to the client. The client will then move to the Initializing state and send a DHCPDISCOVER message. The entire process starts again.

NOTE: The DHCPNACK is described in the following section.
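The lease timers can be worked through numerically. The following Python sketch is an illustration under stated assumptions: the function names lease_timers and renewal_action are hypothetical, the Renewal Time (T1) is taken as one-half of the lease as described above, and the Rebind Time (T2) is assumed to be 0.875 of the lease, which is the default given in RFC 2131 rather than a value stated in this guide. A server can override both values, so treat the result as the expected default only.

```python
from typing import Tuple

SECONDS_PER_DAY = 86400


def lease_timers(lease_secs: int,
                 t1_fraction: float = 0.5,
                 t2_fraction: float = 0.875) -> Tuple[int, int]:
    """Return (Renewal Time T1, Rebind Time T2) in seconds."""
    return int(lease_secs * t1_fraction), int(lease_secs * t2_fraction)


def renewal_action(elapsed_secs: int, lease_secs: int) -> str:
    """Describe what the client does at a point in the lease, assuming
    no server has answered any renewal attempt so far."""
    renewal, rebind = lease_timers(lease_secs)
    if elapsed_secs < renewal:
        return "use the lease"
    if elapsed_secs < rebind:
        return "send DHCPREQUEST to the server that holds the lease"
    if elapsed_secs < lease_secs:
        return "send DHCPREQUEST to any DHCP server"
    return "lease expired: restart with DHCPDISCOVER"


if __name__ == "__main__":
    lease = 8 * SECONDS_PER_DAY          # an 8-day lease
    print(lease, lease_timers(lease))    # 691200 (345600, 604800)
    print(renewal_action(5 * SECONDS_PER_DAY, lease))
```

If the timers reported by a client differ from these defaults, the server has most likely supplied explicit renewal and rebinding values.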
Additional DHCP Exchanges

In addition to the messages described in the previous section, there are some other DHCP messages that can be sent by the client or the server. These messages include the following:

• DHCPNAK or DHCPNACK
• DHCPDECLINE
• DHCPINFORM
• DHCPRELEASE

The DHCPNAK or DHCPNACK message is sent by the DHCP server if it is unable to satisfy the client DHCPREQUEST message. When the client receives a DHCPNAK message, or does not receive a response to a DHCPREQUEST message, the client restarts the configuration process by going into the Initializing state. The client will retransmit the DHCPREQUEST at least four times within 60 seconds before returning to the Initializing state.

The DHCPDECLINE message is sent when the client discovers that the IP address provided by the server in the DHCPACK message is already in use. The client verifies the availability of the address by sending out ARP requests for the IP address specified in the DHCPACK. If the address is already in use, the client sends the server a DHCPDECLINE message and restarts the configuration process by transitioning into the Initializing state. This message is rare, as the DHCP server will typically send out ping packets to ensure that the address provided is available.

The DHCPINFORM message is sent by a client to request additional configuration parameters. This may be the case when the client has a manually configured IP address but requires additional information from the DHCP server, such as DNS server information. When a DHCP server receives a DHCPINFORM message, it responds to the client with a DHCPACK message that contains the requested configuration parameters without allocating the client a new IP address. This message is Unicast to the requesting client.

Finally, the DHCPRELEASE message is sent by the client when it wishes to release or give up its IP address.
This action is typically performed manually by an administrator. As an example, a Windows-based client will send this message after the ipconfig /release command is executed on the command prompt. The client identifies the lease to be released by the use of the Client Identifier field and network address in the DHCPRELEASE message.

Understanding the DHCP/BOOTP Relay Agent

As was previously stated, Cisco IOS software routers and switches may be configured as DHCP servers and clients. In addition, Cisco IOS software also supports DHCP or BOOTP relay functionality. As we already know, DHCP uses Broadcast messages. This works well when client and server reside within the same Broadcast domain; however, it does present a challenge when the DHCP server is located in a remote subnet. This is because, by default, routers will not forward Broadcast packets. This essentially means that if a router resides between the client and the server, the DHCP messages will never be exchanged between the two.

In order to allow clients to communicate with servers on remote subnets, the DHCP or BOOTP relay agent function must be enabled on the router. When enabled, the relay agent will then forward requests on behalf of the client to the server, using its own IP address as the source of those requests. This allows the server to allocate an IP address on the same subnet as the messages received from the relay agent. The DHCP server Unicasts responses to the relay agent.

Configuring a Cisco IOS Router or Switch as a DHCP Client

Cisco IOS DHCP client configuration functionality requires the implementation of only a single command, which is the ip address dhcp interface configuration command. This command is required on the interface that will be receiving configuration information from the DHCP server.
The following example shows how you would configure the router as a DHCP client:

Router(config)#interface FastEthernet0/0
Router(config-if)#description "Connected To ISP XYZ Cable Modem"
Router(config-if)#ip address dhcp
Router(config-if)#end

Assuming that the router is able to communicate with the DHCP server, when the DHCP server provides the router with the addressing parameters, you will see a message that is similar to the following printed on the console:

*Oct 20 02:02:10.592 CST: %DHCP-6-ADDRESS_ASSIGN: Interface FastEthernet0/0 assigned DHCP address 150.1.1.1, mask 255.255.255.0, hostname R1

You can validate whether the device has been configured as a DHCP client using either the show dhcp server or show dhcp lease commands. The show dhcp server command provides information on DHCP message statistics, such as the number of offers or acknowledgements received. It also provides basic addressing parameters, such as DNS server addresses, the domain name, and the subnet mask assigned by the server. Below is a sample output of the information printed by this command:

R1#show dhcp server
   DHCP server: ANY (255.255.255.255)
    Leases:   1
    Offers:   1      Requests: 1     Acks : 1     Naks: 0
    Declines: 0      Releases: 0     Query: 0     Bad: 0
    DNS0:   172.16.1.253,   DNS1:  172.16.1.254
    NBNS0:  172.16.1.254,   NBNS1: 0.0.0.0
    Subnet: 255.255.255.0   DNS Domain: howtonetwork.net

The show dhcp lease command provides additional configuration details, which include the assigned IP address, subnet mask, default gateway, and lease duration, among other things.
Following is a sample output of the information that is printed by this command:

R1#show dhcp lease
Temp IP addr: 150.1.1.1  for peer on Interface: FastEthernet0/0
Temp  sub net mask: 255.255.255.0
   DHCP Lease server: 150.1.1.2, state: 3 Bound
   DHCP transaction id: 191F
   Lease: 691200 secs,  Renewal: 345600 secs,  Rebind: 604800 secs
Temp default-gateway addr: 150.1.1.254
   Next timer fires after: 3d23h
   Retry count: 0   Client-ID: cisco-000c.cea7.f3a0-Fa0/0
   Client-ID hex dump: 636973636F2D303030632E636561372E
                       663361302D4661302F30
   Hostname: R1

NOTE: You can also use the show ip interface <name> command to determine whether the interface has derived its IP address from a DHCP server as illustrated in the output below:

R1#show ip interface FastEthernet0/0
FastEthernet0/0 is up, line protocol is up
  Internet address is 150.1.1.1/24
  Broadcast address is 255.255.255.255
  Address determined by DHCP
  MTU is 1500 bytes
  Helper address is not set
  Directed broadcast forwarding is disabled
  Outgoing access list is not set
  Inbound access list is not set
  Proxy ARP is enabled
  Local Proxy ARP is disabled
...
[Truncated Output]

Configuring a Cisco IOS Router or Switch as a DHCP Server

While quite straightforward, the configuration of the Cisco IOS DHCP server function requires more steps than when configuring a router or a switch as a DHCP client. The following sequence of steps is required when configuring a router or switch as a Cisco IOS DHCP server:

• Exclude the IP addresses that you do not want the Cisco IOS DHCP server to assign to clients using the ip dhcp excluded-address <starting address> <ending address> global configuration command. By default, Cisco IOS DHCP server functionality assumes that all IP addresses specified in the pool are available for assigning and will begin assigning addresses from the bottom IP address to the top IP address (e.g., from .1 to .254). This order cannot be changed.
NOTE: By default, the Cisco IOS DHCP server will ping a pool IP address twice before it will assign it to a client. If the ping is unanswered, the DHCP server will assign the address to a client because it assumes that it is available. However, while this does minimize the probability of duplicate addresses being assigned to a client, keep in mind that some devices (e.g., servers) residing on the subnet may have a firewall running that blocks ping packets. Therefore, it is quite possible that a client could be assigned an address already manually assigned to another such device because the Cisco IOS DHCP server did not receive a response from the device. It is therefore recommended that all statically assigned addresses are excluded from the pool.

• Configure the DHCP pool using the ip dhcp pool <name> global configuration command. Each individual DHCP pool must have a unique name. The device then transitions to DHCP pool configuration mode.
• In DHCP pool configuration mode, next configure the network number and mask of the DHCP address pool using the network <network> <mask> or network <network> /<prefix-length> DHCP pool configuration command. Both options are acceptable and both perform the same function.
• In DHCP pool configuration mode, specify the IP address of the default gateway using the default-router <address 1…address 8> DHCP pool configuration command. You can specify up to eight different addresses in a single configuration line.

Following this core configuration, you can configure additional parameters, such as DNS servers, WINS servers, domain name, and lease duration. In DHCP pool configuration mode, you can specify DNS servers for the pool using the dns-server <address 1…address 8> DHCP pool configuration command. You can specify up to eight different addresses in a single configuration line.
The WINS server information can be specified by issuing the netbios-name-server <address 1…address 8> DHCP pool configuration command. Again, you can specify up to eight different addresses in a single configuration line. The domain name for the client can be specified using the domain-name <name> DHCP pool configuration command. Finally, you can change the default one day lease duration used by Cisco IOS DHCP servers via the lease <days [hours] [minutes]|infinite> DHCP pool configuration command. If the infinite keyword is specified, the lease for that pool will never expire.

The following configuration example illustrates how to configure two DHCP pools on the same router. The configuration parameters for the first pool are illustrated below:

Excluded Address(es)    10.1.1.1 – 10.1.1.9
Pool Name               POOL-A
Subnet/Mask             10.1.1.0/24
Gateway(s)              10.1.1.1
DNS Server(s)           172.16.1.252, 172.16.1.253, 172.16.1.254
WINS Server(s)          172.16.1.253, 172.16.1.254
Lease                   8 days

The configuration parameters for the second pool are illustrated below:

Excluded Address(es)    10.2.2.1
Pool Name               POOL-B
Subnet/Mask             10.2.2.0/29
Gateway(s)              10.2.2.1
DNS Server(s)           172.16.1.254
WINS Server(s)          N/A
Lease                   8 hours

The configuration for these two pools is implemented on the router as follows:

R1(config)#ip dhcp excluded-address 10.1.1.1 10.1.1.9
R1(config)#ip dhcp excluded-address 10.2.2.1
R1(config)#ip dhcp pool POOL-A
R1(dhcp-config)#network 10.1.1.0 /24
R1(dhcp-config)#default-router 10.1.1.1
R1(dhcp-config)#dns-server 172.16.1.252 172.16.1.253 172.16.1.254
R1(dhcp-config)#netbios-name-server 172.16.1.253 172.16.1.254
R1(dhcp-config)#lease 8 0 0
R1(dhcp-config)#exit
R1(config)#ip dhcp pool POOL-B
R1(dhcp-config)#network 10.2.2.0 255.255.255.248
R1(dhcp-config)#default-router 10.2.2.1
R1(dhcp-config)#dns-server 172.16.1.254
R1(dhcp-config)#lease 0 8 0
R1(dhcp-config)#exit

Following this configuration, you can also use the show ip dhcp pool command to view the configured DHCP pool parameters, as well as pool address allocation, as follows:

R1#show ip dhcp pool
Pool POOL-A :
 Utilization mark (high/low)    : 100 / 0
 Subnet size (first/next)       : 0 / 0
 Total addresses                : 254
 Leased addresses               : 2
 Pending event                  : none
 1 subnet is currently in the pool :
 Current index        IP address range                    Leased addresses
 10.1.1.1             10.1.1.1         - 10.1.1.254        2
Pool POOL-B :
 Utilization mark (high/low)    : 100 / 0
 Subnet size (first/next)       : 0 / 0
 Total addresses                : 6
 Leased addresses               : 0
 Pending event                  : none
 1 subnet is currently in the pool :
 Current index        IP address range                    Leased addresses
 10.2.2.1             10.2.2.1         - 10.2.2.6          0

Additionally, you can use the show ip dhcp binding command to view the DHCP binding database on the local device. This command prints information that includes the DHCP client IP and hardware (MAC) addresses, as well as the lease expiration time and date, as illustrated in the following output:

R1#show ip dhcp binding
Bindings from all pools not associated with VRF:
IP address          Client-ID/              Lease expiration        Type
                    Hardware address/
                    User name
10.1.1.10           0063.6973.636f.2d30.    Oct 28 2010 02:03 AM    Automatic
                    3030.632e.6365.6137.
                    2e66.3361.302d.4661.
                    302f.30
10.1.1.11           0100.24e8.f57e.a2       Oct 28 2010 02:12 AM    Automatic

Importing DHCP Options

In some cases, a router or switch may be configured as both a client and a server. This is common when the device has a Broadband connection (e.g., DSL or cable) and is also providing addressing information to hosts connected to the LAN. The Cisco IOS DHCP server import and autoconfiguration feature is enabled by issuing the import all DHCP pool configuration command. When this command is issued, the pool under which it is configured will import the DHCP option parameters into the DHCP server database. These options include parameters such as the DNS and WINS server information, as well as the domain name.
The following configuration example illustrates how to configure a router that is acting as both a DHCP client and server to import DHCP option parameters into the DHCP pool:

R1(config)#ip dhcp excluded-address 10.3.3.1 10.3.3.5
R1(config)#ip dhcp pool POOL-C
R1(dhcp-config)#network 10.3.3.0 255.255.255.0
R1(dhcp-config)#default-router 10.3.3.1
R1(dhcp-config)#import all
R1(dhcp-config)#exit
R1(config)#interface FastEthernet0/0
R1(config-if)#description 'Connected To Internal LAN'
R1(config-if)#ip address 10.3.3.1 255.255.255.0
R1(config-if)#exit
R1(config)#interface FastEthernet0/1
R1(config-if)#description 'Connected To The ISP'
R1(config-if)#ip address dhcp
R1(config-if)#exit

This configuration can be validated using the show ip dhcp import command as follows:

R1#show ip dhcp import
Address Pool Name: POOL-C
Domain Name Server(s): 172.16.1.253 172.16.1.254
NetBIOS Name Server(s): 172.16.1.252
Domain Name Option: howtonetwork.net

These imported parameters are then passed on to clients, along with the addressing information assigned from the configured local pool named POOL-C.

Configuring a Cisco IOS Router or Switch as a DHCP Relay Agent

Cisco IOS software routers and switches can be configured as DHCP relay agents, allowing the hosts connected to the local LANs they serve to acquire addressing information from remote DHCP servers. This functionality is enabled via the ip helper-address <address> interface configuration command under the inside or internal interface, which resides on the same subnet as the hosts on the local network. In other words, this command should be configured under the interface that will be receiving the Broadcasts from the host.

You can issue this command multiple times to specify multiple server addresses for high availability. In the event that multiple servers are specified, Cisco routers forward the DHCPDISCOVER message to all the helper addresses configured under the interface.
These messages are Unicast to the servers.

By default, the ip helper-address command will forward the following UDP Broadcasts:
• Trivial File Transfer Protocol (TFTP) (port 69)
• DNS (port 53)
• Time service (port 37)
• NetBIOS name server (port 137)
• NetBIOS datagram server (port 138)
• Bootstrap Protocol (DHCP/BOOTP) client and server datagrams (ports 67 and 68)
• Terminal Access Controller Access Control System (TACACS) service (port 49)
• IEN-116 name service (port 42)

However, the device can be configured to forward any UDP Broadcast based on UDP port number. While this is supported, it is not recommended, as forwarding Broadcasts from one subnet to the Broadcast address of another subnet increases Broadcast flooding, which can have an adverse impact on network and device performance.

While the ip helper-address command will forward the default list of UDP Broadcasts listed above, you can also use the ip forward-protocol global configuration command to modify the UDP Broadcasts that the router or switch will forward. This command can be used to remove certain Broadcasts or to include others that are not forwarded by default. For example, assume you wanted to forward BOOTP/DHCP Broadcasts only, and no others, to the specified servers. In this case, configure the device as follows:

R1(config)#no ip forward-protocol udp 69
R1(config)#no ip forward-protocol udp 53
R1(config)#no ip forward-protocol udp 37
R1(config)#no ip forward-protocol udp 137
R1(config)#no ip forward-protocol udp 138
R1(config)#no ip forward-protocol udp 49
R1(config)#no ip forward-protocol udp 42
R1(config)#interface FastEthernet0/0
R1(config-if)#ip helper-address 172.17.1.254

The configuration above prevents the router from forwarding all other default Broadcasts, except for the Bootstrap Protocol (DHCP/BOOTP) client and server datagrams, which use UDP ports 67 and 68.
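The effect of these commands can be pictured as simple set arithmetic. The following Python sketch is a toy model, not IOS code: the names DEFAULT_FORWARDED_UDP and forwarded_ports are hypothetical, and the port list simply mirrors the defaults given in this section.

```python
# Toy model (an assumption for illustration, not IOS code) of which UDP
# Broadcast ports a helper address relays. The default list mirrors the
# ports named in this section.
DEFAULT_FORWARDED_UDP = {
    69: "TFTP",
    53: "DNS",
    37: "Time service",
    137: "NetBIOS name server",
    138: "NetBIOS datagram server",
    67: "BOOTP/DHCP server",
    68: "BOOTP/DHCP client",
    49: "TACACS",
    42: "IEN-116 name service",
}


def forwarded_ports(removed=(), added=()):
    """Return the sorted UDP ports still forwarded.

    removed models 'no ip forward-protocol udp <port>' entries;
    added models 'ip forward-protocol udp <port>' entries.
    """
    ports = set(DEFAULT_FORWARDED_UDP) - set(removed)
    ports |= set(added)
    return sorted(ports)


# Removing every default except BOOTP/DHCP leaves only ports 67 and 68.
dhcp_only = forwarded_ports(removed=[69, 53, 37, 137, 138, 49, 42])
print(dhcp_only)  # [67, 68]
```

Any default port left out of the removal list, port 53 for instance, stays in the result, which is a quick way to check that a "DHCP only" configuration is complete.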
Additionally, the ip forward-protocol command can be used to forward Broadcasts on ports beyond the defaults. For example, to configure the router to forward UDP port 1812 in addition to the default ports, you would issue the following configuration:

R1(config)#ip forward-protocol udp 1812
R1(config)#interface FastEthernet0/0
R1(config-if)#ip helper-address 172.17.1.254

You can verify the helper addresses specified under the interface by either checking the device configuration or using the show ip interface <name> command as follows:

R1#show ip interface FastEthernet0/0
FastEthernet0/0 is up, line protocol is up
  Internet address is 150.1.1.3/24
  Broadcast address is 255.255.255.255
  Address determined by DHCP
  MTU is 1500 bytes
  Helper addresses are 172.16.1.254
                       172.17.1.254
                       172.18.1.254
  Directed broadcast forwarding is disabled
  Outgoing access list is not set
... [Truncated Output]

Additionally, you can use the show ip helper-address <interface> command to view all configured helper addresses under a specific interface, or under all interfaces on the device if you do not include the interface argument, as illustrated in the following output:

R1#show ip helper-address
Interface          Helper-Address   VPN  VRG Name  VRG State
FastEthernet0/0    172.16.1.254     0    None      Unknown
                   172.17.1.254     0    None      Unknown
                   172.18.1.254     0    None      Unknown
FastEthernet0/1    172.20.1.254     0    None      Unknown
                   172.21.1.254     0    None      Unknown
                   172.22.1.254     0    None      Unknown

TROUBLESHOOTING DHCP

DHCP problems can have numerous causes. As with most other issues, the most common cause is device misconfiguration.
However, additional problems can also be caused by any of the following:
• NIC compatibility issue or DHCP feature issue
• Faulty NIC or improper NIC driver installation
• Operating System behavior or defect
• Spanning Tree issues
• CDP is disabled on ports connected to IP phones
• DHCP server/DHCP relay information option incompatibility
• Cisco IOS DHCP/BOOTP software bugs

Device misconfigurations are the most common reason for DHCP problems. Common errors that may result in DHCP problems include the following:
• The Cisco IOS DHCP service is disabled
• Using secondary addressing
• Wrong DHCP configuration parameters

The Cisco IOS DHCP service is enabled by default and is controlled by the service dhcp global configuration command. However, it is possible that the service was disabled along with other unused services if the device was not previously used as a DHCP server. If you have verified the device configuration and everything appears to be correct, but the Cisco IOS DHCP server is not leasing addresses, check the configuration to ensure that the DHCP service is enabled.

Secondary addressing is another common reason for DHCP problems. By default, DHCP has a limitation in that the Reply packets are sent only if the Request packet is received from the interface configured with the primary IP address. If secondary subnets are assigned to a router interface, and the router is either acting as a Cisco IOS DHCP server or is configured as a Cisco IOS DHCP relay agent that forwards DHCP Broadcasts to a remote server, only addresses from the pool that matches the primary network will be assigned to requesting clients. If you need to assign addresses to hosts from different subnets, you must configure additional interfaces or use subinterfaces instead.

Incorrect DHCP configuration parameters may include incorrect network statements when configuring the DHCP pool.
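A mismatched network statement amounts to a failed subnet lookup. The short Python sketch below is a simplified model rather than IOS code: it assumes the server picks a pool by checking whether the address the request arrived on (or was relayed from) falls inside a pool's network. The function name find_pool, the pool name, and the addresses are illustrative values only.

```python
import ipaddress


def find_pool(pools, source_address):
    """Return the name of the first pool whose network contains
    source_address, or None when no pool matches."""
    addr = ipaddress.ip_address(source_address)
    for name, network in pools.items():
        if addr in ipaddress.ip_network(network):
            return name
    return None


# Hypothetical pool whose network statement does not cover the interface subnet.
pools = {"CLIENT-POOL": "192.168.2.0/24"}
print(find_pool(pools, "192.168.1.1"))  # None: no pool for this address

# After correcting the network statement, the lookup succeeds.
pools["CLIENT-POOL"] = "192.168.1.0/24"
print(find_pool(pools, "192.168.1.1"))  # CLIENT-POOL
```

The same lookup applies whether the address belongs to a local interface or to a relay agent; in both cases no match means no lease is offered.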
In the event that the network statement does not match the subnet of any local device interface, no addresses will be assigned to directly connected clients, even by the local Cisco IOS DHCP server. Consider the following basic configuration for example:

R1(config)#interface FastEthernet0/0
R1(config-if)#ip add 192.168.1.1 255.255.255.0
R1(config-if)#exit
R1(config)#ip dhcp pool CLIENT-POOL
R1(dhcp-config)#network 192.168.2.0 255.255.255.0
R1(dhcp-config)#default-router 192.168.1.1
R1(dhcp-config)#lease 8 0 0
R1(dhcp-config)#dns-server 10.0.0.254
R1(dhcp-config)#netbios-name-server 10.0.0.254
R1(dhcp-config)#exit

In the configuration example above, the FastEthernet0/0 subnet (192.168.1.0/24) and the DHCP network statement (192.168.2.0/24) are mismatched. Because of this misconfiguration, the Cisco IOS DHCP server will not be able to assign addresses to local clients sending DHCP packets on FastEthernet0/0. This can be validated by debugging the Cisco IOS DHCP server functionality using the debug ip dhcp server events and debug ip dhcp server packet commands as follows:

R1#debug ip dhcp server events
DHCP server event debugging is on.
R1#debug ip dhcp server packet
DHCP server packet debugging is on.
R1#
R1#
R1#
*Oct 20 16:13:13.828: DHCPD: Sending notification of DISCOVER:
*Oct 20 16:13:13.828: DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:13.828: DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:13.828: DHCPD: circuit id 00000000
*Oct 20 16:13:13.828: DHCPD: DHCPDISCOVER received from client 0100.24e8.f57e.a2 on interface FastEthernet0/0.
*Oct 20 16:13:13.828: DHCPD: Seeing if there is an internally specified pool class:
*Oct 20 16:13:13.828: DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:13.828: DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:13.832: DHCPD: circuit id 00000000
*Oct 20 16:13:13.832: DHCPD: there is no address pool for 192.168.1.1.
*Oct 20 16:13:18.828: DHCPD: Sending notification of DISCOVER:
*Oct 20 16:13:18.828: DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:18.828: DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:18.828: DHCPD: circuit id 00000000
*Oct 20 16:13:18.828: DHCPD: DHCPDISCOVER received from client 0100.24e8.f57e.a2 on interface FastEthernet0/0.
*Oct 20 16:13:18.828: DHCPD: Seeing if there is an internally specified pool class:
*Oct 20 16:13:18.828: DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:18.828: DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:18.832: DHCPD: circuit id 00000000
*Oct 20 16:13:18.832: DHCPD: there is no address pool for 192.168.1.1.
*Oct 20 16:13:26.829: DHCPD: Sending notification of DISCOVER:
*Oct 20 16:13:26.829: DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:26.829: DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:26.829: DHCPD: circuit id 00000000
*Oct 20 16:13:26.829: DHCPD: DHCPDISCOVER received from client 0100.24e8.f57e.a2 on interface FastEthernet0/0.
*Oct 20 16:13:26.829: DHCPD: Seeing if there is an internally specified pool class:
*Oct 20 16:13:26.829: DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:26.829: DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:26.829: DHCPD: circuit id 00000000
R1#
*Oct 20 16:13:26.829: DHCPD: there is no address pool for 192.168.1.1.

When forwarding DHCP messages to a remote server (i.e., the ip helper-address command has been issued), the gateway forwards all DHCP messages to the configured helper using the primary address on the interface. If a pool has not been configured that includes the source address of the DHCP relay, no addresses will be assigned by the remote DHCP server.
For example, if the remote DHCP server were a Cisco IOS DHCP server, and you enabled DHCP server functionality debugging using the debug ip dhcp server events and debug ip dhcp server packet commands, then you would see the following messages printed on the console:

R2#debug ip dhcp server events
DHCP server event debugging is on.
R2#debug ip dhcp server packet
DHCP server packet debugging is on.
R2#
R2#
*Mar 3 11:14:18.830: DHCPD: DHCPDISCOVER received from client 0100.24e8.f57e.a2 through relay 192.168.1.1.
*Mar 3 11:14:18.830: DHCPD: Seeing if there is an internally specified pool class:
*Mar 3 11:14:18.830: DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Mar 3 11:14:18.830: DHCPD: circuit id 00000000
*Mar 3 11:14:18.830: DHCPD: there is no address pool for 192.168.1.1.
*Mar 3 11:14:23.366: DHCPD: Sending notification of DISCOVER:
*Mar 3 11:14:23.366: DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Mar 3 11:14:23.366: DHCPD: circuit id 00000000
*Mar 3 11:14:23.370: DHCPD: DHCPDISCOVER received from client 0100.24e8.f57e.a2 through relay 192.168.1.1.
*Mar 3 11:14:23.370: DHCPD: Seeing if there is an internally specified pool class:
*Mar 3 11:14:23.370: DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Mar 3 11:14:23.370: DHCPD: circuit id 00000000
*Mar 3 11:14:23.370: DHCPD: there is no address pool for 192.168.1.1.

The troubleshooting of NIC issues is platform and hardware dependent. For this reason, we will not delve into any specific details on NIC troubleshooting. However, you can isolate NIC issues by, for example, resetting the NIC, re-installing the driver, or verifying whether the NIC is configured to acquire an IP address automatically. Figure 9-2 below illustrates how you would validate this configuration when using a Windows-based machine:

Fig. 9-2.
Verifying NIC Settings when Using DHCP

You could also perform additional troubleshooting steps, such as assigning a static IP address and verifying whether the local host has network connectivity. As is the case with NIC issues, Operating System behaviors or defects are platform dependent and require different troubleshooting commands, depending on the platform.

By default, all ports transition through several Spanning Tree Protocol states before they begin forwarding user data. The default amount of time before the port transitions into the forwarding state (about 30 seconds with the default 802.1D timers) may cause some hosts to fail to receive dynamic addressing via DHCP due to NIC timeouts. The recommended solution in such cases is to enable the PortFast feature on access ports connected to hosts that will be getting addressing information via DHCP.

When an IP phone is connected to the port, and assuming a correct Multi-VLAN Access Port (MVAP) configuration, disabling Cisco Discovery Protocol (CDP) on the switch port has the following effect: the host connected to the phone LAN port will still be assigned an IP address by the DHCP server, but the IP phone will be unable to acquire an address from the voice VLAN. If CDP is enabled, the switch is able to detect the Cisco IP phone and provide it with the correct voice VLAN information. The DHCP server is then able to allot an IP address from the voice VLAN/subnet pool. Keep in mind that there are no explicit configuration steps required to bind the DHCP service to the voice VLAN.

When configuring Cisco IOS DHCP relay agent functionality on Catalyst switches, the switch will include the DHCP relay agent information option (option 82) in packets it sends to the DHCP server. This option allows the switch to include information about itself when forwarding client-originated DHCP packets to a DHCP server.
While a Cisco IOS DHCP server will accept such packets, the option may be incompatible with other DHCP servers. In such cases, consider disabling this behavior on the switch using the no ip dhcp snooping information option global configuration command.

Finally, while rare, it is possible to run into Cisco IOS DHCP software bugs. For example, in older Cisco IOS software images (e.g., 12.1), enabling the Unicast Reverse Path Forwarding (uRPF) feature, which was described in the previous chapter, causes the router to drop packets with a source address of 0.0.0.0 and a destination address of 255.255.255.255. This software defect is resolved in current Cisco IOS images. If you have performed the necessary checks (i.e., verified the router or switch configuration, and host and server settings [if an external DHCP server is being used]) and you suspect a software bug, then you should contact the Cisco TAC for additional troubleshooting assistance and verification.

In addition to the debug commands and the other verifications described in this section, you can also use the show ip dhcp server statistics command to troubleshoot possible Cisco IOS DHCP server problems when Cisco IOS DHCP server functionality is enabled on the device. This command provides statistics on the DHCP messages described in the previous section. Following is a sample output of the message statistics printed by this command:

R1#show ip dhcp server statistics
Memory usage         40343
Address pools        1
Database agents      0
Automatic bindings   1
Manual bindings      0
Expired bindings     0
Malformed messages   0
Secure arp entries   0

Message              Received
BOOTREQUEST          0
DHCPDISCOVER         19
DHCPREQUEST          2
DHCPDECLINE          0
DHCPRELEASE          0
DHCPINFORM           2

Message              Sent
BOOTREPLY            0
DHCPOFFER            1
DHCPACK              4
DHCPNAK              0

UNDERSTANDING NAT

Network Address Translation (NAT) enables hosts on private networks to access resources on the Internet or other public networks.
NAT is an IETF standard that enables a LAN to use one set of IP addresses for internal traffic, typically private address space as defined in RFC 1918, and another set of addresses for external traffic, typically publicly registered IP address space. NAT converts the packet headers for incoming and outgoing traffic and keeps track of each session. NAT offers the dual functions of security and address conservation, and is typically implemented in remote-access environments.

The key to understanding and, ultimately, troubleshooting NAT problems is having a solid understanding of NAT terminology. You should be familiar with the following NAT terms:
• The NAT inside interface
• Inside local address
• Inside global address
• The NAT outside interface
• Outside local address
• Outside global address

NOTE: NAT is a core CCNA requirement. You can find additional detailed information on NAT in the current CCNA study guide, which is available online at www.howtonetwork.net.

In NAT terminology, the inside interface is the border interface of the administrative domain controlled by the organization. This does not necessarily have to be the default gateway used by hosts that reside within the internal network.

The inside local address is the IP address of a host residing on the inside network. In most cases, the inside local address is an RFC 1918 address. This address is translated to the inside global address, which is typically an IP address from a publicly assigned or registered pool. It is important to remember, however, that the inside local address could also be a public address.

The inside global address is the IP address of an internal host as it appears to the outside world. Once the inside IP address has been translated, it will appear as an inside global address to the public Internet or any other external network or host.
The outside interface is the boundary for the administrative domain that is not controlled by the organization. In other words, the outside interface is connected to the external network, which may be the Internet or any other external network, such as a partner network. Any hosts residing beyond the outside interface fall outside the local organization’s administration.

The outside local address is the IP address of an outside, or external, host as it appears to inside hosts. Finally, the outside global address is the address assigned to the outside host by its owner; it is a legal address that can be used on the Internet. Both outside local addresses and outside global addresses are typically allocated from a globally routable address or network space.

To clarify these concepts further, Figure 9-3 below shows the use of the addresses in a session between two hosts. NAT is enabled on the intermediate gateway:

[Figure 9-3: Traffic flows from Host 1 (INSIDE) to Host 2 (OUTSIDE). On the inside of the gateway, SRC Address = Inside Local and DST Address = Outside Local. On the outside of the gateway, SRC Address = Inside Global and DST Address = Outside Global.]

Fig. 9-3. Understanding NAT Inside and Outside Addresses

Configuring and Verifying NAT in Cisco IOS Software

As you already know, the configuration and verification of Network Address Translation in Cisco IOS software is a straightforward task. When configuring NAT, perform the following:
• Designate one or more interfaces as the internal (inside) interface(s) using the ip nat inside interface configuration command.
• Designate an interface as the external (outside) interface using the ip nat outside interface configuration command.
• Configure an Access Control List that will match all traffic that is to be NATed. This can be a standard or an extended named or numbered ACL.
• Optionally, configure a pool of global addresses using the ip nat pool <name> <start-ip> <end-ip> [netmask <mask> | prefix-length <length>] global configuration command.
This defines a pool of inside global addresses to which inside local addresses will be translated.
• Configure NAT globally using the ip nat inside source list <ACL> [interface|pool] <name> [overload] global configuration command.

The following example shows how to configure basic NAT in Cisco IOS software:

R1(config)#interface FastEthernet0/0
R1(config-if)#description ‘Connected To The Internal LAN’
R1(config-if)#ip address 10.5.5.1 255.255.255.248
R1(config-if)#ip nat inside
R1(config-if)#exit
R1(config)#interface Serial0/0
R1(config-if)#description ‘Connected To The ISP’
R1(config-if)#ip address 150.1.1.1 255.255.255.248
R1(config-if)#ip nat outside
R1(config-if)#exit
R1(config)#access-list 100 remark ‘Translate Internal Addresses Only’
R1(config)#access-list 100 permit ip 10.5.5.0 0.0.0.7 any
R1(config)#ip nat pool INSIDE-POOL 150.1.1.3 150.1.1.6 prefix-length 24
R1(config)#ip nat inside source list 100 pool INSIDE-POOL
R1(config)#exit

Following this configuration, the show ip nat translations command can be used to verify that translations are actually taking place on the router, as illustrated below:

R1#show ip nat translations
Pro  Inside global      Inside local      Outside local     Outside global
icmp 150.1.1.4:4        10.5.5.1:4        200.1.1.1:4       200.1.1.1:4
icmp 150.1.1.3:1        10.5.5.2:1        200.1.1.1:1       200.1.1.1:1
tcp  150.1.1.5:15594    10.5.5.3:15594    200.1.1.1:23      200.1.1.1:23

TROUBLESHOOTING NAT

Before we delve into NAT troubleshooting, it is important to understand that while NAT provides the advantage of allowing private networks to communicate, it also has numerous limitations, which include the following:
• Breaking the end-to-end IP model
• The need to maintain connection state
• The inhibition of end-to-end security
• Applications that are not NAT-friendly
• Address space collision
• Ratio of internal and reachable IP addresses

IP (in general) was designed so that only network endpoints (i.e., hosts and servers)
handle the connection. However, when NAT is implemented, one or more intermediate NAT-enabled devices must terminate and re-originate each session that is translated. This breaks the end-to-end model of IP and may cause issues with some applications and protocols.

When NAT is used, NAT-enabled devices must maintain the connection state for each of the translations being performed. Depending on the number of translations, this can consume a significant amount of device resources, such as memory, and result in poor performance. In addition, because all packets must be processed, NAT can hinder network and device performance by introducing latency due to device processing delays. This results in longer round-trip times between source and destination hosts, which could cause severe performance problems for real-time applications such as voice and video.

NAT can also cause security headaches when identifying the sources of network breaches if traffic is coming from a NATed location. By hiding the true identity of the owner, NAT can make it very difficult to identify the true origin or address of an attacker. Another inhibition to end-to-end security is the incompatibility of NAT with cryptography and encryption. Some security protocols, such as IP Security (IPSec), do not work natively in conjunction with NAT because NAT changes the source address of packets before they are forwarded to their destination. This change causes the integrity check to fail because the packet appears to have been tampered with along the way.

Not all applications work in the same manner. This means that some applications, such as those that use proprietary protocols, are not compatible with NAT. The two most common issues experienced with NAT involve applications that use embedded IP addresses or port numbers.
When NAT changes these port numbers or IP addresses, as is the case with IPSec, for example, the applications do not function as expected or, in some instances, cease functioning completely. This means that while NAT can be used to provide privately addressed devices access to public networks, it cannot be used in all instances.

Address space collision typically occurs when two or more organizations merge. Because RFC 1918 address space is commonly used internally in different organizations, with NAT enabling these privately addressed networks to communicate with the outside world, it is not uncommon for two organizations that merge to find out that they have the same address space. If re-IP addressing the network is not an option, and it usually is not because of the complexity involved, then ‘double-NAT’ is typically used to allow organizations with overlapping address space to communicate with each other. However, the downside to this is the additional complexity that is introduced into the environment.

Finally, NAT works well when there are a few internal hosts that must be accessed from external networks, such as the Internet. However, NAT can become an issue when multiple hosts need to be accessed from the Internet. Given the greatly depleted IPv4 address space, acquiring a large number of public addresses is not always possible, which highlights another limitation of using NAT with IPv4.

As is the case with DHCP troubleshooting, the most common sources of problems when using NAT are device misconfigurations. Common misconfigurations include the following:
• Misconfigured NAT Access Control Lists
• Asymmetric routing
• Misconfigured NAT address pool
• Router interfaces missing NAT commands

Misconfigured NAT ACLs are a common cause of NAT problems.
You can use the show ip nat statistics command to determine the ACL that is used for translations, followed by the show ip access-lists command to verify that the ACL is configured correctly. Following is a sample output of the information printed by the show ip nat statistics command (the ACL used for NAT appears in the access-list 100 entry under Dynamic mappings):

R1#show ip nat statistics
Total active translations: 2 (0 static, 2 dynamic; 0 extended)
Outside interfaces:
  Serial0/0
Inside interfaces:
  FastEthernet0/0
Hits: 182  Misses: 2
CEF Translated packets: 83, CEF Punted packets: 4
Expired translations: 42
Dynamic mappings:
-- Inside Source
[Id: 1] access-list 100 pool INSIDE-POOL refcount 2
 pool INSIDE-POOL: netmask 255.255.255.0
    start 150.1.1.3 end 150.1.1.6
    type generic, total addresses 4, allocated 2 (50%), misses 0
Appl doors: 0
Normal doors: 0
Queued Packets: 0

Asymmetric routing can also cause NAT problems, wherein some protocols work and others do not. Consider the topology in Figure 9-4 below, for example:

[Figure 9-4: Topology diagram. R1: Lo0 150.1.1.254; routing table entry 200.1.1.0/24 via 192.1.1.1. R6: Se0/0 200.1.1.1/24 (neighbor 200.1.1.2), Se0/1 201.1.1.1/24 (neighbor 201.1.1.2), Fa0/0 10.1.1.1/24; routing table entries 192.1.1.0/24 via 200.1.1.2, 172.1.1.0/24 via 201.1.1.2, 150.1.1.0/24 via 201.1.1.2; static NAT translation 200.1.1.5 > 10.1.1.5. R7: Fa0/0 10.1.1.5/24. Other labels in the diagram: Fa0/0 192.1.1.254, Fa0/0 172.1.1.254, Fa0/0 192.1.1.1, Fa0/0 172.1.1.1.]

Fig. 9-4. Asymmetric Routing and NAT

Referencing Figure 9-4, NAT is enabled on R6. The inside interface (FastEthernet0/0) has been assigned the IP address 10.1.1.1/24. The outside interface (Serial0/0) has been assigned the IP address 200.1.1.1/24. NAT is not enabled on the Serial0/1 interface. The routing table of R6 is illustrated in the diagram. This router uses the path via neighbor 200.1.1.2 to reach the 192.1.1.0/24 subnet. However, R6 prefers the path via neighbor 201.1.1.2 to reach the 172.1.1.0/24 and 150.1.1.0/24 subnets.
R1 uses the path via neighbor 192.1.1.1 to reach the 200.1.1.0/24 subnet. On R6, NAT has been configured to translate the address 200.1.1.5 to internal address 10.1.1.5, which is the address assigned to the FastEthernet0/0 interface of R7.

Next, R1 initiates a Telnet session to 200.1.1.5 (R7). This session is sourced from the Loopback 0 interface of R1 (150.1.1.254). The TCP SYN packet is sent via neighbor 192.1.1.1 and arrives at the Serial0/0 interface of R6. R6 translates the packet, which has a destination address of 200.1.1.5, to internal address 10.1.1.5 and forwards the packet out of its LAN interface to R7. R7 sends a SYN-ACK response to 150.1.1.254. This packet hits R6, which sends it out of Serial0/1, based on the routing table entries. Because NAT is not enabled on Serial0/1, the source address of 10.1.1.5 is not changed. The packet is received by R1 with a source address of 10.1.1.5 and a destination address of 150.1.1.254. R1 is not aware of any TCP sessions to this source address and issues an RST in response. The TCP session is terminated before it establishes.

While this has been described as a NAT issue, the actual recommended solution would be to correct the asymmetric routing issue because NAT is operating in the manner it should be. Once the asymmetric routing issue has been resolved, R1 will be able to establish a Telnet session to R7.

Another common misconfiguration is an inadequate number of addresses in the NAT pool. When configuring dynamic NAT without overloading, there must be a one-to-one correlation between the public pool addresses and the internal host addresses that need to be translated at the same time. If this is not the case, some host addresses will not be translated, as the public pool will run out of addresses. In situations where the number of internal addresses exceeds the number of pool addresses, you should consider implementing Port Address Translation (PAT) instead.
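Pool exhaustion is easy to see in a small model. The Python sketch below is an illustration under stated assumptions, not a description of how IOS allocates addresses internally: the function names are hypothetical, the five inside hosts are invented for the example, and the sequential port numbering in pat() is a simplification.

```python
import ipaddress


def build_pool(start, end):
    """List every address from start to end, inclusive, as strings."""
    first = int(ipaddress.ip_address(start))
    last = int(ipaddress.ip_address(end))
    return [str(ipaddress.ip_address(n)) for n in range(first, last + 1)]


def dynamic_nat(inside_hosts, pool):
    """One-to-one dynamic NAT: each inside host consumes one pool address.
    Hosts that arrive after the pool is exhausted map to None."""
    free = list(pool)
    table = {}
    for host in inside_hosts:
        table[host] = free.pop(0) if free else None
    return table


def pat(flows, public_ip, first_port=1024):
    """PAT: every (host, port) flow shares one public address and is
    distinguished by a unique translated port (simplified numbering)."""
    return {flow: (public_ip, first_port + n) for n, flow in enumerate(flows)}


pool = build_pool("150.1.1.3", "150.1.1.6")       # four public addresses
hosts = ["10.5.5.%d" % n for n in range(2, 7)]    # five hypothetical inside hosts

nat_table = dynamic_nat(hosts, pool)
untranslated = [h for h, g in nat_table.items() if g is None]
print(untranslated)  # ['10.5.5.6']

pat_table = pat([(h, 15594) for h in hosts], pool[0])
print(len(set(pat_table.values())))  # 5 distinct translations on one address
```

With four pool addresses and five hosts, the fifth host is left without a translation, whereas the port-based mapping gives all five flows a distinct entry on a single public address.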
PAT allows multiple private IP addresses to be translated to a single public IP address by using different ports. This functionality is enabled by appending the overload keyword to the ip nat inside source list <ACL> [interface|pool] <name> [overload] global configuration command in Cisco IOS software.

Finally, another common misconfiguration that also causes NAT problems is incorrectly designating interfaces as inside or outside, or even forgetting to designate the correct device interfaces as inside or outside. This is typically an issue on devices with multiple interfaces. Verify that the correct NAT configuration has been implemented by parsing the device configuration, or by using the show ip nat statistics command, as illustrated below:

R1#show ip nat statistics
Total active translations: 2 (0 static, 2 dynamic; 0 extended)
Outside interfaces:
  Serial0/0
Inside interfaces:
  FastEthernet0/0
Hits: 182  Misses: 2
CEF Translated packets: 83, CEF Punted packets: 4
Expired translations: 42
Dynamic mappings:
-- Inside Source
[Id: 1] access-list 100 pool INSIDE-POOL refcount 2
 pool INSIDE-POOL: netmask 255.255.255.0
    start 150.1.1.3 end 150.1.1.6
    type generic, total addresses 4, allocated 2 (50%), misses 0
Appl doors: 0
Normal doors: 0
Queued Packets: 0

In addition to device misconfigurations, the following problems can also result in NAT issues:
• Layer 1 and Layer 2 issues
• Layer 3 issues
• Device resource depletion
• Incompatible applications

Layer 1 and Layer 2 issues can cause problems such as intermittent connectivity, even when NAT is configured correctly on the device. Verify that everything at these layers is operating correctly using the appropriate commands.

As was described earlier in this section, Layer 3 issues, such as asymmetric routing, can and often do cause problems when NAT is implemented.
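The asymmetric routing failure described earlier in this section can be reproduced with a toy model. The Python sketch below is a simplification, not IOS behavior in full: it assumes the return packet's source is rewritten only when the packet leaves through an interface marked as NAT outside, and the names ROUTES, NAT_OUTSIDE, STATIC_NAT, and source_after_r6 are hypothetical. The interface-to-route mapping follows the R6 routing table and interface addresses given for Figure 9-4.

```python
import ipaddress

# Values taken from the Figure 9-4 discussion; structure is illustrative.
STATIC_NAT = {"10.1.1.5": "200.1.1.5"}   # inside local -> inside global
NAT_OUTSIDE = {"Serial0/0"}              # Serial0/1 has no NAT configuration
ROUTES = {
    "192.1.1.0/24": "Serial0/0",         # via 200.1.1.2
    "172.1.1.0/24": "Serial0/1",         # via 201.1.1.2
    "150.1.1.0/24": "Serial0/1",         # via 201.1.1.2
}


def egress_interface(destination):
    """Longest-prefix match of destination against ROUTES."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), interface)
        for prefix, interface in ROUTES.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


def source_after_r6(source, destination):
    """Source address the far end sees for a packet leaving the inside."""
    if egress_interface(destination) in NAT_OUTSIDE:
        return STATIC_NAT.get(source, source)
    return source


# Reply to R1's loopback leaves via Serial0/1, so it is not translated.
print(source_after_r6("10.1.1.5", "150.1.1.254"))  # 10.1.1.5
# A reply routed via Serial0/0 would carry the translated source.
print(source_after_r6("10.1.1.5", "192.1.1.254"))  # 200.1.1.5
```

The first result is the symptom R1 observes: a reply from an address it never contacted. Making the return path use the NAT outside interface produces the second, expected result.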
In environments where multiple paths between source and destination exist, verify that routing is symmetric; otherwise, NAT will not function as expected. In addition to routing, also check for traffic filtering at Layer 3. This, too, can break NAT. Verify filtering on both the local device and transit devices using the appropriate suite of commands.

In some cases, the size of the NAT table can significantly increase, which consumes many of the resources (e.g., memory) available on the device. When this happens, the %NAT: System busy. Try later error message is printed on the console when a show command related to NAT or a show running-config or write memory command is executed. You can avoid this issue and protect the device using the ip nat translation max-entries <number> global configuration command, which specifies the maximum number of NAT entries that are permitted in the NAT table of the router or switch.

Finally, as was stated earlier in this chapter, not all applications are NAT-friendly. In some cases, enabling NAT will break some applications and protocols (e.g., IPSec). It is important to understand the applications and protocols used in the network prior to implementing NAT; otherwise, you could spend endless hours troubleshooting something that will never work with Network Address Translation in the first place. In addition to IPSec, protocols that can also be affected by NAT, because they use embedded IP addresses, include FTP, Internet Relay Chat (IRC), Simple Network Management Protocol (SNMP), H.323, Lightweight Directory Access Protocol (LDAP), and Session Initiation Protocol (SIP).

NOTE: Cisco IOS software supports a feature called NAT Transparency, which addresses the vast majority of incompatibilities between NAT and IPSec. However, it should be noted that it does not resolve all possible incompatibilities.
NAT Transparency is beyond the scope of the TSHOOT certification exam and will not be described in any additional detail in this chapter or in this guide.

In conclusion, given that NAT is a resource-hungry application, you should consider using the show commands described in this chapter when troubleshooting NAT. This includes using the show running-config command to verify device configurations, as well as the show ip nat translation and show ip nat statistics commands to verify NAT operation. However, there may be situations where you have used these commands and still cannot resolve the NAT issue. In such cases, you can (with caution) use debugging commands to aid your troubleshooting efforts. Cisco IOS software supports the following options when debugging Network Address Translation. It should be noted that the options displayed below are based on Cisco IOS software version 12.4. Options will differ with other versions:

R1#debug ip nat ?
  <1-99>     Access list
  detailed   NAT detailed events
  fragment   NAT fragment events
  generic    NAT generic ALG handler events
  h323       NAT H.323 events
  ipsec      NAT IPSec events
  nvi        NVI events
  port       NAT PORT events
  pptp       NAT PPTP events
  route      NAT Static route events
  sip        NAT SIP events
  skinny     NAT skinny events
  vrf        NAT VRF events
  wlan-nat   WLAN NAT events
  <cr>

It should be noted that the majority of these options are beyond the scope of the TSHOOT certification exam. Commonly used debug commands include the debug ip nat <ACL> command, which restricts output to the address(es) permitted in the specified standard ACL; the debug ip nat command, which displays information about each packet the device translates; and the debug ip nat detailed command, which prints detailed information about each translated packet, including additional information such as protocol type and port numbers.
The following is a sample output of the detailed information that is printed by the debug ip nat detailed command:

R1#debug ip nat detailed
IP NAT detailed debugging is on
R1#
R1#
*Oct 21 13:20:27.498: NAT*: i: icmp (10.5.5.2, 1) -> (200.1.1.1, 1) [12314]
*Oct 21 13:20:27.498: NAT*: i: icmp (10.5.5.2, 1) -> (200.1.1.1, 1) [12314]
*Oct 21 13:20:27.498: NAT*: s=10.5.5.2->150.1.1.3, d=200.1.1.1 [12314]
*Oct 21 13:20:27.498: NAT*: o: icmp (200.1.1.1, 1) -> (150.1.1.3, 1) [12314]
*Oct 21 13:20:27.498: NAT*: s=200.1.1.1, d=150.1.1.3->10.5.5.2 [12314]
*Oct 21 13:20:28.504: NAT*: i: icmp (10.5.5.2, 1) -> (200.1.1.1, 1) [12327]
*Oct 21 13:20:28.508: NAT*: s=10.5.5.2->150.1.1.3, d=200.1.1.1 [12327]
...
[Truncated Output]

CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter.

Understanding DHCP
• DHCP is used to dynamically assign hosts with IP addressing information
• DHCP can provide IP address, mask, default gateway, and DNS servers
• DHCP uses UDP ports 67 (server) and 68 (client)
• Cisco IOS routers and some switches can be configured as DHCP clients and servers
• DHCP is a client / server protocol
• DHCP clients transition through a series of states upon initialization
• DHCP clients transition through the following states:
  1. Initializing
  2. Selecting
  3. Requesting
  4. Bound
  5. Renewing
  6. Rebinding
• DHCP clients send out a DHCPDISCOVER message during the initializing phase
• DHCP servers respond with a DHCPOFFER message
• DHCP clients transition to the selecting state after the DHCPOFFER is received
• DHCP clients then send a DHCPREQUEST and transition to the requesting phase
• DHCP servers respond with a DHCPACK message
• After the DHCPACK is received, the client transitions to the bound state
• Additional DHCP messages that can be sent by the client or server include the following:
  1. DHCPNAK or DHCPNACK
  2. DHCPDECLINE
  3. DHCPINFORM
  4. DHCPRELEASE
• Cisco IOS devices can be configured as DHCP / BOOTP Relay Agents
• DHCP / BOOTP Relay Agents forward DHCP messages to servers
• Messages between Relay Agents and servers are Unicast between the two
• By default, the ip helper-address command will forward the following UDP Broadcasts:
  1. Trivial File Transfer Protocol (TFTP) (port 69)
  2. DNS (port 53)
  3. Time service (port 37)
  4. NetBIOS name server (port 137)
  5. NetBIOS datagram server (port 138)
  6. Boot Protocol (DHCP/BOOTP) client and server datagrams (ports 67 and 68)
  7. Terminal Access Controller Access Control System (TACACS) service (port 49)
  8. IEN-116 name service (port 42)

Troubleshooting DHCP
• The most common cause of problems with DHCP is device misconfigurations
• Additional issues that can cause DHCP problems include the following:
  1. NIC compatibility issue or DHCP feature issue
  2. Faulty NIC or improper NIC driver installation
  3. Operating System behavior or defect
  4. Spanning Tree issues
  5. CDP Is Disabled On Ports Connected to IP Phones
  6. DHCP Server DHCP Relay Information Option Incompatibility
  7. Cisco IOS DHCP/BOOTP Software Bugs

Understanding NAT
• NAT enables hosts on private networks to access resources on the Internet / public networks
• NAT is an IETF standard
• NAT converts packet headers for incoming / outgoing traffic and keeps track of each session
• The inside interface is the border interface of the organization administrative domain
• The inside local address is the IP address of a host residing on the inside network
• The inside global address is the IP address of an internal host as it appears to the outside
• The outside interface is the boundary for the administrative domain
• The outside local address is the address of an outside host as it appears to inside hosts
• The outside global address is an address that is legal and can be used on the Internet

Troubleshooting NAT
• NAT has several limitations, which include the following:
  1. Breaking the end-to-end IP model
  2. The need to maintain connection state
  3. The inhibition of end-to-end security
  4. Applications that are not NAT-friendly
  5. Address space collision
  6. Ratio of internal and reachable IP addresses
• Most NAT problems can be attributed to general device misconfigurations
• Common misconfigurations include the following:
  1. Misconfigured NAT Access Control Lists
  2. Asymmetric Routing
  3. Misconfigured NAT Address Pool
  4. Router Interfaces Missing NAT Commands
• In addition to misconfigurations, the following problems can also result in NAT issues:
  1. Layer 1 and Layer 2 Issues
  2. Layer 3 Issues
  3. Device Resource Depletion
  4. Incompatible Applications

CHAPTER 10
Troubleshooting IPv6 Routing & Interoperability

Up until this point, all practical troubleshooting topics covered in this guide have been based on the current IPv4 standard.
In this chapter, we will look at the same basic principles; however, the routed protocol in this case will be IP version 6 (IPv6). The TSHOOT certification exam objectives that are covered in this chapter are as follows: • Troubleshoot IPv6 routing • Troubleshoot IPv6 and IPv4 interoperability The TSHOOT certification exam requires that you not only have a solid understanding of IPv6 routing, routing protocols, and IPv6 integration with IPv4 networks but also understand how to support IPv6 internetworks. This includes troubleshooting IPv6 routing, routing protocols, and the mechanisms used to integrate IPv6 and IPv4 networks. For the most part, IPv6 routing and routing protocol troubleshooting can be performed using the methods discussed for IPv4 routing protocols. Naturally, however, there are some differences between IPv6 and IPv4 routing protocols, with the majority of those being how they are implemented in the CLI. For this reason, this chapter will first discuss basic IPv6 routing and routing protocol configuration, as well as IPv4 and IPv6 integration mechanisms, prior to discussing how to troubleshoot IPv6 internetwork problems. This chapter will be divided into the following sections: • IP version 6 Protocol Overview and Fundamentals • Understanding and Troubleshooting EIGRPv6 • Understanding and Troubleshooting RIPng • Understanding and Troubleshooting OSPFv3 • Troubleshooting IPv6 Route Redistribution • IPv4 and IPv6 Interoperability • Troubleshooting IPv4 and IPv6 Interoperability NOTE: It should be noted that this chapter will cover basic IPv6 fundamentals and principles. Additional detailed information on IPv6 can be found in the ROUTE study guide, which is currently available online. Please refer to that guide for additional information. 
IP VERSION 6 PROTOCOL OVERVIEW AND FUNDAMENTALS

Version 6 of the Internet Protocol provides several enhancements over the current version 4 standard. These include the following:
• The simplified IPv6 packet header
• Larger address space
• IPv6 addressing hierarchy
• IPv6 extensibility
• IPv6 Broadcast elimination
• Stateless autoconfiguration
• Integrated mobility
• Integrated enhanced security

The header fields in an IPv4 packet are very detailed and complete. However, not all fields in the IPv4 packet header are used or required (for example, the Type of Service field). Other fields, such as the Checksum, are no longer a necessity because transmission link quality has greatly improved over the years. In contrast, the header fields of an IPv6 packet are much simpler and contain the bare minimum of information required to route the packet. This allows for greater routing efficiency with IPv6 than is afforded by IPv4.

IPv6 addresses are 128 bits in length. This extended address length provides approximately 3.4 x 10^38 addresses, compared with roughly 4.3 billion in the 32-bit IPv4 address space. This sheer amount of address space eliminates the need to perform Network Address Translation (NAT) in IPv6 because a global address can be assigned to each individual host. Because a global IP address can be assigned to each individual device (e.g., computers, laptops, and phones), the Internet reverts to a true end-to-end model when using IPv6.

Because of the much larger address space provided by IPv6, multiple levels of hierarchy can be used within the IPv6 address space. This allows providers and other organizations to use this hierarchy to better manage the IPv6 address space based on bit-boundaries.
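To put the address-space and hierarchy points above into numbers, the following is a minimal Python sketch using the standard ipaddress module. The 2001:db8::/32 provider block is a hypothetical allocation (it is the prefix reserved for documentation), and the /48 site and /64 subnet boundaries are assumptions chosen to match common allocation sizes; none of these values come from a device in this guide.

```python
import ipaddress
from itertools import islice

# Size of the 32-bit IPv4 and 128-bit IPv6 address spaces.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128
print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.2e}")

# Hypothetical provider block (documentation prefix, assumed for illustration).
provider = ipaddress.ip_network("2001:db8::/32")

# Hierarchy on bit boundaries: /32 provider -> /48 sites -> /64 subnets.
sites_per_provider = 2 ** (48 - provider.prefixlen)
subnets_per_site = 2 ** (64 - 48)
print(f"/48 sites per /32 block: {sites_per_provider:,}")
print(f"/64 subnets per /48 site: {subnets_per_site:,}")

# The first three /48 site prefixes carved from the provider block.
first_sites = [str(net) for net in islice(provider.subnets(new_prefix=48), 3)]
print(first_sites)

# Every site prefix is covered by the single provider summary route.
covered = all(ipaddress.ip_network(s).subnet_of(provider) for s in first_sites)
print("All sites summarized by", provider, ":", covered)
```

Because each level of the hierarchy falls on a fixed bit boundary, a single /32 advertisement covers every site and subnet beneath it, which is the summarization benefit the text describes.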
The use of an addressing hierarchy allows route summarization in IPv6 to be performed in a more organized manner than is currently performed using the IPv4 address space.

Unlike IPv4, IPv6 has a fixed-size header, and additional extension headers are included to support new features. These additional headers are outside the standard IPv6 header and are referenced in such a way that all individual internetwork devices can skip the extension if they do not support it. This reduces the processing overhead of routers routing IPv6 packets.

In IPv6, Address Resolution Protocol (ARP) Broadcasts are replaced by Multicast packets on the local network segment. This prevents devices that do not need to receive these packets from receiving them, and avoids the problems that Broadcasts cause (e.g., wasting resources and network performance degradation).

Both IPv4 and IPv6 support stateful autoconfiguration, which allows network hosts to receive their addressing information from a network server (i.e., via DHCP). In addition to supporting stateful autoconfiguration, IPv6 also supports stateless autoconfiguration. Stateless autoconfiguration allows hosts to configure their Unicast IPv6 addresses by themselves based on prefix advertisements from routers on the local network segment.

While Mobile IP is available for both IPv4 and IPv6, it is built into IPv6, whereas it is an added function in IPv4. IPv6 mobility allows IPv6-capable devices such as PDAs, cell phones, and wireless laptops to roam between the IPv6 networks of wireless or cellular providers by using the Mobile IP protocol. This allows any IPv6 host to use Mobile IP as needed, while only IPv4 hosts that have this added functionality can use Mobile IP.

IPv6 uses the inbuilt security mechanisms afforded by the IP Security (IPSec) protocol. The key difference between IPSec in IPv4 and IPv6 is that it is optional in IPv4 but is mandatory in IPv6.
As defined in RFC 2460, IPv6 includes the use of the Authentication Header (AH) and Encapsulating Security Payload (ESP) extension headers in a complete implementation.

IPv6 Address Representation

The three ways in which IPv6 addresses can be represented are as follows:
1. The preferred or complete address representation or form
2. The compressed representation
3. IPv6 addresses with an embedded IPv4 address

While the preferred form or representation is the most commonly used method for representing the 128-bit IPv6 address in text format, it is also important to be familiar with the other two methods of IPv6 address representation. These methods are described in the following sections.

The preferred representation for an IPv6 address is the longest format, and is also referred to as the complete form of an IPv6 address. This format represents all 32 Hexadecimal characters that are used to form an IPv6 address. This is performed by writing the address as a series of eight 16-bit Hexadecimal fields, separated by colons (e.g., 3FFF:1234:ABCD:5678:020C:CEFF:FEA7:F3A0).

The compressed representation allows IPv6 addresses to be compressed in one of two ways. The first method allows a double colon (::) to be used in place of one run of successive 16-bit fields comprised entirely of zeros, including a run at the start of the address. When using this method, it is important to remember that the double colon can be used only once in an IPv6 address. The second method allows the leading zeros within an individual 16-bit field to be omitted (e.g., 020C can be written as 20C). An example of a compressed IPv6 address would be 3FFE::1/64.

The third representation of an IPv6 address is to use an embedded IPv4 address within the IPv6 address. When an IPv6 address is embedded with an IPv4 address, the first part of the IPv6 address uses the Hexadecimal notation and the remainder of the address is in the traditional dotted-decimal notation used by IPv4 addresses.
However, it is permissible to convert the 32-bit dotted-decimal IPv4 address into Hexadecimal notation and embed that into the IPv6 address instead. The IPv6 address with an embedded IPv4 address is comprised of six fields of 16-bit Hexadecimal characters and four fields of 8-bit decimal characters. The two kinds of IPv6 addresses that contain an embedded IPv4 address are as follows:
1. IPv4-compatible IPv6 addresses
2. IPv4-mapped IPv6 addresses

IPv4-compatible IPv6 addresses have the first 96 bits set to 0 and are then followed by the 32-bit IPv4 address. An example of an IPv4-compatible IPv6 address would be 0000:0000:0000:0000:0000:0000:172.16.255.1. This same address can then be compressed as 0:0:0:0:0:0:172.16.255.1/128, or simply as ::172.16.255.1/128. Additionally, it is important to remember that the decimal IPv4 address could be converted to Hexadecimal notation and used to create the IPv4-compatible IPv6 address 0:0:0:0:0:0:AC10:FF01/128, or simply ::AC10:FF01/128.

IPv4-mapped IPv6 addresses have the first 80 bits set to 0, the next 16 bits set to a value of all 1s (which is FFFF in Hexadecimal notation), and are then followed by the IPv4 dotted-decimal address. An example of an IPv4-mapped IPv6 address would be 0000:0000:0000:0000:0000:FFFF:172.16.255.1/128. Because it is perfectly legal to represent the address in the compressed form, the same address could also be written as either 0:0:0:0:0:FFFF:172.16.255.1/128 or ::FFFF:172.16.255.1/128. Additionally, the IPv4 address also could be converted to Hexadecimal notation, producing the IPv4-mapped IPv6 address 0:0:0:0:0:FFFF:AC10:FF01/128 or simply ::FFFF:AC10:FF01/128.

The Different IPv6 Address Types

Unlike IPv4, IPv6 does not use Broadcast addresses. Instead, IPv6 supports and uses only the Unicast, Multicast, and Anycast address classes or types also used by IPv4.
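Before moving on to the individual address types, the three address representations covered in the previous section can be checked with a short Python sketch. It uses only the standard ipaddress module; the helper name ipv4_to_hex_fields is hypothetical and introduced purely for illustration. Note that Python prints IPv6 addresses in lower case, whereas this guide uses upper case.

```python
import ipaddress

def ipv4_to_hex_fields(dotted: str) -> str:
    """Return a dotted-decimal IPv4 address as two 16-bit Hexadecimal fields."""
    value = int(ipaddress.IPv4Address(dotted))
    return f"{value >> 16:04X}:{value & 0xFFFF:04X}"

# Preferred (complete) form versus compressed form of the same address.
preferred = ipaddress.IPv6Address("3FFF:1234:ABCD:5678:020C:CEFF:FEA7:F3A0")
print(preferred.exploded)    # all eight fields, four characters each
print(preferred.compressed)  # leading zeros in each field omitted

# Embedded IPv4 address converted to Hexadecimal notation.
hex_fields = ipv4_to_hex_fields("172.16.255.1")
print(hex_fields)

# IPv4-compatible: first 96 bits are 0, then the IPv4 address.
compat_dotted = ipaddress.IPv6Address("::172.16.255.1")
compat_hex = ipaddress.IPv6Address(f"::{hex_fields}")

# IPv4-mapped: 80 zero bits, 16 one bits (FFFF), then the IPv4 address.
mapped_dotted = ipaddress.IPv6Address("::FFFF:172.16.255.1")
mapped_hex = ipaddress.IPv6Address(f"::FFFF:{hex_fields}")

# Dotted-decimal and Hexadecimal spellings are the same 128-bit value.
print(compat_dotted == compat_hex, mapped_dotted == mapped_hex)
print(mapped_dotted.ipv4_mapped)  # recovers the embedded IPv4 address
```

The equality checks confirm that the dotted-decimal and Hexadecimal spellings are simply two ways of writing the same address, which is why either form is acceptable.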
IPv6 addresses can be classified as any one of the following:
• Link-Local addresses
• Site-Local addresses
• Aggregate global Unicast addresses
• Multicast addresses
• Anycast addresses
• Loopback addresses
• Unspecified addresses

IPv6 Link-Local addresses can be used only on the local link (i.e., a shared segment between devices) and are automatically assigned to each interface when IPv6 is enabled on that interface. These addresses are assigned from the Link-Local prefix FE80::/10. To complete the address, bits 11 through 64 are set to 0 and the interface Extended Unique Identifier 64 (EUI-64) is appended to the Link-Local address as the low-order 64 bits. The EUI-64 is comprised of the 24-bit manufacturer ID assigned by the IEEE and the 40-bit value assigned by that manufacturer.

Site-Local addresses are Unicast addresses that are used only within a site. Unlike Link-Local addresses, Site-Local addresses must be configured manually on network devices. These addresses are the IPv6 equivalent of the private IPv4 address space defined in RFC 1918 and can be used by organizations that do not have globally routable IPv6 address space. While still supported in Cisco IOS software, Site-Local addresses have been deprecated (RFC 3879) and replaced by the Unique-Local Addresses (ULAs) described in RFC 4193. ULAs serve the same function as Site-Local addresses and, like them, are not routable on the global IPv6 Internet.

Unique-Local Addresses are assigned from the FC00::/7 IPv6 address block, which is further divided into two /8 address groups, referred to as the assigned and random groups. These two groups are the FC00::/8 and FD00::/8 IPv6 address blocks. The FC00::/8 block is intended to be managed by an allocation authority that assigns /48 blocks, while an FD00::/8 prefix is formed by appending a randomly generated 40-bit string to derive a valid /48 block.
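The Link-Local and Unique-Local construction rules above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a description of how Cisco IOS implements them: the function names are hypothetical, the MAC address 00:0d:28:9e:f9:40 is inferred from a Link-Local address that appears in this chapter's sample output, the interface ID is built using the modified EUI-64 procedure from the IPv6 addressing architecture (insert FFFE in the middle of the MAC address and flip the universal/local bit), and the ULA generator uses a plain random number rather than the hash-based method suggested in RFC 4193.

```python
import ipaddress
import secrets

def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Build an FE80::/10 Link-Local address from a 48-bit MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a MAC address with six octets")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert FFFE between the manufacturer ID and the device-specific part.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = int(ipaddress.IPv6Address("fe80::"))  # bits 11 through 64 are 0
    return ipaddress.IPv6Address(prefix | int.from_bytes(interface_id, "big"))

def random_ula_prefix() -> ipaddress.IPv6Network:
    """Derive a /48 from FD00::/8 by appending a random 40-bit string."""
    global_id = secrets.randbits(40)
    value = (0xFD << 120) | (global_id << 80)
    return ipaddress.IPv6Network((value, 48))

# MAC address assumed for illustration (inferred, not taken from a config).
link_local = link_local_from_mac("00:0d:28:9e:f9:40")
print(link_local)

ula = random_ula_prefix()
print(ula, ula.subnet_of(ipaddress.ip_network("fc00::/7")))
```

Being able to derive the expected Link-Local address from an interface MAC address is useful when matching next-hop addresses in show and debug output to a specific neighbor.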
Aggregate global Unicast addresses are the IPv6 addresses used for generic IPv6 traffic, as well as for the IPv6 Internet. These are similar to the public addresses used in IPv4. From a network addressing point of view, each IPv6 global Unicast address is comprised of three main sections: the prefix received from the provider (48 bits in length), the site prefix (16 bits in length), and the host portion (64 bits in length). Together, these make up the 128-bit address used in IPv6. Aggregate global Unicast addresses for IPv6 are assigned by IANA and fall within the IPv6 prefix 2000::/3. This allows for a range of aggregate global Unicast addresses in which the first 16-bit field is between 2000 and 3FFF.

The Multicast addresses used in IPv6 are derived from the FF00::/8 IPv6 prefix. In IPv6, Multicast operates in a different manner than that of Multicast in IPv4. There are two defined types of IPv6 Multicast addresses: permanent and temporary. Permanent IPv6 Multicast addresses are assigned by IANA, while the temporary IPv6 Multicast addresses can be used in pre-deployment Multicast testing.

In IPv6, Anycast addresses use global Unicast, Site-Local, or even Link-Local addresses. However, there is an Anycast address reserved for special use. This special address is referred to as the Subnet-Router Anycast Address. The Subnet-Router Anycast Address is formed with the subnet's 64-bit Unicast prefix, with the remaining 64 bits set to zero, for example: 2001:1a2b:1111:d7e5:0000:0000:0000:0000.

IPv6 Loopback addresses can be represented as 0000:0000:0000:0000:0000:0000:0000:0001 in the preferred address format and can be written as ::1 in the compressed format. This means that in Loopback addresses, all bits are set to 0 except for the last bit, which is always set to 1. These addresses are always assigned automatically when IPv6 is enabled on a device and therefore can never be changed.
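When reading show or debug output, it helps to recognize an address type from its prefix alone. The following Python sketch is one way to express the prefix rules covered so far. The function names and the order of the checks are illustrative choices; all prefixes come from the text above except the Site-Local prefix FEC0::/10, which is not stated in this chapter and is included here as an assumption.

```python
import ipaddress

# Prefix-to-type table; more specific prefixes are listed first.
PREFIX_TYPES = [
    ("Loopback", ipaddress.ip_network("::1/128")),
    ("Link-Local", ipaddress.ip_network("fe80::/10")),
    # Site-Local prefix is an assumption (not stated in this chapter).
    ("Site-Local (deprecated)", ipaddress.ip_network("fec0::/10")),
    ("Unique-Local", ipaddress.ip_network("fc00::/7")),
    ("Multicast", ipaddress.ip_network("ff00::/8")),
    ("Aggregate global Unicast", ipaddress.ip_network("2000::/3")),
]

def classify(address: str) -> str:
    """Return the address type whose prefix contains the given address."""
    addr = ipaddress.IPv6Address(address)
    for name, network in PREFIX_TYPES:
        if addr in network:
            return name
    return "Other"

def subnet_router_anycast(prefix: str) -> ipaddress.IPv6Address:
    """Return the subnet prefix with all remaining (host) bits set to zero."""
    return ipaddress.ip_network(prefix).network_address

for sample in ("FE80::20D:28FF:FE9E:F940", "3FFE::1", "FD00::1", "FF02::1", "::1"):
    print(f"{sample:28} {classify(sample)}")

anycast = subnet_router_anycast("2001:1a2b:1111:d7e5::/64")
print(anycast.exploded)
```

Anycast addresses cannot be identified this way because they are taken from the Unicast ranges; the Subnet-Router Anycast Address is the one case that can be derived directly from the subnet prefix.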
In IPv6 addressing, unspecified addresses are simply Unicast addresses that are not assigned to any interface. These addresses indicate the absence of an IPv6 address and are used for special purposes that include IPv6 Dynamic Host Configuration Protocol (DHCP) and Duplicate Address Detection (DAD). Unspecified addresses are represented by all 0 values in the IPv6 address and can be written using the :: prefix. In the preferred format, these addresses are represented as 0000:0000:0000:0000:0000:0000:0000:0000.

Verifying and Troubleshooting Generic IPv6 Routing

As is the case with IPv4, Cisco IOS software provides several IPv6-specific commands that can be used to verify and troubleshoot generic IPv6 routing configurations and problems. When running IPv4, the show ip route command, along with supported keywords, is used to view the contents of the Routing Information Base (RIB). When using IPv6, the same is performed using the show ipv6 route command. This command supports the following keywords (in Cisco IOS 12.4):

R1#show ipv6 route ?
  Hostname or X:X:X:X::X   IPv6 name or address
  X:X:X:X::X/<0-128>       IPv6 prefix
  bgp                      BGP routes
  connected                Connected routes
  interface                interface-specific routes
  isis                     IS-IS routes
  local                    Local routes
  ospf                     OSPFv3 routes
  rip                      RIPng routes
  static                   Static routes
  summary                  Summary display
  |                        Output modifiers
  <cr>

While the majority of these supported options require no explanation, as they are described and illustrated in the ROUTE study guide, there are some options that were not described there with which you should be familiar. The interface keyword can be used to view all routes learned via a particular interface. This command is protocol independent and prints all known routes received via the specified interface, regardless of routing protocol.
The output of this command also includes static routes pointed out of the specified interface, as well as the connected subnets for that specified interface. Following is a sample of the information printed by this command:

R1#show ipv6 route interface FastEthernet0/0
IPv6 Routing Table - 7 entries
Codes: C - Connected, L - Local, S - Static, R - RIP, B - BGP
       U - Per-user Static route
       I1 - ISIS L1, I2 - ISIS L2, IA - ISIS interarea, IS - ISIS summary
       O - OSPF intra, OI - OSPF inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
       ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
S   ::/0 [1/0]
     via FE80::20D:28FF:FE9E:F940, FastEthernet0/0
OI  2001::2/128 [110/1]
     via FE80::20D:28FF:FE9E:F940, FastEthernet0/0
R   2002::2/128 [120/2]
     via FE80::20D:28FF:FE9E:F940, FastEthernet0/0
C   3FFE::/64 [0/0]
     via ::, FastEthernet0/0
L   3FFE::20F:23FF:FE5E:EC80/128 [0/0]
     via ::, FastEthernet0/0

The show ipv6 route local command prints information on all locally configured aggregate global Unicast addresses and subnets; however, keep in mind that this does not include assigned Link-Local addresses. You can view assigned Link-Local addresses using the show ipv6 interface [brief] command instead.
Following is a sample of the information that is printed by the show ipv6 route local command:

R1#show ipv6 route local
IPv6 Routing Table - 7 entries
Codes: C - Connected, L - Local, S - Static, R - RIP, B - BGP
       U - Per-user Static route
       I1 - ISIS L1, I2 - ISIS L2, IA - ISIS interarea, IS - ISIS summary
       O - OSPF intra, OI - OSPF inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
       ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
LC  2001::1/128 [0/0]
     via ::, Loopback0
LC  2002::1/128 [0/0]
     via ::, Loopback1
LC  2003::1/128 [0/0]
     via ::, Loopback3
L   3FFE::20D:28FF:FE9E:F940/128 [0/0]
     via ::, FastEthernet0/0
L   FE80::/10 [0/0]
     via ::, Null0
L   FF00::/8 [0/0]
     via ::, Null0

Finally, the show ipv6 route summary command prints a summary of the number of IPv6 routes known by the local device. This command shows the total number of routing entries, and further breaks this down by routing protocol (route source) and the number of prefixes. Following is a sample output of the information that is printed by this command:

R1#show ipv6 route summary
IPv6 Routing Table Summary - 7 entries
  3 local, 1 connected, 1 static, 1 RIP, 0 BGP, 0 IS-IS, 1 OSPF
  Number of prefixes:
    /0: 1, /8: 1, /10: 1, /64: 1, /128: 3

In addition to basic show commands, you should also be familiar with the debugging suite of commands for generic IPv6 troubleshooting. Cisco IOS software provides a plethora of commands that can be used to troubleshoot IPv6 problems. IPv6 debugging is enabled using the debug ipv6 privileged EXEC command. This command supports the following options:

R1#debug ipv6 ?
  access-list          IPv6 access list debugging
  cef                  IPv6 CEF information
  cpc                  IPv6 Common Parsing Cache debugging
  dhcp                 IPv6 DHCP debugging
  icmp                 ICMPv6 debugging
  inspect              Stateful inspection events
  interface            IPv6 interface debugging
  mfib                 IP Multicast forwarding information base
  mld                  Multicast Listener Discovery
  mobile               MIPv6 Debugging
  mrib                 Multicast Route DB
  nat                  NAT-PT events
  nd                   IPv6 Neighbor Discovery debugging
  ospf                 OSPF information
  packet               IPv6 packet debugging
  pim                  Protocol Independent Multicast
  policy               IPv6 policy-based routing debugging
  pool                 IPv6 prefix pool debugging
  port-mapping         IPv6 PAM events
  rip                  RIP Routing Protocol debugging
  routing              IPv6 routing table debugging
  virtual-reassembly   IPv6 Virtual Fragment Reassembly (VFR) debugging

NOTE: While the majority of these options are beyond the scope of the TSHOOT certification exam, the following section describes some of the options with which you should be familiar.

The debug ipv6 packet <access-list|detail> command enables debugging for IPv6 packets. As is the case with the debug ip packet command used when troubleshooting IPv4, you can restrict the output by specifying an ACL, or you can specify that detailed information be included by appending the detail keyword.
The following example illustrates how to configure an IPv6 ACL that permits only ICMPv6 packets and enable detailed IPv6 packet debugging, restricting the output to only ICMPv6 packets:

R1(config)#ipv6 access-list TSHOOT-ICMP-ACL
R1(config-ipv6-acl)#permit icmp any any
R1(config-ipv6-acl)#exit
R1(config)#exit
R1#
R1#debug ipv6 packet access-list TSHOOT-ICMP-ACL detail
IPv6 unicast packet debugging is on (detailed) for access list TSHOOT-ICMP-ACL
R1#
*Mar 23 06:41:23.283: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:27.091: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.091:       dest 3FFE::20F:23FF:FE5E:EC80
*Mar 23 06:41:27.091:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, forward to ulp
*Mar 23 06:41:27.095: IPV6: source 3FFE::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:27.095:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.095:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, originating
*Mar 23 06:41:27.095: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:27.099: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.099:       dest 3FFE::20F:23FF:FE5E:EC80
*Mar 23 06:41:27.099:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, forward to ulp
*Mar 23 06:41:27.099: IPV6: source 3FFE::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:27.099:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.099:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, originating
*Mar 23 06:41:27.103: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:27.103: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.103:       dest 3FFE::20F:23FF:FE5E:EC80
*Mar 23 06:41:27.107:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, forward to ulp
*Mar 23 06:41:27.107: IPV6: source 3FFE::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:27.107:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.107:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, originating
*Mar 23 06:41:27.107: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:27.111: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.111:       dest 3FFE::20F:23FF:FE5E:EC80
*Mar 23 06:41:27.111:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, forward to ulp
*Mar 23 06:41:27.111: IPV6: source 3FFE::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:27.111:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.111:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, originating
*Mar 23 06:41:27.115: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:27.115: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.115:       dest 3FFE::20F:23FF:FE5E:EC80
*Mar 23 06:41:27.119:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, forward to ulp
*Mar 23 06:41:27.119: IPV6: source 3FFE::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:27.119:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.119:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, originating
*Mar 23 06:41:27.119: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:28.671: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:32.095: IPV6: source FE80::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:32.095:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:32.095:       traffic class 224, flow 0x0, len 72+8, prot 58, hops 255, originating
*Mar 23 06:41:32.095: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:32.095: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:32.099:       dest FE80::20F:23FF:FE5E:EC80
*Mar 23 06:41:32.099:       traffic class 224, flow 0x0, len 64+14, prot 58, hops 255, forward to ulp
*Mar 23 06:41:33.283: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:37.095: IPV6: source FE80::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:37.095:       dest FE80::20F:23FF:FE5E:EC80
*Mar 23 06:41:37.095:       traffic class 224, flow 0x0, len 72+14, prot 58, hops 255, forward to ulp
*Mar 23 06:41:37.095: IPV6: source FE80::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:37.095:       dest FE80::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:37.095:       traffic class 224, flow 0x0, len 64+16, prot 58, hops 255, originating
*Mar 23 06:41:37.099: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:42.099: IPV6: source FE80::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:42.099:       dest FE80::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:42.099:       traffic class 224, flow 0x0, len 72+8, prot 58, hops 255, originating
*Mar 23 06:41:42.099: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:42.099: IPV6: source FE80::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:42.103:       dest FE80::20F:23FF:FE5E:EC80
*Mar 23 06:41:42.103:       traffic class 224, flow 0x0, len 64+14, prot 58, hops 255, forward to ulp

NOTE: We can determine that these are ICMPv6 packets because of the specified protocol number of 58, which is the protocol number used by ICMPv6 as specified in RFC 2463.

Another useful troubleshooting command is the debug ipv6 routing command, which can be used to display information on IPv6 RIB and route cache updates. Again, this command is protocol independent and prints information for all routing protocols.
Following is a sample of the information that is printed by this command:

R1#debug ipv6 routing
IPv6 routing table events debugging is on
R1#
R1#clear ipv6 route *
R1#
R1#
*Mar 23 06:59:49.219: IPv6RT0: ospf 1, Delete 2001::2/128 from table
*Mar 23 06:59:49.219: IPv6RT0: rip TSHOOT, Delete 2002::2/128 from table
*Mar 23 06:59:49.219: IPv6RT0: rip TSHOOT, Delete 2003::2/128 from table
*Mar 23 06:59:49.219: IPv6RT0: ospf 1, Route add 2001::2/128 [new]
*Mar 23 06:59:49.219: IPv6RT0: ospf 1, Add 2001::2/128 to table
*Mar 23 06:59:49.223: IPv6RT0: ospf 1, Adding next-hop FE80::20D:28FF:FE9E:F940 over FastEthernet0/0 for 2001::2/128, [110/1]
*Mar 23 06:59:49.223: IPv6RT0: ospf 1, Reuse backup for 3FFE::/64, distance 110
*Mar 23 06:59:49.223: IPv6RT0: rip TSHOOT, Route add 2002::2/128 [new]
*Mar 23 06:59:49.223: IPv6RT0: rip TSHOOT, Add 2002::2/128 to table
*Mar 23 06:59:49.223: IPv6RT0: rip TSHOOT, Adding next-hop FE80::20D:28FF:FE9E:F940 over FastEthernet0/0 for 2002::2/128, [120/2]
*Mar 23 06:59:49.223: IPv6RT0: rip TSHOOT, Route add 2003::2/128 [new]
*Mar 23 06:59:49.223: IPv6RT0: rip TSHOOT, Add 2003::2/128 to table
*Mar 23 06:59:49.227: IPv6RT0: rip TSHOOT, Adding next-hop FE80::20D:28FF:FE9E:F940 over FastEthernet0/0 for 2003::2/128, [120/2]
*Mar 23 06:59:49.227: IPv6RT0: rip TSHOOT, Reuse backup for 3FFE::/64, distance 120
*Mar 23 06:59:49.227: IPv6RT0: Event: 2001::2/128, Del, owner ospf, previous None
*Mar 23 06:59:49.227: IPv6RT0: Event: 2002::2/128, Del, owner rip, previous None
*Mar 23 06:59:49.227: IPv6RT0: Event: 2003::2/128, Del, owner rip, previous None
*Mar 23 06:59:49.231: IPv6RT0: Event: 2001::2/128, Add, owner ospf, previous None
*Mar 23 06:59:49.231: IPv6RT0: Event: 2002::2/128, Add, owner rip, previous None
*Mar 23 06:59:49.231: IPv6RT0: Event: 2003::2/128, Add, owner rip, previous None
*Mar 23 07:00:02.251: IPv6RT0: rip TSHOOT, Delete 2003::2/128 from table
*Mar 23 07:00:02.251: IPv6RT0: rip TSHOOT, Delete backup for 3FFE::/64
*Mar 23 07:00:02.251: IPv6RT0: Event: 2003::2/128, Del, owner rip, previous None

From the debug output above, we can determine that the local router is running OSPFv3 using process ID 1. The router is also running RIPng, using an instance named TSHOOT. Both OSPFv3 and RIPng are using the default administrative distances of 110 and 120, respectively. We can also determine that the local router is receiving the 2001::2/128 prefix from another OSPFv3 router with the Link-Local address FE80::20D:28FF:FE9E:F940. The router is also receiving the 2002::2/128 and 2003::2/128 prefixes via RIPng from a RIPng-enabled router with the Link-Local address FE80::20D:28FF:FE9E:F940 (i.e., the same router that is also running OSPFv3).

While all three prefixes are initially inserted into the routing table, we can see that the 2003::2/128 prefix was eventually removed. You can use this information to determine whether this is what should be happening (e.g., maybe it complies with a configured RIPng filter), or troubleshoot the cause if this is not expected behavior (e.g., maybe the prefix is flapping).

Verifying and Troubleshooting Basic IPv6 Protocol Operation

The Neighbor Discovery Protocol (NDP) is a core element of IPv6. NDP operates at the Link Layer and is responsible for discovering other nodes on the link, determining the Link Layer addresses of other nodes, finding available routers, and maintaining reachability information about the paths to other active neighbor nodes. NDP performs functions for IPv6 that are similar to those that ARP, ICMP Router Discovery, and ICMP Router Redirect perform for IPv4. NDP defines five types of ICMPv6 packets, which are listed and described in Table 10-1 below:

Table 10-1.
ICMPv6 NDP Message Types

ICMPv6 Type    Message Type Description and IPv6 Usage
133            Used for Router Solicitation (RS) messages
134            Used for Router Advertisement (RA) messages
135            Used for Neighbor Solicitation (NS) messages
136            Used for Neighbor Advertisement (NA) messages
137            Router Redirect

While you are not expected to perform any advanced NDP troubleshooting, you can leverage NDP to troubleshoot basic misconfigurations between devices on the local segment. For example, you could use the show ipv6 routers [conflicts] command to determine the routers that reside on the local link (same multi-access segment) and verify that their NDP configuration parameters match those of the local router, as illustrated in the following output:

R1#show ipv6 routers
Router FE80::20F:23FF:FE5E:EC80 on FastEthernet0/0, last update 2 min
  Hops 64, Lifetime 1800 sec, AddrFlag=0, OtherFlag=0, MTU=1500
  HomeAgentFlag=0, Preference=Medium
  Reachable time 0 msec, Retransmit time 0 msec
  Prefix 3FFE::/64 onlink autoconfig
    Valid lifetime 2592000, preferred lifetime 604800
Router FE80::213:7FFF:FEAF:3E00 on FastEthernet0/0, last update 0 min
  Hops 64, Lifetime 1800 sec, AddrFlag=0, OtherFlag=0, MTU=1500
  HomeAgentFlag=0, Preference=Medium
  Reachable time 0 msec, Retransmit time 0 msec
  Prefix 3FFE::/64 onlink autoconfig
    Valid lifetime 2592000, preferred lifetime 604800
Router FE80::20D:28FF:FE9E:F940 on FastEthernet0/0, last update 0 min
  Hops 64, Lifetime 1800 sec, AddrFlag=0, OtherFlag=0, MTU=1500
  HomeAgentFlag=0, Preference=Medium
  Reachable time 0 msec, Retransmit time 0 msec
  Prefix 3FFE::/64 onlink autoconfig
    Valid lifetime 2592000, preferred lifetime 604800

From the output above, we can determine that there are three additional IPv6 routers on the local segment. All three routers advertise the same prefix and the same additional IPv6 configuration parameters.
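The parameter comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Cisco IOS software implements it: the dictionary field names, the helper ra_mismatches, and the peer values are all hypothetical.

```python
# ICMPv6 type numbers that NDP uses (see Table 10-1).
NDP_TYPES = {
    133: "Router Solicitation (RS)",
    134: "Router Advertisement (RA)",
    135: "Neighbor Solicitation (NS)",
    136: "Neighbor Advertisement (NA)",
    137: "Router Redirect",
}


def ra_mismatches(local, neighbor):
    """Return the names of RA parameters whose values differ (hypothetical helper)."""
    return sorted(key for key in local if neighbor.get(key) != local[key])


# Hypothetical parameter sets; the field names are this sketch's own invention.
local_ra = {
    "hops": 64,
    "mtu": 1500,
    "prefix": "3FFE::/64",
    "valid_lifetime": 2592000,
    "preferred_lifetime": 604800,
}
# A neighbor that advertises the same prefix with a different valid lifetime.
peer_ra = dict(local_ra, valid_lifetime=86400)

print(NDP_TYPES[134])
print(ra_mismatches(local_ra, peer_ra))
```

A neighbor whose advertised values all match the local ones produces an empty list, which corresponds to a router that would not be flagged.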
Using this command, you could troubleshoot misconfigured device issues by appending the conflicts keyword, which prints information on received Router Advertisements that differ from the advertisements configured for any of the local interfaces, as illustrated in the following output:

R1#show ipv6 routers conflicts
Router FE80::20F:23FF:FE5E:EC80 on FastEthernet0/0, last update 1 min, CONFLICT
  Hops 64, Lifetime 1800 sec, AddrFlag=0, OtherFlag=0, MTU=1500
  HomeAgentFlag=0, Preference=Medium
  Reachable time 0 msec, Retransmit time 0 msec
  Prefix 3FFE::/64 onlink autoconfig
    Valid lifetime -1, preferred lifetime -1
Router FE80::213:7FFF:FEAF:3E00 on FastEthernet0/0, last update 0 min, CONFLICT
  Hops 64, Lifetime 1800 sec, AddrFlag=0, OtherFlag=0, MTU=1500
  HomeAgentFlag=0, Preference=Medium
  Reachable time 0 msec, Retransmit time 0 msec
  Prefix 3FFE::/64 onlink autoconfig
    Valid lifetime -1, preferred lifetime -1

Another useful command when troubleshooting basic IPv6 functionality is the show ipv6 traffic command. This command prints information on sent and received IPv6 packets, which can be used to troubleshoot anything from Layer 1/Layer 2 problems to device misconfigurations, depending on the information printed in the output.
Following is a sample output of the information that is printed by this command:

R1#show ipv6 traffic
IPv6 statistics:
  Rcvd:  1710 total, 1710 local destination
         0 source-routed, 0 truncated
         0 format errors, 0 hop count exceeded
         0 bad header, 0 unknown option, 0 bad source
         0 unknown protocol, 0 not a router
         0 fragments, 0 total reassembled
         0 reassembly timeouts, 0 reassembly failures
         0 unicast RPF drop, 0 suppressed RPF drop
  Sent:  1466 generated, 0 forwarded
         0 fragmented into 0 fragments, 0 failed
         0 encapsulation failed, 0 no route, 0 too big
  Mcast: 1655 received, 1412 sent

ICMP statistics:
  Rcvd: 168 input, 0 checksum errors, 0 too short
        0 unknown info type, 0 unknown error type
        unreach: 0 routing, 0 admin, 0 neighbor, 0 address, 0 port
        parameter: 0 error, 0 header, 0 option
        0 hop count expired, 0 reassembly timeout,0 too big
        5 echo request, 15 echo reply
        0 group query, 0 group report, 0 group reduce
        0 router solicit, 123 router advert, 0 redirects
        11 neighbor solicit, 14 neighbor advert
  Sent: 107 output, 0 rate-limited
        unreach: 0 routing, 0 admin, 0 neighbor, 0 address, 0 port
        parameter: 0 error, 0 header, 0 option
        0 hop count expired, 0 reassembly timeout,0 too big
        15 echo request, 5 echo reply
        0 group query, 0 group report, 0 group reduce
        0 router solicit, 56 router advert, 0 redirects
        16 neighbor solicit, 15 neighbor advert

UDP statistics:
  Rcvd: 571 input, 0 checksum errors, 0 length errors
        0 no port, 0 dropped
  Sent: 346 output

TCP statistics:
  Rcvd: 0 input, 0 checksum errors
  Sent: 0 output, 0 retransmitted

When parsing through the output above, a large number of errors may point to Layer 1 issues. In such cases, troubleshoot the Physical Layer using the appropriate commands and methodology (component swapping, for example) until the errors are no longer present.
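When the counters are captured to a text file, picking out the non-zero error counters can be automated. The following Python sketch shows one way to do this. The list of error keywords is our own assumption rather than an official IOS classification, and the non-zero values in the excerpt are invented for illustration (the real output above shows no errors).

```python
import re

# Substrings that mark a counter as an error or drop counter. This list is an
# assumption made for the sketch, not an official IOS classification.
ERROR_WORDS = ("error", "truncated", "bad", "fail", "timeout", "drop")


def nonzero_error_counters(text):
    """Return {counter name: value} for non-zero, error-like counters."""
    found = {}
    for piece in re.split(r"[,\n]", text):
        # Each piece looks like "0 format errors", optionally led by "Rcvd:".
        match = re.fullmatch(r"(?:\w+:\s*)?(\d+)\s+(.+)", piece.strip())
        if not match:
            continue
        value, name = int(match.group(1)), match.group(2)
        if value and any(word in name for word in ERROR_WORDS):
            found[name] = value
    return found


# Hypothetical excerpt: the non-zero error values are invented for illustration.
SAMPLE = """\
Rcvd: 1710 total, 1710 local destination
      0 source-routed, 0 truncated
      3 format errors, 0 hop count exceeded
      0 bad header, 0 unknown option, 2 bad source
"""

print(nonzero_error_counters(SAMPLE))
```

Counters that are non-zero but not error-related, such as the total packets received, are deliberately ignored so that only candidates for further troubleshooting remain.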
As another example, if the local device is sending NDP messages but shows that it is not receiving any, and you know that at least one other device is connected to the local segment or link, you would verify connectivity between the two and check for intermediate device misconfigurations, such as incorrect VLAN assignments or configurations on a switch. As can be seen, the information printed by this command can be very useful in helping to identify and isolate a wide range of issues that may cause IPv6 problems.

UNDERSTANDING AND TROUBLESHOOTING EIGRPV6

EIGRPv6 retains the same basic core functions as EIGRPv4. For example, both versions still use DUAL to ensure loop-free paths, and both protocols use Multicast packets to send updates, although EIGRPv6 uses IPv6 Multicast address FF02::A instead of the 224.0.0.10 group address used by EIGRPv4. While the same core fundamentals are retained, there are some differences between these versions. Table 10-2 below lists the differences between EIGRPv4 and EIGRPv6, which are more commonly referred to as EIGRP for IPv4 and EIGRP for IPv6.

Table 10-2. EIGRPv4 and EIGRPv6 Differences

Protocol Characteristic       EIGRP for IPv4    EIGRP for IPv6
Automatic Summarization       Yes               Not Applicable
Authentication or Security    MD5               Built into IPv6
Common Subnet for Peers       Yes               No
Advertisement Contents        Subnet/Mask       Prefix/Length
Packet Encapsulation          IPv4              IPv6

Given the similarity of EIGRPv4 and EIGRPv6, the troubleshooting approaches for both of these protocols are also very similar. For example, EIGRPv6 still requires that the same autonomous system number be used in order for a neighbor relationship to be established. In addition, other EIGRP parameters, such as authentication configuration and K values, should also be the same; otherwise, EIGRPv6 will not establish a neighbor relationship with peer routers.
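The parameter checks listed above can be expressed as a short checklist in code. The Python sketch below is a simplification that we have written for illustration only: the EigrpParams class, its field names, and the adjacency_blockers helper are hypothetical, and the authentication setting is reduced to a single string. The default K values of 1, 0, 1, 0, 0 (K1 through K5) are the standard EIGRP defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EigrpParams:
    """Hypothetical summary of the settings two EIGRP peers must agree on."""
    as_number: int
    k_values: tuple = (1, 0, 1, 0, 0)  # K1..K5 defaults
    auth_key: str = ""                 # stand-in for the authentication configuration


def adjacency_blockers(local, peer):
    """List the mismatches that would stop a neighbor relationship from forming."""
    problems = []
    if local.as_number != peer.as_number:
        problems.append("autonomous system number mismatch")
    if local.k_values != peer.k_values:
        problems.append("K value mismatch")
    if local.auth_key != peer.auth_key:
        problems.append("authentication mismatch")
    return problems


r1 = EigrpParams(as_number=1)
r2 = EigrpParams(as_number=2)
r3 = EigrpParams(as_number=1, k_values=(1, 1, 1, 0, 0))

print(adjacency_blockers(r1, r2))
print(adjacency_blockers(r1, r3))
```

An empty list means that none of these three checks explains a missing adjacency, so troubleshooting should move on to other causes.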
However, there are some subtle differences, with which you should be familiar, in the manner in which these two protocols operate and are configured. The first is that, while the EIGRPv4 routing process needs no explicit command to start it, the EIGRPv6 process defaults to a shutdown state when it is enabled in Cisco IOS software. This default state is displayed when you issue the show ipv6 eigrp neighbors command as illustrated below:

R1#show ipv6 eigrp neighbors
IPv6-EIGRP neighbors for process 1
% EIGRP 1 is in SHUTDOWN

Alternatively, the default state is also displayed when you parse through the configuration as illustrated in the following output:

R1#show running-config | section eigrp
 ipv6 eigrp 1
ipv6 router eigrp 1
 shutdown

When configuring EIGRPv6, you must issue the no shutdown router configuration command to enable the routing process. Keep in mind that once the no shutdown command has been issued, it does not appear in the configuration.

Another common problem is forgetting to specify a router ID. Unlike with EIGRPv4, if the router has no interfaces with an IPv4 address, you must specify the router ID for EIGRPv6 using the router-id <ipv4-address> router configuration command. If EIGRPv6 is enabled on a router that has no interfaces in the up state with an assigned IPv4 address, the routing protocol is not enabled and the following error message is printed on the console when you issue the show ipv6 eigrp neighbors command:

R1#show ipv6 eigrp neighbors
IPv6-EIGRP neighbors for process 1
% No router ID for EIGRP 1

While basic WAN interface operation is the same for both EIGRPv4 and EIGRPv6, keep in mind that EIGRPv6 uses Link-Local addresses for peer relationships and as the next-hop IPv6 address for received updates.
Therefore, when enabling EIGRPv6 over NBMA technologies such as ATM and Frame Relay, you must configure static mappings using the Link-Local addresses, not the global Unicast addresses. If you specify global Unicast addresses, while the routing protocol neighbor relationships will be established, you will not have reachability to remote subnets. The following sample configuration shows a static Frame Relay mapping that uses the Link-Local address:

R1(config)#interface Serial1/1
R1(config-if)#frame-relay map ipv6 FE80::205:5EFF:FE6E:5C80 111 broadcast
R1(config-if)#exit

This configuration can be validated using the show frame-relay map command as follows:

R1#show frame-relay map
Serial1/1 (up): ipv6 FE80::205:5EFF:FE6E:5C80 dlci 111(0x6F,0x18F0), static,
              broadcast, CISCO, status defined, active

A useful EIGRPv6 command is the show ipv6 eigrp traffic command, which provides information on EIGRP packet statistics, such as the number of Hello packets sent and received. This command can be used to troubleshoot EIGRP operational issues. For example, if the local router is sending Hellos but is not receiving any back, it may be due to Link Layer issues or even EIGRP packet filtering or blocking.
Following is a sample output of this command:

R1#show ipv6 eigrp traffic
IPv6-EIGRP Traffic Statistics for AS 1
  Hellos sent/received: 409/392
  Updates sent/received: 18/19
  Queries sent/received: 0/0
  Replies sent/received: 0/0
  Acks sent/received: 6/0
  SIA-Queries sent/received: 0/0
  SIA-Replies sent/received: 0/0
  Hello Process ID: 225
  PDM Process ID: 224
  IPv6 Socket queue: 0/50/1/0 (current/max/highest/drops)
  Eigrp input queue: 0/2000/1/0 (current/max/highest/drops)

As is the case with EIGRPv4, the debug eigrp packets command can be used to view real-time information on EIGRP packets as illustrated in the following output:

R1#debug eigrp packets
EIGRP Packets debugging is on
    (UPDATE, REQUEST, QUERY, REPLY, HELLO, IPXSAP, PROBE, ACK, STUB, SIAQUERY, SIAREPLY)
R1#
R1#
*Mar 1 04:19:27.771: EIGRP: Sending HELLO on Serial1/1
*Mar 1 04:19:27.771:   AS 1, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
*Mar 1 04:19:28.519: EIGRP: Received HELLO on Serial1/1 nbr FE80::202:FDFF:FE06:6350
*Mar 1 04:19:28.519:   AS 1, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/0
*Mar 1 04:19:32.459: EIGRP: Sending HELLO on Serial1/1
*Mar 1 04:19:32.459:   AS 1, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
*Mar 1 04:19:33.231: EIGRP: Received HELLO on Serial1/1 nbr FE80::202:FDFF:FE06:6350
*Mar 1 04:19:33.231:   AS 1, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/0

While the debug eigrp packets command can be used to troubleshoot both EIGRPv4 and EIGRPv6, you should use the debug ipv6 eigrp suite of commands to view EIGRPv6-specific events. This command supports the following options:

R1#debug ipv6 eigrp ?
  <1-65535>      Autonomous System
  neighbor       EIGRP neighbor debugging
  notifications  EIGRP event notifications
  summary        EIGRP summary route processing
  <cr>

Following is a sample output of the debug ipv6 eigrp command, which prints real-time information on EIGRPv6 route events:

R1#debug ipv6 eigrp 1
IP-EIGRP Route Events debugging is on
R1#clear ipv6 eigrp 1 neighbors
R1#
*Mar 1 04:20:53.111: %DUAL-5-NBRCHANGE: IPv6-EIGRP(0) 1: Neighbor FE80::202:FDFF:FE06:6350 (Serial1/1) is down: manually cleared
*Mar 1 04:20:56.807: %DUAL-5-NBRCHANGE: IPv6-EIGRP(0) 1: Neighbor FE80::202:FDFF:FE06:6350 (Serial1/1) is up: new adjacency
*Mar 1 04:20:56.827: IPv6-EIGRP(0:1): Processing incoming UPDATE packet
*Mar 1 04:20:56.831: IPv6-EIGRP(0:1): 2001::/64 - do advertise out Serial1/1
*Mar 1 04:20:56.831: IPv6-EIGRP(0:1): Int 2001::/64 metric 20512000 - 20000000 512000
*Mar 1 04:20:56.847: IPv6-EIGRP(0:1): Processing incoming UPDATE packet
*Mar 1 04:20:56.851: IPv6-EIGRP(0:1): Int 2001::/64 M 21024000 - 20000000 1024000 SM 2169856 - 1657856 512000
*Mar 1 04:20:56.851: IPv6-EIGRP(0:1): 2001::/64 routing table not updated
*Mar 1 04:20:56.851: IPv6-EIGRP(0:1): 2001::/64 - do advertise out Serial1/1
*Mar 1 04:20:56.851: IPv6-EIGRP(0:1): Int 2001::/64 metric 20512000 - 20000000 512000
*Mar 1 04:20:56.867: IPv6-EIGRP(0:1): Processing incoming UPDATE packet
*Mar 1 04:20:56.871: IPv6-EIGRP(0:1): Int 2001::/64 M 21024000 - 20000000 1024000 SM 2169856 - 1657856 512000

UNDERSTANDING AND TROUBLESHOOTING RIPNG

For the most part, RIPng is very similar to the RIPv2 specification. However, there are some notable differences between these two routing protocols with which you should be familiar. These similarities and differences are listed in Table 10-3 below:

Table 10-3.
RIPv2 and RIPng Similarities and Differences

Protocol Characteristic      RIPv2                        RIPng
Protocol Classification      Distance Vector              Distance Vector
Hop Limitation               15                           15
Split Horizon                Yes                          Yes
Poison Reverse               Yes                          Yes
Transport Layer Protocol     UDP                          UDP
Multicast Updates            Yes (224.0.0.9)              Yes (FF02::9)
Administrative Distance      120                          120
Hold-down Timers             Yes                          Yes
Destination Prefix Length    32-bit                       128-bit
Next-hop Length              32-bit                       128-bit
Next-hop Address             Primary Interface Address    Link-Local Address
Transport                    IPv4                         IPv6
UDP Port Number              520                          521
Authentication               Text and MD5                 Inbuilt into IPv6 (IPSec)
Automatic Summarization      Yes (enabled by default)     Not Applicable
Can Broadcast Updates        Yes                          Not Applicable

When troubleshooting RIPng, you should keep in mind that RIPng has the same limitations as RIPv2. For example, routes received with a hop count of 16 (or greater) are considered unreachable. These routes will not be installed into the routing table. The primary problems experienced when implementing RIPng are due to misconfigurations, as the configuration syntax for RIPv2 and RIPng is significantly different in Cisco IOS software.

For example, when configuring RIPv2, you can configure the router to advertise a default route using the default-information originate router configuration command. This allows the RIPv2 router to advertise the default route to all other neighbors. With RIPng, however, this functionality is configured under the interface. Therefore, the default route is advertised only out of that specified interface, which means that multiple instances of this command are required on a router with multiple RIPng-enabled interfaces and multiple neighbors that should all be receiving the default route.

Recapping what is described in the ROUTE study guide, Table 10-4 below lists some common configuration commands and how they are applied in RIPv2 and RIPng, respectively:

Table 10-4.
RIPv2 and RIPng Cisco IOS Software Configuration Differences

Command Function: Enable RIP routing
  RIPv2 Command: Use the router rip global configuration command
  RIPng Command: Use the ipv6 router rip [tag] global configuration command

Command Function: Advertise networks or prefixes using RIP
  RIPv2 Command: Use the network router configuration command
  RIPng Command: Use the ipv6 rip [tag] enable interface configuration command

Command Function: Generate a RIP default route
  RIPv2 Command: Use the default-information originate router configuration command
  RIPng Command: Use the ipv6 rip [tag] default-information [originate | only] interface configuration command

Command Function: Enable or disable Split Horizon
  RIPv2 Command: Use the [no] ip split-horizon interface configuration command
  RIPng Command: Use the [no] split-horizon router configuration command

Command Function: Verify received RIP routing information
  RIPv2 Command: Use the show ip route [rip] command
  RIPng Command: Use the show ipv6 route [rip] command

Command Function: Verify the RIP database
  RIPv2 Command: Use the show ip rip database command
  RIPng Command: Use the show ipv6 rip [tag] database command

As previously stated, commonly experienced problems with RIPng are due to misconfiguration on the devices running RIPng. Commonly encountered issues include the following:

• Incorrect default route advertisement configuration causes connectivity issues
• The router is not receiving routes
• The router is advertising routes it should not be advertising
• No routes are installed into the RIB

RIPng uses the ipv6 rip [tag] default-information [originate | only] interface configuration command to advertise a default route. By default, when this command is specified, the router will advertise the default route even if one is not present in the routing table. In most cases, when advertising a default route to downstream neighbors, there is typically no need to advertise any other specific routes. However, there are instances when you might want to advertise both the default and some other more specific routes.
For example, consider the topology that is illustrated in Figure 10-1 below:

[Figure 10-1 (diagram not reproduced): R2 (FC00::2/128) and R3 (FC00::3/128) each peer with an ISP using MP-BGP. Their Fa0/0 interfaces run RIPng toward R1, which connects through subinterfaces Fa0/0.22 and Fa0/0.33.]

Fig. 10-1. RIPng Default Routing Issues

Referencing Figure 10-1, routers R1, R2, and R3 are running RIPng. In addition, routers R2 and R3 are peered to two ISPs and are running MP-BGP. These routers are receiving multiple routes from the ISPs. Therefore, rather than redistribute these into RIPng, administrators have decided instead to configure R2 and R3 to send R1 a default route. Based on this solution, the current routing table on R1 displays the following entries:

R1#show ipv6 route
IPv6 Routing Table - 3 entries
Codes: C - Connected, L - Local, S - Static, R - RIP, B - BGP
       U - Per-user Static route
       I1 - ISIS L1, I2 - ISIS L2, IA - ISIS interarea, IS - ISIS summary
       O - OSPF intra, OI - OSPF inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
       ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
R   ::/0 [120/2]
     via FE80::213:19FF:FE86:CA20, FastEthernet0/0.22
     via FE80::201:96FF:FE1B:DB80, FastEthernet0/0.33
L   FE80::/10 [0/0]
     via ::, Null0
L   FF00::/8 [0/0]
     via ::, Null0

While this configuration works well for Internet-based traffic, assuming both R2 and R3 have the same Internet routes, it does introduce intermittent connectivity issues from R1 to the Loopback0 subnets configured on routers R2 and R3 as illustrated in the following ping outputs:

R1#ping fc00::2 repeat 10 source FastEthernet0/0 verbose
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to FC00::2, timeout is 2 seconds:
Packet sent with a source address of FE80::20C:CEFF:FEA7:F3A0
Reply to request 0 (0 ms)
Request 1 received unknown echo response type U
Reply to request 2 (0 ms)
Request 3 received unknown echo response type U
Reply to request 4 (4 ms)
Request 5 received unknown echo response type U
Reply to request 6 (0 ms)
Request 7 received unknown echo response type U
Reply to request 8 (0 ms)
Request 9 received unknown echo response type U
Success rate is 50 percent (5/10), round-trip min/avg/max = 0/0/4 ms

R1#ping fc00::3 repeat 10 source FastEthernet0/0 verbose
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to FC00::3, timeout is 2 seconds:
Packet sent with a source address of FE80::20C:CEFF:FEA7:F3A0
Reply to request 0 (4 ms)
Request 1 received unknown echo response type U
Reply to request 2 (0 ms)
Request 3 received unknown echo response type U
Reply to request 4 (0 ms)
Request 5 received unknown echo response type U
Reply to request 6 (4 ms)
Request 7 received unknown echo response type U
Reply to request 8 (0 ms)
Request 9 received unknown echo response type U
Success rate is 50 percent (5/10), round-trip min/avg/max = 0/1/4 ms

The reason for this issue is that one packet is sent to R2 while the next is sent to R3. The packet sent to the router on which the /128 address is configured will be responded to; however, the packet sent to the router on which the /128 address is not configured will fail. This is because the local router is load-balancing across the equal-cost paths (i.e., the ::/0 default route that is received from both routers R2 and R3).
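The 50 percent success rate follows directly from this alternation, which the following Python sketch reproduces. It is a deliberately simplified model and not a description of how Cisco IOS software actually shares load: it assumes strict per-packet alternation between the two next hops and assumes that a router that does not own the destination address never answers. The simulate_pings function and the owners mapping are hypothetical names created for this example.

```python
import ipaddress
from itertools import cycle


def simulate_pings(dest, default_next_hops, owners, count=10):
    """Alternate echo requests across the equal-cost default-route next hops.

    owners maps each next-hop router name to the set of addresses configured
    on it. Returns how many requests reach the router that owns dest.
    """
    target = ipaddress.IPv6Address(dest)
    hops = cycle(default_next_hops)
    return sum(1 for _ in range(count) if target in owners[next(hops)])


# Loopback0 addresses taken from Figure 10-1.
owners = {
    "R2": {ipaddress.IPv6Address("fc00::2")},
    "R3": {ipaddress.IPv6Address("fc00::3")},
}

print(simulate_pings("fc00::2", ["R2", "R3"], owners))
print(simulate_pings("fc00::3", ["R2", "R3"], owners))
```

In both cases only every second request reaches the router that owns the address, which matches the 5/10 success rate seen in the ping output.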
As you troubleshoot, you verify the configurations of R2 and R3 and notice that their respective FastEthernet0/0 interfaces have been configured with the ipv6 rip TSHOOT default-information only interface configuration command as is illustrated below on router R2:

R2#show running-config interface FastEthernet0/0
Building configuration...

Current configuration : 168 bytes
!
interface FastEthernet0/0
 duplex auto
 speed auto
 ipv6 enable
 ipv6 rip TSHOOT enable
 ipv6 rip TSHOOT default-information only
end

While this configuration allows RIPng to advertise the default route, it also suppresses all other specific routes, leading to the intermittent connectivity issue from R1 to the R2 and R3 Loopback0 subnets illustrated above. The recommended solution in this case is to allow the router to advertise the other routes in addition to the default route. This is commonly an issue because the same behavior is not applicable when using RIPv2. In other words, with RIPv2, generating the default route does not suppress all other routes. This is yet another difference in protocol operation in Cisco IOS software that can cause problems.
The solution is implemented on routers R2 and R3 as follows:

R2(config)#interface FastEthernet0/0
R2(config-if)#ipv6 rip TSHOOT default-information originate
R2(config-if)#exit

R3(config)#interface FastEthernet0/0
R3(config-if)#ipv6 rip TSHOOT default-information originate
R3(config-if)#exit

Following this configuration, the routing table on R1 displays the following entries:

R1#show ipv6 route
IPv6 Routing Table - 6 entries
Codes: C - Connected, L - Local, S - Static, R - RIP, B - BGP
       U - Per-user Static route
       I1 - ISIS L1, I2 - ISIS L2, IA - ISIS interarea, IS - ISIS summary
       O - OSPF intra, OI - OSPF inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
       ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
R   ::/0 [120/2]
     via FE80::213:19FF:FE86:CA20, FastEthernet0/0.22
     via FE80::201:96FF:FE1B:DB80, FastEthernet0/0.33
R   FC00::2/128 [120/2]
     via FE80::213:19FF:FE86:CA20, FastEthernet0/0.22
R   FC00::3/128 [120/2]
     via FE80::201:96FF:FE1B:DB80, FastEthernet0/0.33
L   FE80::/10 [0/0]
     via ::, Null0
L   FF00::/8 [0/0]
     via ::, Null0

Following this reconfiguration on R2 and R3, the same ping tests performed on R1 earlier, which had a 50% success rate, are now completely successful as illustrated below:

R1#ping fc00::2 repeat 10 source FastEthernet0/0 verbose
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to FC00::2, timeout is 2 seconds:
Packet sent with a source address of FE80::20C:CEFF:FEA7:F3A0
Reply to request 0 (0 ms)
Reply to request 1 (4 ms)
Reply to request 2 (0 ms)
Reply to request 3 (0 ms)
Reply to request 4 (4 ms)
Reply to request 5 (0 ms)
Reply to request 6 (4 ms)
Reply to request 7 (0 ms)
Reply to request 8 (0 ms)
Reply to request 9 (4 ms)
Success rate is 100 percent (10/10), round-trip min/avg/max = 0/1/4 ms

R1#ping fc00::3 repeat 10 source FastEthernet0/0 verbose
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to FC00::3, timeout is 2 seconds:
Packet sent with a source address of FE80::20C:CEFF:FEA7:F3A0
Reply to request 0 (4 ms)
Reply to request 1 (0 ms)
Reply to request 2 (0 ms)
Reply to request 3 (0 ms)
Reply to request 4 (4 ms)
Reply to request 5 (0 ms)
Reply to request 6 (0 ms)
Reply to request 7 (4 ms)
Reply to request 8 (4 ms)
Reply to request 9 (0 ms)
Success rate is 100 percent (10/10), round-trip min/avg/max = 0/1/4 ms

The most common reason for a router not receiving any routes is misconfiguration. This may be due to route filtering misconfiguration (i.e., incorrectly configured distribute list filters), failing to enable RIPng under the correct interfaces, or even the incorrect use of some commands, such as the default-information command, as was illustrated in the previous example. When troubleshooting such issues, verify device configurations. If any route filters are being used and have been applied, ensure that they are permitting the correct networks.

In the event that the router is not advertising prefixes that it should be, verify that the interfaces are in an up/up state on the local router. Additionally, because RIPng does not use network statements, verify that the RIPng process has been enabled under the interface(s). You can use the show ipv6 rip command to determine which RIPng process is enabled under which interfaces, among other things. Following is a sample output of the information that can be garnered from this command:

R1#show ipv6 rip
RIP process "TSHOOT", port 521, multicast-group FF02::9, pid 231
     Administrative distance is 120. Maximum paths is 16
     Updates every 30 seconds, expire after 180
     Holddown lasts 0 seconds, garbage collect after 120
     Split horizon is on; poison reverse is off
     Default routes are not generated
     Periodic updates 352, trigger updates 46
  Interfaces:
    FastEthernet0/0.22
    Serial0/0
  Redistribution:
    None
RIP process "CCNP", port 521, multicast-group FF02::9, pid 232
     Administrative distance is 120. Maximum paths is 16
     Updates every 30 seconds, expire after 180
     Holddown lasts 0 seconds, garbage collect after 120
     Split horizon is on; poison reverse is off
     Default routes are not generated
     Periodic updates 23, trigger updates 12
  Interfaces:
    FastEthernet0/0.22
    FastEthernet0/0.33
  Redistribution:
    None

As illustrated in the output above, it is possible for multiple RIPng instances to be configured under the same interface. When implementing RIPng, if you specify one process under the interface and then specify another, the interface belongs to both processes, or instances. Cisco IOS software does not overwrite the previous process with the new one.

There are several reasons why the router may not install routes into the RIB. Common causes include route filtering and routes received with a metric of 'unreachable' (i.e., routes received with a hop count of 16 or greater). In addition to basic show commands, you can also troubleshoot RIPng problems using the debug ipv6 rip command.
This command provides detailed information on RIPng received and sent updates as illustrated in the output below:

R1#debug ipv6 rip
RIP Routing Protocol debugging is on
R1#
*Oct 20 11:12:55.420: RIPng: Suppressed null multicast update on FastEthernet0/0.22 for TSHOOT
*Oct 20 11:12:56.590: RIPng: response received from FE80::213:19FF:FE86:A20 on FastEthernet0/0.22 for TSHOOT
*Oct 20 11:12:56.590:   src=FE80::213:19FF:FE86:A20 (FastEthernet0/0.22)
*Oct 20 11:12:56.590:   dst=FF02::9
*Oct 20 11:12:56.590:   sport=521, dport=521, length=92
*Oct 20 11:12:56.590:   command=2, version=1, mbz=0, #rte=4
*Oct 20 11:12:56.590:   tag=0, metric=1, prefix=2004::20/128
*Oct 20 11:12:56.590:   tag=0, metric=1, prefix=FC00::10/128
*Oct 20 11:12:56.590:   tag=0, metric=1, prefix=FC00::20/128
*Oct 20 11:12:56.590:   tag=0, metric=1, prefix=FC00::30/128
*Oct 20 11:12:56.590: RIPng: Added neighbor FE80::213:19FF:FE86:A20/FastEthernet0/0.22
*Oct 20 11:12:56.590: RIPng: Inserted 2004::20/128, nexthop FE80::213:19FF:FE86:A20, metric 12, tag 0
*Oct 20 11:12:56.594: RIPng: Inserted FC00::10/128, nexthop FE80::213:19FF:FE86:A20, metric 12, tag 0
*Oct 20 11:12:56.594: RIPng: Inserted FC00::20/128, nexthop FE80::213:19FF:FE86:A20, metric 12, tag 0
*Oct 20 11:12:56.594: RIPng: Inserted FC00::30/128, nexthop FE80::213:19FF:FE86:A20, metric 12, tag 0
*Oct 20 11:12:56.594: RIPng: Triggered update requested, in hold-down
*Oct 20 11:13:00.421: RIPng: generating triggered update for TSHOOT
*Oct 20 11:13:00.421: RIPng: Suppressed null multicast update on FastEthernet0/0.22 for TSHOOT
*Oct 20 11:13:07.360: RIPng: Next RIB walk in 169230
*Oct 20 11:13:11.992: RIPng: Process TSHOOT received response for CCNP on FastEthernet0/0.33
*Oct 20 11:13:11.992: RIPng: response received from FE80::201:96FF:FE1B:DB80 on FastEthernet0/0.33 for CCNP
*Oct 20 11:13:11.992:   src=FE80::201:96FF:FE1B:DB80 (FastEthernet0/0.33)
*Oct 20 11:13:11.992:   dst=FF02::9
*Oct 20 11:13:11.992:   sport=521, dport=521, length=92
*Oct 20 11:13:11.992:   command=2, version=1, mbz=0, #rte=4
*Oct 20 11:13:11.992:   tag=0, metric=4, prefix=FC00::1/128
*Oct 20 11:13:11.992:   tag=0, metric=4, prefix=FC00::2/128
*Oct 20 11:13:11.992:   tag=0, metric=4, prefix=FC00::3/128
*Oct 20 11:13:11.992:   tag=0, metric=4, prefix=FC00::4/128
*Oct 20 11:13:11.996: RIPng: Added neighbor FE80::201:96FF:FE1B:DB80/FastEthernet0/0.33
*Oct 20 11:13:11.996: RIPng: Inserted FC00::1/128, nexthop FE80::201:96FF:FE1B:DB80, metric 5, tag 0
*Oct 20 11:13:11.996: RIPng: RIPv6 ager started, 180000
*Oct 20 11:13:11.996: RIPng: Inserted FC00::2/128, nexthop FE80::201:96FF:FE1B:DB80, metric 5, tag 0
*Oct 20 11:13:11.996: RIPng: Inserted FC00::3/128, nexthop FE80::201:96FF:FE1B:DB80, metric 5, tag 0
*Oct 20 11:13:11.996: RIPng: Inserted FC00::4/128, nexthop FE80::201:96FF:FE1B:DB80, metric 5, tag 0
*Oct 20 11:13:12.000: RIPng: Triggered update requested
*Oct 20 11:13:13.002: RIPng: generating triggered update for CCNP

Referencing the debugging output above, we can determine that the local router is running two RIPng instances, or processes: one named TSHOOT and the other named CCNP. Instance TSHOOT receives four prefixes from neighbor FE80::213:19FF:FE86:A20 via FastEthernet0/0.22. These prefixes have a route metric of 1, indicating that they originated on that router. However, the prefixes are installed into the RIB with a route metric of 12. This indicates that the original metric is being offset using the ipv6 rip TSHOOT metric-offset 11 command under the FastEthernet0/0.22 interface.

The local router is also receiving four prefixes from neighbor FE80::201:96FF:FE1B:DB80 via FastEthernet0/0.33. These prefixes are received with a route metric of 4. This is an indication that the prefixes are not local to that neighbor (i.e., are not directly connected to it), or that they were redistributed into RIPng on that router and the metric was specified during redistribution.
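The metric arithmetic behind these conclusions can be checked with a short Python sketch. The function names are our own, and the model assumes only the behavior visible in the debug output: the inbound interface offset (1 unless configured otherwise) is added to the received metric, and the result is capped at the RIP infinity value of 16.

```python
RIP_INFINITY = 16  # a metric of 16 means the route is unreachable


def installed_metric(received, offset=1):
    """Metric after adding the inbound interface offset, capped at infinity."""
    return min(received + offset, RIP_INFINITY)


def inferred_offset(received, installed):
    """Work backward from the debug output to the offset in effect."""
    return installed - received


# FastEthernet0/0.22: received with metric 1, inserted with metric 12.
print(inferred_offset(1, 12))
# FastEthernet0/0.33: received with metric 4, default offset of 1.
print(installed_metric(4))
# A route received with metric 10 over an interface with an offset of 11.
print(installed_metric(10, 11) >= RIP_INFINITY)
```

The last line illustrates a side effect worth keeping in mind when offsets are in use: a large offset can push an otherwise valid route to a metric of 16, in which case it is treated as unreachable.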
Because the metric is incremented by 1 when the routes are installed into the RIB, we can conclude that no non-default metric offsetting is configured under the FastEthernet0/0.33 interface on the router.

UNDERSTANDING AND TROUBLESHOOTING OSPFV3

OSPFv3 was originally defined in RFC 2740 (since obsoleted by RFC 5340) and is the counterpart of OSPFv2, but is designed explicitly for the IPv6 routed protocol. The similarities shared by OSPFv2 and OSPFv3 are as follows:

• OSPFv3 continues to use the same packet types that are used by OSPFv2
• Neighbor discovery and the adjacency formation process are the same
• OSPFv3 still maintains RFC-compliant operation on different technologies
• Both OSPFv2 and OSPFv3 use the same LSA flooding and aging mechanisms
• Like OSPFv2, the OSPFv3 router ID requires the use of a 32-bit IPv4 address
• Like OSPFv2, the OSPFv3 link ID is based on a 32-bit IPv4 address

While there are similarities between OSPFv2 and OSPFv3, some significant differences exist with which you must be familiar. These include the following:

• OSPFv3 uses IPv6 Link-Local addresses (not global addresses) to identify the OSPFv3 adjacencies
• OSPFv3 introduces two new OSPF LSA types, which are Type 8 and Type 9 LSAs
• OSPFv3 messages are sent over (encapsulated in) IPv6 packets, not IPv4 packets
• OSPFv3 uses standard IPv6 Multicast addresses FF02::5 and FF02::6
• OSPFv3 leverages the inbuilt capabilities of IPSec for security and authentication
• The Options field in Hello and DBD packets includes the R-bit and the V6-bit
• The OSPFv3 Hello packet contains no address information, but includes an interface ID

NOTE: Additional detailed information on these differences and similarities can be found in the current ROUTE study guide, which is available online.
For all intents and purposes, and barring configuration differences, the same methods used to troubleshoot OSPFv2 are applicable when troubleshooting OSPFv3 because the protocols operate in the same manner. When troubleshooting OSPFv2, Cisco IOS software OSPF commands begin with show ip ospf. In a similar manner, the OSPFv3 show commands begin with show ipv6 ospf. This command supports the following keywords:

R1#show ipv6 ospf ?
  <1-65535>            Process ID number
  border-routers       Border and Boundary Router Information
  database             Database summary
  flood-list           Link State flood list
  interface            Interface information
  neighbor             Neighbor list
  request-list         Link State request list
  retransmission-list  Link State retransmission list
  summary-prefix       Summary-prefix redistribution Information
  traffic              OSPF traffic information
  virtual-links        Virtual link information
  |                    Output modifiers
  <cr>

NOTE: The majority of the supported keywords are described in greater detail in the ROUTE study guide, which is available online. Please refer to that guide for additional information on the keywords that are not described in this section. It should also be noted that keywords that are beyond the scope of ROUTE and TSHOOT exam requirements are not discussed in either study guide.

A commonly used troubleshooting command is the show ipv6 ospf <process> command. This command can be used to verify router configuration (e.g., ABR and ASBR configuration) and to verify the areas configured and their types (e.g., stub, NSSA, etc.). Additionally, you can use this command to determine the number of times the SPF algorithm has been run in each area, which is useful when troubleshooting issues such as link or route flapping.
The following output displays the information that is provided by this command:

R1#show ipv6 ospf
 Routing Process "ospfv3 1" with ID 1.1.1.1
 It is an area border and autonomous system boundary router
 Redistributing External Routes from, rip rip
 SPF schedule delay 5 secs, Hold time between two SPFs 10 secs
 Minimum LSA interval 5 secs. Minimum LSA arrival 1 secs
 LSA group pacing timer 240 secs
 Interface flood pacing timer 33 msecs
 Retransmission pacing timer 66 msecs
 Number of external LSA 8. Checksum Sum 0x03D2E5
 Number of areas in this router is 2. 2 normal 0 stub 0 nssa
 Reference bandwidth unit is 100 mbps
    Area BACKBONE(0)
        Number of interfaces in this area is 1
        SPF algorithm executed 4 times
        Number of LSA 5. Checksum Sum 0x0309A8
        Number of DCbitless LSA 0
        Number of indication LSA 0
        Number of DoNotAge LSA 0
        Flood list length 0
    Area 1
        Number of interfaces in this area is 1
        SPF algorithm executed 3 times
        Number of LSA 2. Checksum Sum 0x01412A
        Number of DCbitless LSA 0
        Number of indication LSA 0
        Number of DoNotAge LSA 0
        Flood list length 0

From the output printed above, we can determine the RID of the local router as well as the local OSPF process ID. We can also determine that the router is an ABR and an ASBR, which is redistributing two different RIPng instances. By default, Cisco IOS will list the protocol name once for each instance that is redistributed into OSPF. For example, if the local router was redistributing three different RIPng instances, the word ‘rip’ listed under the redistribution column would be printed three times. Additionally, we can determine the number of external LSAs as well as the default reference bandwidth. Finally, the different areas, the number of interfaces in those areas, and the number of times the SPF algorithm has been executed for each individual area are also included in the output that is printed by this command.
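When chasing a suspected flap, it can help to capture this output twice and compare the per-area SPF counters. The following is a minimal Python sketch of that idea, not a Cisco tool: the function names spf_runs_per_area and spf_delta are hypothetical, and the embedded text is an abbreviated copy of the output shown above.

```python
import re

# Abbreviated copy of the "show ipv6 ospf" output shown in the text.
SHOW_IPV6_OSPF = """\
 Routing Process "ospfv3 1" with ID 1.1.1.1
 Number of areas in this router is 2. 2 normal 0 stub 0 nssa
    Area BACKBONE(0)
        Number of interfaces in this area is 1
        SPF algorithm executed 4 times
    Area 1
        Number of interfaces in this area is 1
        SPF algorithm executed 3 times
"""

# An area header is a line holding only "Area BACKBONE(n)" or "Area n".
AREA_RE = re.compile(r"^\s*Area (?:BACKBONE\((\d+)\)|(\S+))\s*$")
SPF_RE = re.compile(r"SPF algorithm executed (\d+) times")


def spf_runs_per_area(text):
    """Return {area_id: spf_run_count} parsed from 'show ipv6 ospf' text."""
    runs = {}
    area = None
    for line in text.splitlines():
        header = AREA_RE.match(line)
        if header:
            area = header.group(1) or header.group(2)
            continue
        counter = SPF_RE.search(line)
        if counter and area is not None:
            runs[area] = int(counter.group(1))
    return runs


def spf_delta(before, after):
    """Return how many additional SPF runs each area saw between two captures."""
    return {area: after[area] - before.get(area, 0) for area in after}


if __name__ == "__main__":
    print(spf_runs_per_area(SHOW_IPV6_OSPF))
```

A counter that climbs quickly in one area between two captures points to instability in that area, which can then be correlated with syslog messages for the interfaces that belong to it.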
The border-routers keyword allows you to display internal OSPF routing table entries to an ABR and an ASBR. As is the case with OSPFv2, this command is useful when troubleshooting connectivity issues to other areas in multi-area OSPF implementations because it can be used to confirm that the local router has a path to the ABR and the ASBR (if applicable) as illustrated below:

R1#show ipv6 ospf border-routers
OSPFv3 Process 1 internal Routing Table
Codes: i - Intra-area route, I - Inter-area route
i 2.2.2.2 [1] via FE80::2222, FastEthernet0/0.22, ABR, Area 0, SPF 7
i 3.3.3.3 [1] via FE80::3333, FastEthernet0/0.33, ABR/ASBR, Area 0, SPF 8

In addition to verifying routes to ABRs and ASBRs, the show ipv6 ospf border-routers command can also be used to determine whether the SPF calculation is functional because it also includes the internal number of the SPF calculation that installed the route.

As is the case with OSPFv2, you can append the database keyword to view the contents of the LSDB. When viewing the OSPFv3 LSDB, keep in mind that this updated version of the OSPF routing protocol includes two new LSAs, which are Type 8 (Link LSA) and Type 9 (Intra-Area-Prefix LSA). The Link LSA provides the router’s Link-Local address and all the IPv6 prefixes attached to the link. There is one Link LSA per link; however, there can be multiple Intra-Area-Prefix LSAs with different Link-State IDs. The Intra-Area-Prefix LSA has area flooding scope and associates prefixes either with a transit network, by referencing a Network LSA, or with a router or stub network, by referencing a Router LSA.
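OSPFv3 debug output in this section prints LSA types as hexadecimal values such as 0x2001, 0x4005, and 0x8. In RFC 5340, the 16-bit LS type field carries the U-bit in the top bit, a 2-bit flooding scope next, and a 13-bit function code in the remaining bits. The following is a minimal Python sketch of that decoding; the function name decode_ls_type is a hypothetical helper introduced here for illustration.

```python
# LSA function codes and names as listed in RFC 5340, Appendix A.4.2.1.
FUNCTION_CODES = {
    1: "Router-LSA",
    2: "Network-LSA",
    3: "Inter-Area-Prefix-LSA",
    4: "Inter-Area-Router-LSA",
    5: "AS-External-LSA",
    7: "NSSA-LSA",
    8: "Link-LSA",
    9: "Intra-Area-Prefix-LSA",
}

# The S2/S1 bits select the flooding scope.
SCOPES = {0b00: "link-local", 0b01: "area", 0b10: "AS", 0b11: "reserved"}


def decode_ls_type(ls_type):
    """Return (lsa_name, flooding_scope) for a 16-bit OSPFv3 LS type value."""
    scope_bits = (ls_type >> 13) & 0x3   # bits 14-13: S2 and S1
    function_code = ls_type & 0x1FFF     # bits 12-0: LSA function code
    return FUNCTION_CODES.get(function_code, "unknown"), SCOPES[scope_bits]


if __name__ == "__main__":
    for value in (0x2001, 0x2002, 0x2003, 0x4005, 0x0008, 0x2009):
        print(hex(value), decode_ls_type(value))
```

Read this way, the two LSAs introduced by OSPFv3 differ in scope: the Link LSA (0x0008) never leaves the link, while the Intra-Area-Prefix LSA (0x2009) is flooded throughout the area.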
Following is a sample of the OSPFv3 LSDB:

R1#show ipv6 ospf database

            OSPFv3 Router with ID (1.1.1.1) (Process ID 1)

                Router Link States (Area 0)

ADV Router   Age    Seq#         Fragment ID   Link Count   Bits
1.1.1.1      972    0x8000000D   0             1            EB
2.2.2.2      1940   0x8000000B   0             1            B

                Net Link States (Area 0)

ADV Router   Age    Seq#         Link ID   Rtr Count
2.2.2.2      940    0x80000009   4         2

                Inter Area Prefix Link States (Area 0)

ADV Router   Age    Seq#         Prefix
2.2.2.2      1940   0x80000002   FC00::10/128
2.2.2.2      1940   0x80000002   FC00::20/128
2.2.2.2      1940   0x80000002   FC00::30/128

                Link (Type-8) Link States (Area 0)

ADV Router   Age    Seq#         Link ID   Interface
1.1.1.1      973    0x80000009   9         Fa0/0.22
2.2.2.2      941    0x80000009   4         Fa0/0.22

                Router Link States (Area 1)

ADV Router   Age    Seq#         Fragment ID   Link Count   Bits
1.1.1.1      973    0x8000000C   0             0            EB

                Inter Area Prefix Link States (Area 1)

ADV Router   Age    Seq#         Prefix
1.1.1.1      1981   0x80000002   FC00::10/128
1.1.1.1      1981   0x80000002   FC00::20/128
1.1.1.1      1981   0x80000002   FC00::30/128

                Link (Type-8) Link States (Area 1)

ADV Router   Age    Seq#         Link ID   Interface
1.1.1.1      973    0x80000009   10        Fa0/0.33

                Type-5 AS External Link States

ADV Router   Age    Seq#         Prefix
1.1.1.1      973    0x80000009   2004::2/128
1.1.1.1      975    0x80000009   FC00::1/128
1.1.1.1      975    0x80000009   FC00::2/128
1.1.1.1      975    0x80000009   FC00::3/128
1.1.1.1      975    0x80000009   FC00::4/128

Finally, the show ipv6 ospf neighbor [detail] command is still a useful command when troubleshooting and verifying neighbor adjacencies. When using this command, keep in mind that even though it is applicable to OSPFv3, the command output will include IPv4 router IDs, since these are also used by OSPFv3.
Following is a sample output of the information that is printed by this command:

R2#show ipv6 ospf neighbor detail
 Neighbor 1.1.1.1
    In the area 0 via interface FastEthernet0/0
    Neighbor: interface-id 9, link-local address FE80::20C:CEFF:FEA7:F3A0
    Neighbor priority is 1, State is FULL, 6 state changes
    DR is 2.2.2.2 BDR is 1.1.1.1
    Options is 0x000013 in Hello (V6-Bit E-Bit R-bit )
    Options is 0x000013 in DBD (V6-Bit E-Bit R-bit )
    Dead timer due in 00:00:34
    Neighbor is up for 00:01:30
    Index 1/1/1, retransmission queue length 0, number of retransmission 0
    First 0x0(0)/0x0(0)/0x0(0) Next 0x0(0)/0x0(0)/0x0(0)
    Last retransmission scan length is 0, maximum is 0
    Last retransmission scan time is 0 msec, maximum is 0 msec

In addition to show commands, you can also use the clear ipv6 ospf suite of commands to troubleshoot OSPFv3. The options that are available with this command are listed below:

R1#clear ipv6 ospf ?
  <1-65535>       Process ID number
  counters        OSPF counters
  force-spf       Run SPF for OSPF process
  process         Reset OSPF process
  redistribution  Clear OSPF route redistribution

When using the clear ipv6 ospf suite of commands, you can specify the process number to which you want the relevant keyword applied. This is useful when you are performing OSPFv3 troubleshooting on a router running multiple processes and you do not want to impact any of the other processes running on the router.

The counters keyword clears the state change counters for the OSPFv3 neighbor(s) on a specified interface (if an interface is specified in conjunction with this command), for a specified OSPFv3 neighbor (if the neighbor ID is included), or for all of the OSPFv3 neighbors.
The example that follows illustrates how to first verify, then clear (reset), and then verify again the state change counters for neighbors of a specified interface:

R1#show ipv6 ospf neighbor FastEthernet0/0.22 detail
 Neighbor 2.2.2.2
    In the area 0 via interface FastEthernet0/0.22
    Neighbor: interface-id 4, link-local address FE80::2222
    Neighbor priority is 1, State is FULL, 6 state changes
    DR is 2.2.2.2 BDR is 1.1.1.1
    Options is 0x000013 in Hello (V6-Bit E-Bit R-bit)
    Options is 0x000013 in DBD (V6-Bit E-Bit R-bit)
    Dead timer due in 00:00:31
    Neighbor is up for 00:01:38
    Index 1/1/1, retransmission queue length 0, number of retransmission 3
    First 0x0(0)/0x0(0)/0x0(0) Next 0x0(0)/0x0(0)/0x0(0)
    Last retransmission scan length is 5, maximum is 5
    Last retransmission scan time is 0 msec, maximum is 0 msec
R1#
R1#clear ipv6 ospf counters neighbor FastEthernet0/0.22
R1#
R1#show ipv6 ospf neighbor FastEthernet0/0.22 detail
 Neighbor 2.2.2.2
    In the area 0 via interface FastEthernet0/0.22
    Neighbor: interface-id 4, link-local address FE80::2222
    Neighbor priority is 1, State is FULL, 0 state changes
    DR is 2.2.2.2 BDR is 1.1.1.1
    Options is 0x000013 in Hello (V6-Bit E-Bit R-bit)
    Options is 0x000013 in DBD (V6-Bit E-Bit R-bit)
    Dead timer due in 00:00:38
    Neighbor is up for 00:02:12
    Index 1/1/1, retransmission queue length 0, number of retransmission 3
    First 0x0(0)/0x0(0)/0x0(0) Next 0x0(0)/0x0(0)/0x0(0)
    Last retransmission scan length is 5, maximum is 5
    Last retransmission scan time is 0 msec, maximum is 0 msec

Clearing counters is useful when troubleshooting adjacency flaps, as it provides some indication as to how frequently they are occurring. You can then correlate this information with other events, such as syslog messages or periods of high CPU utilization, when you are troubleshooting OSPFv3 problems.

The force-spf keyword simply runs the SPF algorithm again.
The main difference between the clear ipv6 ospf force-spf command and the clear ipv6 ospf process command is that the clear ipv6 ospf process command will restart the OSPF process, clear the OSPFv3 database, repopulate the database, and then run the SPF algorithm. This difference in operation can be validated by enabling OSPFv3 debugging and comparing the difference in output when either command is issued. The following output illustrates the events that occur when the clear ipv6 ospf force-spf command is issued on the router:

R1#debug ipv6 ospf events
  OSPFv3 events debugging is on
R1#debug ipv6 ospf spf
  OSPFv3 spf intra events debugging is on
  OSPFv3 spf inter events debugging is on
  OSPFv3 spf external events debugging is on
R1#clear ipv6 ospf force-spf
R1#
*Oct 20 17:44:30.671: OSPFv3: running SPF for Area 1, cause R N SN SA L
*Oct 20 17:44:30.671: OSPFv3: Intra-Area SPF (Full), Area 1
*Oct 20 17:44:30.671: Router LSA 1.1.1.1/0, 0 links
*Oct 20 17:44:30.671: OSPFv3: Process Prefix LSAs
*Oct 20 17:44:30.671: OSPFv3: Check VLs
*Oct 20 17:44:30.671: OSPFv3: running SPF for Area 0, cause R N SN SA L
*Oct 20 17:44:30.675: OSPFv3: Intra-Area SPF (Full), Area 0
*Oct 20 17:44:30.675: Router LSA 1.1.1.1/0, 1 links
*Oct 20 17:44:30.675: Link 0, int 9, nbr 2.2.2.2, nbr int 4, type 2, cost 1
*Oct 20 17:44:30.675: Add better path, link 9/4, dist 1
*Oct 20 17:44:30.675: OSPFv3: putting LSA on the clist LSID 0.0.0.4, Type 0x2002, Adv Rtr. 2.2.2.2
*Oct 20 17:44:30.675: Add path FastEthernet0/0.22/::, distance 1
[Truncated Output]
*Oct 20 17:44:30.679: OSPFv3: Inter-Area SPF, Area 0
*Oct 20 17:44:30.683: IAP LSA 2.2.2.2/0, age 330, seq 0x80000005 (Area 0)
*Oct 20 17:44:30.683: prefix FC00::10/128
*Oct 20 17:44:30.683: adding path FastEthernet0/0.22/FE80::2222
*Oct 20 17:44:30.683: IAP LSA 2.2.2.2/1, age 330, seq 0x80000005 (Area 0)
*Oct 20 17:44:30.683: prefix FC00::20/128
*Oct 20 17:44:30.683: adding path FastEthernet0/0.22/FE80::2222
*Oct 20 17:44:30.683: IAP LSA 2.2.2.2/2, age 330, seq 0x80000005 (Area 0)
*Oct 20 17:44:30.683: prefix FC00::30/128
*Oct 20 17:44:30.683: adding path FastEthernet0/0.22/FE80::2222
*Oct 20 17:44:30.683: Adding deferred prefixes, Area 0
*Oct 20 17:44:30.683: prefix FC00::30/128
*Oct 20 17:44:30.687: send IAP FC00::30/128, metric 1 to Area 1
*Oct 20 17:44:30.687: prefix FC00::20/128
*Oct 20 17:44:30.687: send IAP FC00::20/128, metric 1 to Area 1
*Oct 20 17:44:30.687: prefix FC00::10/128
*Oct 20 17:44:30.687: send IAP FC00::10/128, metric 1 to Area 1
*Oct 20 17:44:30.687: OSPFv3: External SPF Type 4005
*Oct 20 17:44:30.687: ASE LSA 2.2.2.2/0, age 330, seq 0x80000003, metric 20, type 2
*Oct 20 17:44:30.687: adding path FastEthernet0/0.22/FE80::2222
*Oct 20 17:44:30.687: ASE LSA 2.2.2.2/1, age 330, seq 0x80000003, metric 20, type 2
*Oct 20 17:44:30.687: adding path FastEthernet0/0.22/FE80::2222
*Oct 20 17:44:30.687: ASE LSA 2.2.2.2/2, age 330, seq 0x80000003, metric 20, type 2
...
[Truncated Output]

Using the same debugging commands, the following illustrates the impact of issuing the clear ipv6 ospf process command on the same router:

R1#debug ipv6 ospf events
  OSPFv3 events debugging is on
R1#debug ipv6 ospf spf
  OSPFv3 spf intra events debugging is on
  OSPFv3 spf inter events debugging is on
  OSPFv3 spf external events debugging is on
R1#clear ipv6 ospf process
Reset ALL OSPF processes? [no]: yes
R1#
*Oct 20 17:51:00.711: OSPFv3: Flushing External Links
*Oct 20 17:51:00.711: Insert LSA 3 adv_rtr 1.1.1.1, type 0x4005 in maxage
*Oct 20 17:51:00.711: Insert LSA 4 adv_rtr 1.1.1.1, type 0x4005 in maxage
*Oct 20 17:51:00.715: Insert LSA 5 adv_rtr 1.1.1.1, type 0x4005 in maxage
*Oct 20 17:51:00.715: Insert LSA 6 adv_rtr 1.1.1.1, type 0x4005 in maxage
*Oct 20 17:51:00.715: Insert LSA 7 adv_rtr 1.1.1.1, type 0x4005 in maxage
*Oct 20 17:51:00.787: OSPFv3: Flushing Link States in Area 0
*Oct 20 17:51:00.787: Insert LSA 0 adv_rtr 1.1.1.1, type 0x2001 in maxage
*Oct 20 17:51:00.787: Insert LSA 9 adv_rtr 1.1.1.1, type 0x8 in maxage
[Truncated Output]
*Oct 20 11:51:00.859 CST: %OSPFv3-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0.22 from FULL to DOWN, Neighbor Down: Interface down or detached
[Truncated Output]
*Oct 20 17:51:00.951: OSPFv3: DR/BDR election on FastEthernet0/0.22
*Oct 20 17:51:00.951: OSPFv3: Elect BDR 1.1.1.1
*Oct 20 17:51:00.951: OSPFv3: Elect DR 2.2.2.2
[Truncated Output]
*Oct 20 17:51:00.955: OSPFv3: running SPF for Area 1, cause R N SN SA L
*Oct 20 17:51:00.959: OSPFv3: malloc clist, size 0, address 859C4988
*Oct 20 17:51:00.959: OSPFv3: Intra-Area SPF (Full), Area 1
*Oct 20 17:51:00.959: OSPFv3: No own Router LSA
*Oct 20 17:51:00.959: OSPFv3: Check VLs
*Oct 20 17:51:00.959: OSPFv3: running SPF for Area 0, cause R N SN SA L
*Oct 20 17:51:00.959: OSPFv3: malloc clist, size 0, address 859C4988
*Oct 20 17:51:00.959: OSPFv3: Intra-Area SPF (Full), Area 0
*Oct 20 17:51:00.959: OSPFv3: No own Router LSA
*Oct 20 17:51:00.959: OSPFv3: Check VLs
*Oct 20 17:51:00.959: OSPF: ospf_gen_asbr_sum_all_areas
*Oct 20 17:51:00.959: OSPFv3: Inter-Area SPF, Area 0
*Oct 20 17:51:00.959: OSPFv3: Inter-Area SPF, Area 1
*Oct 20 17:51:00.959: OSPFv3: External SPF Type 4005
*Oct 20 17:51:00.959: OSPFv3: External SPF Type 2007
*Oct 20 17:51:00.963: OSPFv3: External SPF Type 2007
[Truncated Output]
*Oct 20 17:51:00.979: OSPFv3: Schedule partial SPF - 2.2.2.2/0 type 2003
*Oct 20 17:51:00.979: OSPFv3: Schedule partial SPF - 2.2.2.2/1 type 2003
*Oct 20 17:51:00.979: OSPFv3: Schedule partial SPF - 2.2.2.2/2 type 2003
*Oct 20 17:51:00.979: OSPFv3: Schedule partial SPF - 2.2.2.2/0 type 4005
*Oct 20 17:51:00.979: OSPFv3: Schedule partial SPF - 2.2.2.2/1 type 4005
*Oct 20 17:51:00.983: OSPFv3: Schedule partial SPF - 2.2.2.2/2 type 4005
*Oct 20 17:51:00.983: OSPFv3: Schedule partial SPF - 2.2.2.2/3 type 4005
*Oct 20 17:51:00.983: OSPFv3: Synchronized with 2.2.2.2 on FastEthernet0/0.22, state FULL
*Oct 20 11:51:00.983 CST: %OSPFv3-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0.22 from LOADING to FULL, Loading Done
*Oct 20 17:51:00.983: OSPFv3: Service partial SPF Type3/4:3 Type5:4 Type7:0
*Oct 20 17:51:00.983: OSPFv3: Partial IAP SPF, area 0, Prefix FC00::10/128
...
[Truncated Output]

In essence, the clear ipv6 ospf force-spf command has less impact on OSPFv3 than the clear ipv6 ospf process command because it starts the SPF algorithm without clearing the OSPF database. Clearing the database on a core router in the network can have a drastic impact on the network. For this reason, the clear ipv6 ospf process command should be used with caution; consider using the clear ipv6 ospf force-spf command instead. The same applies to OSPFv2: consider using the clear ip ospf force-spf command in place of the clear ip ospf process command.

Finally, the redistribution keyword clears OSPFv3 redistribution. This is used when you are troubleshooting route redistribution into OSPF from other routing protocols. This command flushes and reinstalls all external LSAs; however, when this command is issued, the SPF algorithm is not run again.

In conclusion, akin to using the debug ip ospf command to view real-time OSPFv2 events, you can use the debug ipv6 ospf command to view real-time OSPFv3 events.
This command supports largely the same options as the debug ip ospf command. While additional OSPFv3-specific keywords, such as ipsec (for IPSec event debugging), are included, most of these options are similar to those used with OSPFv2 and perform the same function. The following shows the available options that can be included when you are debugging OSPFv3:

R1#debug ipv6 ospf ?
  adj             OSPF adjacency events
  database-timer  OSPF database timer
  events          OSPF events
  flood           OSPF flooding
  hello           OSPF hello events
  ipsec           OSPF ipsec events
  lsa-generation  OSPF lsa generation
  lsdb            OSPF database modifications
  packet          OSPF packets
  retransmission  OSPF retransmission events
  spf             OSPF spf

TROUBLESHOOTING IPV6 ROUTE REDISTRIBUTION

When troubleshooting IPv6 route redistribution, you should follow the same processes used for IPv4 route redistribution. These include verifying device configurations and checking any filters applied during redistribution. In addition to these rules, it is also important to remember default routing protocol operation. For example, neither RIPng nor EIGRPv6 will redistribute routes from OSPFv3 if the seed metric is not specified or if the metric is not included in the redistribution configuration. Therefore, instead of focusing on specific scenarios pertaining to IPv6 redistribution, remember the following points when both implementing and troubleshooting IPv6 route redistribution:

• With RIPng, if multiple instances are configured on the same router, by default, these instances will share routing information with each other (i.e., with no explicit redistribution configuration) if they use the same default Multicast group and UDP port. This default behavior can be changed by specifying different UDP port numbers for RIPng instances using the port [number] multicast-group [address] router configuration command when configuring the RIPng instances.
• By default, when redistributing between dynamic routing protocols, connected routes are not automatically included in redistribution for IPv6. Instead, the include-connected keyword must be appended to the end of the redistribute [protocol] router configuration command.
• When redistributing between IPv6 routing protocols, any route filtering implemented during redistribution (i.e., distribute lists and route maps) must reference either IPv6 ACLs or IPv6 prefix lists.
• When redistributing IPv6 prefixes into OSPFv3, the subnets keyword is not required because IPv6 does not use Classful networks as IPv4 does.
• By default, local routes (i.e., routes marked with an ‘L’ in the RIB) are not included in route redistribution for any IPv6 routing protocol.
• Only IPv6 global addresses are redistributed. Link-Local addresses, which are used as the next-hop address for IPv6 IGPs, cannot be redistributed.

IPV4 AND IPV6 INTEROPERABILITY

Dual-stack implementations are those where internetwork devices and hosts use both protocol stacks (i.e., IPv4 and IPv6) at the same time. Dual-stack implementation allows the hosts to use either IPv4 or IPv6 to establish end-to-end IP sessions with other hosts. Dual-stack implementation does not automatically mean that IPv4-only and IPv6-only hosts have the ability to communicate with each other. To do so, additional protocols and mechanisms are needed. Dual-stack simply means that the hosts (and infrastructure) are able to support both the IPv4 protocol stack and the IPv6 protocol stack.

In situations where dual-stack implementations cannot be used, it is possible to tunnel the IPv6 packets over IPv4 networks. In these implementations, tunnels are used to encapsulate IPv6 packets in IPv4 packets, allowing them to be sent across portions of the network that do not, or do not yet, natively support IPv6.
This allows the IPv6 ‘islands’ to communicate over the underlying IPv4 infrastructure.

IPv4 and IPv6 integration and co-existence strategies are divided into three broad classes as follows:

1. Dual-Stack implementations
2. Tunneling
3. Protocol translation

Cisco IOS software provides dual-stack support for several applications, tools, and protocols, which include, but are not limited to, the following:

• Telnet
• SSH
• TFTP
• Traceroute utility
• HTTP
• Frame Relay
• FHRP
• DNS
• Ping utility

Cisco IOS software also supports the following tunneling mechanisms, which can be used to transport IPv6 packets over native IPv4 networks:

• Static (manually configured) IPv6 tunneling
• 6to4 tunneling
• Automatic IPv4-compatible tunneling
• ISATAP tunneling
• Generic Routing Encapsulation tunneling

Static IPv6-in-IPv4 tunneling requires the static configuration of tunnels on dual-stack devices in order to allow IPv6 packets to be tunneled across the IPv4 network. When implementing static or manual tunnels, while the Tunnel interface itself is assigned an IPv6 address, both the tunnel source and destination addresses must be IPv4 addresses. Finally, when implementing manual tunnels, IPv6 packets are encapsulated in IPv4 packets by specifying a tunnel mode of ipv6ip under the Tunnel interface.

6to4 tunnels are defined in RFC 3056 and are designed to allow IPv6 end sites to access the IPv6 backbone, commonly referred to as the 6bone, by tunneling across the IPv4 Internet. 6to4 automatic tunneling provides a dynamic method to deploy tunnels between IPv6 sites over IPv4 networks. Unlike with manually configured tunnels, there is no need to configure tunnel source and destination addresses manually to establish the tunnels.
Instead, the tunneling of IPv6 packets between 6to4 sites is performed dynamically based on the destination IPv6 address of the packets originated by IPv6 hosts. Automatic prefix assignment provides a global aggregatable IPv6 prefix to each 6to4 site. This prefix is based on the 2002::/16 (0x2002) prefix assigned by IANA for 6to4 sites. The tunnel endpoint or destination is determined by the globally unique IPv4 address embedded in a 6to4 address. This address must be globally routable. In other words, RFC 1918 (private) addresses cannot be used for 6to4 tunnels because they are not unique. The 32-bit IPv4 address is converted to Hexadecimal notation and appended to the 16-bit 2002 prefix, so the final representation is a 48-bit prefix. For example, if the IP address 1.1.1.1 was embedded into the IPv6 6to4 prefix, the final representation would be the 2002:0101:0101::/48 6to4 prefix.

Automatic IPv4-compatible tunnels enable IPv6 hosts to establish tunnels automatically to other IPv6 hosts across an IPv4 network infrastructure. Unlike 6to4 tunneling, automatic IPv4-compatible tunneling uses IPv4-compatible IPv6 addresses. Automatic IPv4-compatible tunnels use the IPv6 prefix ::/96. To complete the 128-bit IPv6 address, the low-order 32 bits are derived from the IPv4 address. These low-order 32 bits of the source and destination IPv6 addresses represent the source and destination IPv4 addresses of the tunnel endpoints, in the same way that the embedded IPv4 address identifies the tunnel endpoint for the 6to4 tunnels described above. Therefore, with automatic IPv4-compatible tunneling, the host or router at each end of an IPv4-compatible tunnel must support both the IPv4 and IPv6 protocol stacks.

Automatic IPv4-compatible tunneling requires the use of either MP-BGP or static routes, with the former being the preferred method for scalability reasons. This tunneling mechanism requires the use of IPv4-compatible IPv6 addresses.
As was stated earlier in this chapter, these addresses have the first 96 bits set to 0 and are then followed by the 32-bit IPv4 address. An example of an IPv4-compatible IPv6 address would be 0000:0000:0000:0000:0000:0000:172.16.255.1. This same address can then be compressed as 0:0:0:0:0:0:172.16.255.1/128 or simply as ::172.16.255.1/128. Additionally, it is also important to remember that the decimal IPv4 address could be converted to Hexadecimal notation and used to create the IPv4-compatible IPv6 address 0:0:0:0:0:0:AC10:FF01/128 or simply ::AC10:FF01/128.

Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) is an automatic overlay tunneling mechanism that uses the underlying IPv4 network as an NBMA Link Layer for IPv6. As the name suggests, ISATAP is designed for transporting IPv6 packets within a site where a native IPv6 infrastructure is not yet available. ISATAP tunnels allow individual IPv4 or IPv6 dual-stack hosts within a site to communicate with other such hosts on the same virtual link, creating a virtual IPv6 network using the IPv4 infrastructure. The main functionalities and components of ISATAP are automatic tunneling, the ISATAP address format, prefixes, the interface ID, and ISATAP prefix advertisement.

ISATAP addresses assigned to ISATAP routers and hosts are created by concatenating an IPv6 global Unicast prefix dedicated to ISATAP operation with a specially formatted interface ID. The ISATAP prefix represents the high-order 64 bits of the IPv6 address. One ISATAP address is enabled on the ISATAP host using the Link-Local prefix FE80::/10, and another global or Site-Local 64-bit prefix is assigned for ISATAP operation within the site. This prefix is received by ISATAP hosts from router advertisement messages sent by ISATAP routers through the ISATAP tunnels established over the IPv4 infrastructure.
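The IPv4-embedding arithmetic described above for 6to4 prefixes and IPv4-compatible addresses can be checked with a short script. The following is a minimal Python sketch that uses only the standard ipaddress module; the function names sixto4_prefix and ipv4_compatible are hypothetical helpers introduced here for illustration and are not part of any Cisco tool.

```python
import ipaddress

# Private ranges from RFC 1918; these are not globally unique, so they
# cannot identify a 6to4 site.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def sixto4_prefix(ipv4):
    """Return the 2002:<IPv4 in hex>::/48 prefix for a public IPv4 address."""
    v4 = ipaddress.IPv4Address(ipv4)
    if any(v4 in network for network in RFC1918_NETWORKS):
        raise ValueError("RFC 1918 addresses cannot be used for 6to4")
    # 0x2002 fills the top 16 bits; the IPv4 address fills the next 32 bits.
    value = (0x2002 << 112) | (int(v4) << 80)
    return ipaddress.IPv6Network((value, 48))


def ipv4_compatible(ipv4):
    """Return the IPv4-compatible address: 96 zero bits, then the IPv4 address."""
    return ipaddress.IPv6Address(int(ipaddress.IPv4Address(ipv4)))


if __name__ == "__main__":
    print(sixto4_prefix("1.1.1.1"))
    print(ipv4_compatible("172.16.255.1"))
```

Note that the ipaddress module prints addresses in compressed lowercase form, so leading zeros within each 16-bit group are dropped; the values are nevertheless equal to the 2002:0101:0101::/48 and ::AC10:FF01 forms used in the text.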
The interface ID used in ISATAP represents the low-order 64 bits of the IPv6 address assigned to the ISATAP host. ISATAP embeds IPv4 addresses in IPv6 addresses, much as 6to4 tunneling does. This interface ID is created by appending the 32-bit IPv4 address to the high-order 32-bit value 0000:5EFE. This value has been reserved by IANA exclusively for ISATAP use.

Finally, Generic Routing Encapsulation (GRE) is a tunnel encapsulation protocol that is used to tunnel protocols over an internetwork. GRE is the default encapsulation protocol used on Tunnel interfaces in Cisco IOS software, if one is not explicitly configured. GRE supports multiple protocols and can be used to encapsulate and transport protocols such as IPX, AppleTalk, and IPv6 in IPv4 packets. This capability allows GRE to provide greater flexibility than other tunneling mechanisms.

In a manner similar to manually configured IPv6-in-IPv4 tunnels, GRE tunnels are configured statically between two routers to allow for the transport of IPv6 packets over an IPv4 infrastructure. The only notable difference is that while IPv6-in-IPv4 tunnels use a tunnel mode of ipv6ip, GRE tunnels use a tunnel mode of gre ip to tunnel IPv6 packets over the IPv4 infrastructure using GRE encapsulation.

NOTE: Additional detailed information and configuration examples of the tunneling methods described in the previous section can be found in the current ROUTE study guide available online.

TROUBLESHOOTING IPV4 AND IPV6 INTEROPERABILITY

For the most part, IPv4 and IPv6 interoperability issues are due to misconfigurations; however, in some rare cases, software and hardware defects or bugs may cause problems with some of the mechanisms described in the previous section. Prior to jumping to such conclusions, however, first check and then double-check your configurations.
After checking and double-checking your configurations, if you are still convinced that the desired mechanism has been configured correctly, contact the TAC for further assistance in your troubleshooting effort. Some common problems encountered with the tunneling mechanisms described in the previous section include the following:

• The configured Tunnel interface will not come up
• The tunnel is up but you cannot ping across it
• Routing adjacencies won’t establish across the tunnel

There are several reasons why a Tunnel interface may not come up, which include the following:

• Layer 1/Layer 2 issues
• Layer 3 issues
• Misconfigurations

Layer 1/Layer 2 issues will prevent the Tunnel interface from transitioning to the up/up state. Following your tunnel configuration, if you notice that the Tunnel interface reflects an up/down state, check the status of the interface specified as the tunnel source using the show interfaces command. In most cases, this state reflects the source interface being shut down (administratively disabled) or being down itself, meaning that there are Layer 1 or Layer 2 issues preventing it from coming up. Troubleshoot the issues using the relevant commands.

At Layer 3, in order for the Tunnel interface to come up, the specified tunnel destination must be present in the RIB of the local router. Consider the following output for example:

R2#show interfaces Tunnel0
Tunnel0 is up, line protocol is down
  Hardware is Tunnel
  MTU 1514 bytes, BW 9 Kbit/sec, DLY 500000 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation TUNNEL, Loopback not set
  Keepalive not set
  Tunnel source 150.1.1.2, destination 160.1.1.1
  Tunnel protocol/transport IPv6/IP
    Key disabled, sequencing disabled
    Checksumming of packets disabled
...
[Truncated Output]

After having verified that the tunnel source interface is up on the local router, next verify that the tunnel destination is known to the local router using the show ip route command as follows:

R2#show ip route 160.1.1.1
% Network not in table

In the output above, the specified tunnel destination address is not known to the local router, which is why the tunnel is in an up/down state. Depending on your routing configuration, add a static route to the tunnel destination or verify the dynamic routing protocol configuration to ensure, for example, that this address is not being incorrectly filtered. In our example, we will simply assume that the tunnel destination will be reachable via a static route pointing out of the Serial0/0 interface of the local router and add the following configuration:

R2(config)#ip route 160.1.1.1 255.255.255.255 Serial0/0

Following this configuration, and having verified connectivity to the tunnel destination address, the Tunnel interface transitions to the up/up state as follows:

R2#show interfaces Tunnel0
Tunnel0 is up, line protocol is up
  Hardware is Tunnel
  MTU 1514 bytes, BW 9 Kbit/sec, DLY 500000 usec,
     reliability 255/255, txload 28/255, rxload 1/255
  Encapsulation TUNNEL, Loopback not set
  Keepalive not set
  Tunnel source 150.1.1.2, destination 160.1.1.1
  Tunnel protocol/transport IPv6/IP
    Key disabled, sequencing disabled
    Checksumming of packets disabled
  Tunnel TTL 255
...
[Truncated Output]

Common configuration mistakes when implementing tunneling range from specifying the incorrect tunnel source and/or destination addresses to specifying the incorrect encapsulation type for the tunnel.
For example, when configuring a manual tunnel, if you specify the tunnel mode as IPv6, meaning that the tunnel will encapsulate using IPv6 packets, the tunnel will not come up if the tunnel source and tunnel destination addresses are IPv4 addresses, as illustrated in the following output:

R1#show interfaces Tunnel0
Tunnel0 is up, line protocol is down
  Hardware is Tunnel
  MTU 1514 bytes, BW 9 Kbit/sec, DLY 500000 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation TUNNEL, Loopback not set
  Keepalive not set
  Tunnel source 160.1.1.2, destination 150.1.1.2
  Tunnel protocol/transport IPv6
  Tunnel TTL 255
...
[Truncated Output]

When specifying IPv4 addresses as the tunnel source and tunnel destination addresses, you are configuring an IPv4 tunnel. You must therefore specify that the IPv6 packets are encapsulated in IPv4 packets by issuing the tunnel mode ipv6ip interface configuration command. The tunnel mode ipv6 command would be used to tunnel IPv4 packets in IPv6 packets, which you would do if you were configuring an IPv6 tunnel instead (i.e., the specified tunnel source and tunnel destination addresses were IPv6 addresses). The same applies to the other tunneling mechanisms that are also described in this guide.

When a Tunnel interface is up but you are unable to ping across it, the most common causes are basic misconfigurations, such as a mistyped tunnel address, and filtering. When routers are connected to public networks, such as the Internet, it is common practice to implement ACL filtering to protect both the device and the network from unauthorized access. When implementing tunnels between two such routers, it is important to ensure that the specified encapsulation protocol is permitted between the two host addresses (i.e., between the tunnel source and tunnel destination addresses).
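To make the filtering point above more concrete, the following minimal Python sketch maps the tunnel modes discussed in this chapter to the IPv4 protocol number that an ACL between the tunnel endpoints would need to permit. IPv6-in-IPv4 encapsulation is IP protocol 41 and GRE is IP protocol 47. The function name, the lookup table, and the generated entry text are illustrative assumptions and are not output from any Cisco tool; the addresses are taken from the R1 example.

```python
import ipaddress

# IPv4 protocol number seen in the outer header for each tunnel mode.
# 41 = IPv6 encapsulated in IPv4, 47 = GRE. (Illustrative lookup table.)
TUNNEL_MODE_PROTOCOL = {
    "ipv6ip": 41,
    "ipv6ip 6to4": 41,
    "ipv6ip auto-tunnel": 41,
    "ipv6ip isatap": 41,
    "gre ip": 47,
}


def acl_entry_for_tunnel(mode, source, destination):
    """Build an extended-ACL style permit entry (hypothetical helper)."""
    # An IPv4 transport needs IPv4 endpoints; this raises ValueError otherwise.
    src = ipaddress.IPv4Address(source)
    dst = ipaddress.IPv4Address(destination)
    protocol = TUNNEL_MODE_PROTOCOL[mode]
    return f"permit {protocol} host {src} host {dst}"


if __name__ == "__main__":
    print(acl_entry_for_tunnel("ipv6ip", "160.1.1.2", "150.1.1.2"))
    print(acl_entry_for_tunnel("gre ip", "160.1.1.2", "150.1.1.2"))
```

The same check has to hold in both directions, so an equivalent entry with the source and destination swapped would be needed on the remote router.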
Another common reason for being unable to ping across Tunnel interfaces is mismatched encapsulation on the Tunnel interfaces. Tunnel interfaces default to GRE encapsulation in Cisco IOS software. If you specify a non-default encapsulation type, it must be the same on both endpoints. For example, assume that a simple tunnel is configured between two routers named R1 and R2. The tunnel configuration on R1 is as follows:

R1#show running-config interface Tunnel0
Building configuration...
Current configuration : 145 bytes
!
interface Tunnel0
 no ip address
 ipv6 address 3FF3:ABCD::1/64
 tunnel source 160.1.1.2
 tunnel destination 150.1.1.2
 tunnel mode ipv6ip
end

On the remote endpoint, R2, the tunnel configuration has been implemented as follows:

R2#show running-config interface Tunnel0
Building configuration...
Current configuration : 125 bytes
!
interface Tunnel0
 no ip address
 ipv6 address 3FF3:ABCD::2/64
 tunnel source 150.1.1.2
 tunnel destination 160.1.1.2
end

For verification purposes, a simple ping between the tunnel source and destination addresses will be used to validate end-to-end connectivity between the two as follows:

R2#ping 160.1.1.2 source 150.1.1.2 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 160.1.1.2, timeout is 2 seconds:
Packet sent with a source address of 150.1.1.2
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 1/3/4 ms

Because of the reachability between the tunnel endpoints, the Tunnel interfaces are in an up/up state, as illustrated in the output of the show interfaces command on R2:

R2#show interfaces Tunnel0
Tunnel0 is up, line protocol is up
  Hardware is Tunnel
  MTU 1514 bytes, BW 9 Kbit/sec, DLY 500000 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation TUNNEL, Loopback not set
  Keepalive not set
  Tunnel source 150.1.1.2, destination 160.1.1.2
  Tunnel protocol/transport GRE/IP
  Key disabled, sequencing disabled
  Checksumming of packets disabled
  Tunnel TTL 255

However, despite the Tunnel interface state, you notice that the routers cannot ping one another, as illustrated in the following output:

R2#ping 3FF3:ABCD::1 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 3FF3:ABCD::1, timeout is 2 seconds:
..........
Success rate is 0 percent (0/10)

Parsing through the configurations again, you notice that one endpoint (R1) is using a tunnel mode of ipv6ip, while the other endpoint (R2) is using the default, which is GRE. This encapsulation mismatch is the root cause of the problem. The recommended solution therefore would be to correct the configuration on R2 as follows:

R2(config)#interface Tunnel0
R2(config-if)#tunnel mode ipv6ip
R2(config-if)#exit

Following this reconfiguration, you will be able to ping between the routers successfully, as illustrated in the following output:

R2#ping 3FF3:ABCD::1 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 3FF3:ABCD::1, timeout is 2 seconds:
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 4/4/8 ms

When you are troubleshooting routing protocol adjacency problems across Tunnel interfaces, you should first perform the Layer 1, 2, and 3 checks that are described in this section.
Following this, validate the routing protocol configuration on both endpoints, checking diligently for things such as incorrectly specified passive interfaces and mismatched parameters. Given that these steps are described in this and previous chapters, they will not be repeated in this section.

CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter.

IP version 6 Protocol Overview and Fundamentals

• Version 6 of the Internet Protocol provides additional capabilities over the current version 4
• The additional capabilities included in IPv6 include the following:
1. The Simplified IPv6 Packet Header
2. Larger Address Space
3. IPv6 Addressing Hierarchy
4. IPv6 Extendibility
5. IPv6 Broadcast Elimination
6. Stateless Autoconfiguration
7. Integrated Mobility
8. Integrated Enhanced Security
• The three ways in which IPv6 addresses can be represented are as follows:
1. The Preferred or Complete Address Representation or Form
2. The Compressed Representation
3. The IPv6 Addresses with an Embedded IPv4 Address
• The two kinds of IPv6 addresses that contain an embedded IPv4 address are as follows:
1. IPv4-compatible IPv6 addresses
2. IPv4-mapped IPv6 addresses
• IPv4-compatible IPv6 addresses have the first 96 bits of the address set to a value of 0
• IPv4-compatible IPv6 addresses complete the address using the 32-bit IPv4 address
• IPv4-mapped IPv6 addresses have the first 80 bits set to 0 and the next 16 set to all 1s
• IPv4-mapped IPv6 addresses complete the address using the 32-bit IPv4 address
• IPv6 addresses can be classified as any one of the following:
1. Link-Local Addresses
2. Site-Local Addresses
3. Aggregate Global Unicast Addresses
4. Multicast Addresses
5. Anycast Addresses
6. Loopback Addresses
7. Unspecified Addresses

Understanding and Troubleshooting EIGRPv6

• EIGRP for IPv6, also called EIGRPv6, is very similar to EIGRP for IPv4 or EIGRPv4
• The same core routing protocol operation is applicable to both versions, e.g. DUAL
• The differences between EIGRPv4 and EIGRPv6 are listed in the following table:

Protocol Characteristic       EIGRP for IPv4    EIGRP for IPv6
Automatic Summarization       Yes               Not Applicable
Authentication or Security    MD5               Built into IPv6
Common Subnet for Peers       Yes               No
Advertisement Contents        Subnet/Mask       Prefix/Length
Packet Encapsulation          IPv4              IPv6

• By default, when EIGRPv6 is implemented on a device, the protocol is in a shutdown state
• EIGRPv6 requires a 32-bit router ID, expressed in IPv4 address format
• EIGRPv6 uses the link-local address of the neighbor router(s) as the next-hop address

Understanding and Troubleshooting RIPng

• RIP next generation (RIPng) is the successor of RIPv2 but is exclusively for the IPv6 protocol
• The similarities and differences between RIPv2 and RIPng are listed in the following table:

Protocol Characteristic       RIPv2                       RIPng
Protocol Classification       Distance Vector             Distance Vector
Hop Limitation                15                          15
Split Horizon                 Yes                         Yes
Poison Reverse                Yes                         Yes
Transport Layer Protocol      UDP                         UDP
Multicast Updates             Yes (224.0.0.9)             Yes (FF02::9)
Administrative Distance       120                         120
Hold-down Timers              Yes                         Yes
Destination Prefix Length     32-bit                      128-bit
Next Hop Length               32-bit                      128-bit
Next Hop Address              Primary Interface Address   Link-Local Address
Transport                     IPv4                        IPv6
UDP Port Number               520                         521
Authentication                Text and MD5                Inbuilt into IPv6 (IPsec)
Automatic Summarization       Yes (enabled by default)    Not Applicable
Can Broadcast Updates         Yes                         Not Applicable

Understanding and Troubleshooting OSPFv3

• While similar in many ways, OSPFv2 and OSPFv3 have many differences, which include the following:
1. Unlike OSPFv2, OSPFv3 runs over a link, negating the need to use network commands
2. OSPFv3 uses Link-Local addresses to identify the OSPFv3 adjacencies
3. OSPFv3 introduces two new OSPF LSA types, which are the Type 8 and Type 9 LSAs
4. OSPFv3 encapsulates messages (transport) using IPv6 datagrams
5. OSPFv3 uses IPv6 Multicast groups FF02::5 and FF02::6 and not IPv4 Multicast groups
6. OSPFv3 leverages the inbuilt capabilities of IPsec for security
7. The Options field in Hello packets and DBD packets has been expanded to 24 bits
8. The OSPFv3 Hello packet contains no address information, but includes an Interface ID

Troubleshooting IPv6 Route Redistribution

• For the most part, IPv6 route redistribution follows the same logic as IPv4 redistribution
• However, there are some notable differences between the two, which include the following:
1. With RIPng, instances using the same port and Multicast group will share information
2. By default, connected routes are not automatically included in redistribution for IPv6
3. Route filtering must reference either IPv6 ACLs or IPv6 prefix lists
4. When redistributing IPv6 prefixes into OSPFv3, the subnets keyword is not required
5. By default, local routes are not included in route redistribution
6. Only global IPv6 addresses are redistributed; Link-Local prefixes are not redistributed

IPv4 and IPv6 Interoperability

• With dual-stack implementations, hosts and network devices run both IPv6 and IPv4
• With tunneling mechanisms, IPv6 packets are tunneled in IPv4 packets
• With protocol translation, IPv6-to-IPv4 translation, and vice versa, is implemented
• Tunneling allows IPv6 packets to be encapsulated and sent over native IPv4 internetworks
• The following tunneling mechanisms are supported in Cisco IOS software:
1. Static (Manually Configured) IPv6 Tunneling
2. 6to4 Tunneling
3. Automatic IPv4-compatible Tunneling
4. ISATAP Tunneling
5. Generic Routing Encapsulation Tunneling
• Static tunneling requires the static configuration of tunnels on dual-stack devices
• Static tunneling requires tunnel source and destination addresses to be specified
• Static tunneling is enabled using the tunnel mode ipv6ip interface configuration command
• 6to4 tunneling allows tunnels to be dynamically created
• 6to4 tunneling requires no explicit tunnel destination to be configured
• 6to4 tunneling has the following characteristics:
1. Automatic or Dynamic Tunneling
2. Automatic Prefix Assignment
3. There is no IPv6 Route Propagation
• 6to4 tunneling uses the IPv6 2002::/16 prefix, which was assigned by IANA for 6to4 sites
• The tunnel mode ipv6ip 6to4 command is used to enable 6to4 tunneling
• Automatic IPv4-compatible tunneling is also a dynamic tunneling mechanism
• Automatic IPv4-compatible tunnels use the IPv6 prefix ::/96
• The tunnel mode ipv6ip auto-tunnel command enables automatic IPv4-compatible tunneling
• ISATAP is an automatic overlay tunneling mechanism
• ISATAP uses the underlying IPv4 network as an NBMA Link Layer for IPv6
• ISATAP addressing includes the 0000:5EFE high-order 32-bit value
• The tunnel mode ipv6ip isatap command is used to enable ISATAP tunneling
• GRE is a tunnel encapsulation protocol that is used to tunnel protocols over an internetwork
• GRE is the default encapsulation protocol used on tunnel interfaces
• GRE provides much greater flexibility than the other tunneling mechanisms
• GRE tunnels use a tunnel mode of gre ip to tunnel IPv6 packets over IPv4
• While GRE and manual tunnels have similar configurations, there are some differences as follows:
1. Generic Routing Encapsulation tunnels have a default MTU value of 1476 bytes
2. Static IPv6-in-IPv4 tunnels have a default MTU value of 1480 bytes
3. For GRE tunnels, the Link-Local address is derived via the EUI-64 method
4. For static IPv6-in-IPv4 tunnels, the Link-Local address is derived from the FE80::/96 prefix
5. GRE supports multiple protocols
6. Static IPv6-in-IPv4 tunnels only support the encapsulation of IPv6 in IPv4 packets
• Before implementing tunneling, the following factors should be taken into consideration:
1. Maximum Transmission Unit issues
2. ICMPv4 Error Messages
3. Protocol Filtering
4. Network Address Translation
• The most common reasons for tunneling issues are device misconfigurations
• Additional reasons include Layer 1, 2, and 3 issues
• In rare cases, hardware and software faults or bugs can cause tunneling issues

CHAPTER 11
Troubleshooting Cisco Wireless LAN Solutions

Wireless LANs (WLANs) provide network connectivity almost anywhere and at much less cost than traditional wired LANs. For this reason, WLAN solutions are commonplace in most business environments. In addition to implementing and troubleshooting wired LAN solutions, you are also expected to understand how to implement and troubleshoot Cisco Unified WLAN solutions. While WLANs are described in detail in the SWITCH guide, this chapter will also describe some WLAN concepts with which you should be familiar. However, primary emphasis will be placed on WLAN troubleshooting and problem resolution. The TSHOOT certification exam objective that is covered in this chapter is as follows:
• Troubleshoot switch support of advanced services (i.e., Wireless, VOIP, and Video)

This chapter will be divided into the following sections:
• Wireless Local Area Network Overview
• The Cisco WLAN Solution
• Troubleshooting Cisco WLAN Solutions

WIRELESS LOCAL AREA NETWORK OVERVIEW

Wireless networks use radio waves to transmit data and connect devices to the Internet, as well as to other networks and applications, which minimizes the need for wired connections.
Although a wireless network allows users to access network resources ‘over-the-air,’ it is important to keep in mind that wireless traffic also traverses the physical wired infrastructure. Therefore, it is imperative to remember that Wireless Local Area Networks are meant to augment, rather than replace, the wired LAN campus infrastructure. This augmentation allows for a flexible data communication system within the enterprise network.

IEEE 802.11 Components

The 802.11 architecture consists of several logical and physical components. The 802.11 components described in this section are as follows:
• Client or Station (STA)
• Access Point (AP)
• Independent Basic Service Set (IBSS)
• Basic Service Set (BSS)
• Extended Service Set (ESS)
• Distribution System (DS)

The client or station (STA) refers to any appliance that interfaces with the wireless medium and operates as an end-user device. The STA contains an adapter card, a PC card, or an embedded device to provide wireless connectivity. Some common examples of STAs include laptop computers, desktop computers, and PDAs with wireless network interface cards.

The wireless Access Point (AP) functions as a bridge between the wireless STAs and the existing network backbone for network access. APs serve as the central points in an all-wireless network, or as the connection point between wired and wireless networks. When APs are used in a wireless network, any STA attempting to use the wireless network must first establish membership, or an association, with the AP.

An Independent Basic Service Set (IBSS) is a wireless network, consisting of at least two STAs, used where no access to a Distribution System is available. The Distribution System (DS) will be described later in this section. An IBSS is sometimes referred to as an independent configuration or as an ad hoc wireless network. From a logical perspective, an IBSS is very similar to a peer-to-peer network in which no one node performs any server functions.

The 802.11 WLAN infrastructure architecture is based on a cellular architecture that divides the system into cells, each referred to as a Basic Service Set (BSS). The BSS is controlled by a Base Station, or, more commonly, an AP. The cell is restricted to the AP’s coverage area. Clients, or stations, within the cell can then associate themselves with the AP, allowing them to use the wireless LAN.

Access Points may be interconnected using the switched network, creating what is referred to as an Extended Service Set (ESS). The ESS consists of overlapping BSSs (cells) that are usually connected together by a wired medium (Distribution System). In most cases, the ESS allows stations to roam. Roaming is the process of moving from one cell (BSS) to another without losing the wireless connection.

Finally, the Distribution System (DS) allows for the interconnection of the APs of multiple cells (BSSs). This allows for mobility because STAs can move from one BSS to another BSS. Although the Distribution System could be any type of network, it is almost always a wired Ethernet LAN. However, it should be noted that it is also possible for APs to be interconnected without using wires. The three types of DSs are integrated, wired, and wireless.

IEEE 802.11 Frames

WLANs are defined in the IEEE 802.11 standards. The IEEE 802 standards define two separate sublayers for the Data Link layer (Layer 2) of the OSI Reference Model. These two sublayers are the Logical Link Control (LLC) and Media Access Control (MAC) sublayers. The IEEE 802.11 standards cover the operation of the MAC sublayer and the physical layer of the OSI Reference Model. The 802.11 frame consists of a 32-byte MAC header, a variable-length body of between 0 and 2312 bytes, and a 4-byte FCS.
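As a quick arithmetic check of the frame layout just described, the short Python sketch below adds up the three fields using only the sizes given in the text (32-byte header, 0 to 2312-byte body, 4-byte FCS). The constant and function names are illustrative assumptions.

```python
# Field sizes in bytes, taken from the description of the 802.11 frame above.
MAC_HEADER_BYTES = 32
MAX_BODY_BYTES = 2312
FCS_BYTES = 4


def frame_length(body_bytes):
    """Return the total frame length for a given frame body size."""
    if not 0 <= body_bytes <= MAX_BODY_BYTES:
        raise ValueError("frame body must be between 0 and 2312 bytes")
    return MAC_HEADER_BYTES + body_bytes + FCS_BYTES


if __name__ == "__main__":
    print(frame_length(0))               # smallest frame: header and FCS only
    print(frame_length(MAX_BODY_BYTES))  # largest frame
```

With these figures, a frame with an empty body is 36 bytes long and the largest possible frame is 2348 bytes long.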
The 802.11 standard uses the following three types of frames:
1. Control frames
2. Management frames
3. Data frames

802.11 uses control frames to control device access to the wireless medium. These control frames include the Request To Send (RTS), Clear To Send (CTS), and Acknowledgement (ACK) frames. The RTS/CTS function is optional and is employed to prevent frame collisions. After receiving a data frame, the receiving STA will use an error-checking process to detect the presence of errors. The receiving STA will send an ACK frame to the sending STA if no errors are found. Receipt of the acknowledgment tells the sending STA that no collision occurred. However, if the sending STA does not receive an ACK after a period of time, it assumes that a collision occurred and will retransmit the frame.

802.11 management frames enable stations to establish and maintain communications. There are several management frame subtypes, which include Beacon Frames (Beacons), Association Request Frames, and Authentication Response Frames, among several other frames. While not included in this chapter, additional material and detail on the other frame types may be found in the current SWITCH study guide, which is available online at www.howtonetwork.net.

Figure 11-1 below illustrates the exchange of management frames between an Access Point (AP) and a client (STA) using passive scanning to synchronize itself with the AP:

[Figure: frames exchanged between the STA and the AP, in order: Beacon Frame, Authentication Request, Authentication Response, Association Request, Association Response]
Fig. 11-1. STA Association with an AP When Using Passive Scanning

Figure 11-2 below illustrates the exchange of management frames between an Access Point (AP) and a client (STA) using active scanning to synchronize itself with the AP:

[Figure: frames exchanged between the STA and the AP, in order: Probe Request, Probe Response, Authentication Request, Authentication Response, Association Request, Association Response]
Fig. 11-2. STA Association with an AP When Using Active Scanning

Finally, data frames are sent by any STA and contain higher layer protocol information or data. As is the case with management frames, the 802.11 standard supports several data frame types. Additional information on the 802.11 standard data frames may be found in the current SWITCH study guide, which is available online.

IEEE 802.11 Standards

At the physical (PHY) layer, IEEE 802.11 defines a series of encoding and transmission schemes for wireless communications, the most common of which are the Frequency Hopping Spread Spectrum (FHSS), Direct Sequence Spread Spectrum (DSSS), and Orthogonal Frequency Division Multiplexing (OFDM) transmission schemes. Although Infrared (IR) also exists at this layer, very little development of this standard has occurred due to line-of-sight limitations. The 802.11 standards described in this section are as follows:
• IEEE 802.11 (original)
• IEEE 802.11b
• IEEE 802.11a
• IEEE 802.11g
• IEEE 802.11n

The original IEEE 802.11 standard defined WLANs that provided up to 2 Mbps of throughput. The original standard specified the FHSS and DSSS transmission schemes and the S-Band Industrial, Scientific, and Medical (ISM) frequency band, which operates in the frequency range of 2.4 to 2.5 GHz. The original 802.11 standard is also sometimes referred to as IEEE 802.11 legacy and is now considered obsolete because the throughput is too slow for most applications.
The 802.11b standard is an extension of 802.11 that operates in the same unregulated 2.4 GHz band as the original 802.11 standard. Devices operating on 802.11b use DSSS modulation for higher speeds. The data rates on a channel can vary according to client capabilities and conditions. However, the only possible data rates are 1, 2, 5.5, and 11 Mbps. The 5.5 Mbps and 11 Mbps rates are the two speeds added to the original specification.

The 2.4 GHz band consists of 14 channels, each 22 MHz wide. In North America, the Federal Communications Commission (FCC) allows channels 1 through 11. Most of Europe can use channels 1 through 13. Japan additionally allows channel 14, for 802.11b only. APs or clients use a spectral mask or a template to filter out a single channel based around a center frequency.

The IEEE 802.11a standard is an extension of 802.11 that applies to wireless LANs and provides up to 54 Mbps. This standard uses OFDM and does away with spread spectrum. As a result, it is not compatible with 802.11b or 802.11g, and it is therefore seldom used any more. 802.11a equipment operates at 5 GHz. This higher frequency range means that 802.11a signals are absorbed more readily by walls and other solid objects in their path due to their smaller wavelength and, as a result, they cannot penetrate as far as those of 802.11b. The FCC has allocated 300 MHz of RF spectrum for unlicensed operation in the 5 GHz block referred to as the Unlicensed National Information Infrastructure (U-NII) band.

The 802.11g standard works in the same 2.4 GHz range as 802.11b. IEEE 802.11g operates at bit rates as high as 54 Mbps, using OFDM in the S-Band ISM band. However, unlike 802.11a, 802.11g is backward compatible with 802.11b, and can operate at the 802.11b bit rates and use DSSS. Like 802.11a, 802.11g uses 54 Mbps in ideal conditions and the slower speeds of 48 Mbps, 36 Mbps, 24 Mbps, 18 Mbps, 12 Mbps, and 6 Mbps in less-than-ideal conditions.
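The 2.4 GHz channel plan described above (14 channels, each 22 MHz wide) can be sketched in a few lines of Python. The sketch relies on two facts that are not stated in the text and are added here as assumptions for illustration: channels 1 through 13 are spaced 5 MHz apart starting at a center frequency of 2412 MHz, and channel 14 is centered at 2484 MHz. The function names are hypothetical.

```python
CHANNEL_WIDTH_MHZ = 22  # channel width given in the text


def center_frequency_mhz(channel):
    """Center frequency of a 2.4 GHz channel (assumed 5 MHz spacing)."""
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    if channel == 14:
        return 2484  # channel 14 does not follow the 5 MHz spacing
    raise ValueError("2.4 GHz channels are numbered 1 through 14")


def channels_overlap(a, b):
    """True when two channels are closer together than one channel width."""
    gap = abs(center_frequency_mhz(a) - center_frequency_mhz(b))
    return gap < CHANNEL_WIDTH_MHZ


def non_overlapping(channels):
    """Greedily pick channels, lowest first, that do not overlap each other."""
    chosen = []
    for channel in sorted(channels):
        if all(not channels_overlap(channel, c) for c in chosen):
            chosen.append(channel)
    return chosen


if __name__ == "__main__":
    # Channels permitted by the FCC in North America: 1 through 11.
    print(non_overlapping(range(1, 12)))
```

Under these assumptions, the North American channel set yields channels 1, 6, and 11 as a set in which no two channels overlap, which is why adjacent APs are commonly placed on those three channels.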
IEEE 802.11n improves on the maximum data rate of 802.11a and 802.11g, with a significant increase from 54 Mbps to approximately 600 Mbps. The 802.11n standard includes several enhancements to the previously described 802.11 standards. The enhancements include MIMO, 40-MHz operation, frame aggregation at the MAC sublayer, and backward compatibility, which makes it possible for multiple 802.11a, 802.11b, 802.11g, and 802.11n devices to coexist.

THE CISCO WLAN SOLUTION

The Cisco Wireless LAN solution is designed to provide IEEE 802.11 wireless networking solutions for both enterprises and service providers. It consists of Cisco Wireless LAN Controllers (WLCs) and their associated Lightweight Access Points (LAPs). WLCs work in conjunction with Cisco Access Points as well as the Cisco Wireless Control System (WCS) to support business-critical wireless applications. WLCs are responsible for system-wide wireless LAN functions, such as the following:
• Integrated Intrusion Prevention System (IPS)
• Zero-touch deployment of Lightweight Access Points (LAPs)
• Real-time Radio Frequency (RF) management
• Wireless LAN redundancy
• Dynamic channel assignment for each LAP
• Dynamic client load balancing across LAPs
• Dynamic LAP transmit power optimization
• Wireless LAN security management

WLCs communicate with controller-based APs over any Layer 2 (Ethernet) or Layer 3 (IP) infrastructure using the Lightweight Access Point Protocol (LWAPP). LWAPP is an IETF draft protocol. An LAP discovers a controller with the use of LWAPP discovery mechanisms. The LAP sends an LWAPP join request to the WLC and the controller sends the LAP an LWAPP join response, which allows the AP to join the controller.
When using LWAPP, although the LAP is under the control of the centralized WLC, the actual processing of data and management protocols and AP capabilities is divided between the LAP and the centralized WLC (the split-MAC architecture).

NOTE: In controller software release 5.2 or later, Cisco LAPs use the IETF standard Control and Provisioning of Wireless Access Points (CAPWAP) protocol in order to communicate between the controller and other LAPs on the network. Controller software releases prior to 5.2 use LWAPP for these communications. CAPWAP, which is based on LWAPP, is a standard, interoperable protocol that enables a controller to manage a collection of wireless APs. LAPs can discover and join a CAPWAP controller. The one exception is for Layer 2 deployments, which are not supported by CAPWAP. Additionally, CAPWAP and LWAPP controllers may be deployed in the same network. The CAPWAP-enabled software allows APs to join a controller that runs either CAPWAP or LWAPP.

When the LAP joins the controller, it downloads the controller software if the revisions on the LAP and the controller do not match. Following that, the LAP is completely under the control of the controller and is unable to function independently of the controller. LWAPP secures the control communication between the LAP and the controller by means of a secure key distribution. The secure key distribution requires already provisioned X.509 digital certificates on both the LAP and the controller. Factory-installed certificates are referenced with the term ‘MIC,’ which is an acronym for Manufacturing Installed Certificate.

The LWAPP Discovery Process

Despite the split-MAC architecture, it is important to remember that LAPs cannot act independently of the WLC. The WLC manages the LAP configurations and firmware.
The LAPs are zero-touch deployed, meaning that there is no individual configuration of LAPs required when they are deployed into the WLAN. In order for the WLC to manage the LAP, the LAP must discover the controller and register with the WLC. After the LAP has registered with the WLC, LWAPP messages are exchanged and the AP initiates a firmware download from the WLC if there is a version mismatch between the AP and the WLC. This allows the LAP to sync with the WLC. Following the sync, the WLC provisions the LAP with the configurations that are specific to the WLANs so that the LAP can accept client associations. These WLAN-specific configurations include the Service Set Identifier (SSID), any additional required security parameters, and 802.11 parameters, such as the data rate, radio channels to use, and the power levels.

The following sequence of events must occur in order for an LAP to register with a WLC:
1. The LAP issues a Dynamic Host Configuration Protocol (DHCP) Discover request to get an IP address. This happens only if the LAP has not been configured with a static IP address.
2. The LAP sends LWAPP Discovery Request messages to the WLCs. If Layer 2 LWAPP mode is supported on the LAP, the LAP broadcasts an LWAPP Discovery message in a Layer 2 LWAPP frame. However, if the LAP or the WLC does not support Layer 2 LWAPP mode, the LAP attempts a Layer 3 LWAPP WLC discovery. The LAPs use the Layer 3 discovery algorithm only if the Layer 2 discovery method is not supported or if the Layer 2 discovery method fails. The LWAPP Layer 3 WLC discovery algorithm repeats until at least one WLC is found and joined.
3. Any available WLC that receives the LWAPP Discovery Request responds with an LWAPP Discovery Response.
4. If the LAP receives more than one LWAPP Discovery Response, it selects the WLC to join, which is typically the first WLC to respond to the LAP.
5. The LAP then sends an LWAPP Join Request to the WLC and the WLC validates the LAP and then sends an LWAPP Join Response to the LAP.
6. The LAP validates the WLC, which then completes the Discovery and Join process. The LWAPP Join process includes mutual authentication and encryption key derivation, which is used to secure both the Join process and LWAPP Control messages between the LAP and the WLC.
7. The LAP registers with the WLC and can begin accepting client associations.

Wireless LAN Roaming

One of the most significant advantages of WLANs over wired LANs is roaming, or mobility. Roaming is a wireless LAN client’s ability to maintain its association seamlessly and securely from one AP to another, with as little latency as possible. When a wireless client associates and authenticates to an AP, the AP’s controller places an entry for that client in its client database. This entry includes the client’s MAC and IP addresses, security context and associations, Quality of Service (QoS) contexts, the WLAN, and the associated AP. The controller uses this information to forward frames and manage traffic to and from the wireless client. The Cisco WLAN supports three types of roaming, which are as follows:
1. Intra-controller roaming (same subnet)
2. Inter-controller roaming (same subnet)
3. Inter-subnet (Layer 3) roaming

Intra-controller roaming occurs when a wireless client roams between APs that are joined to the same controller. In such cases, the controller simply updates the client database with the newly associated AP. If necessary, new security context and associations are established as well.

Inter-controller roaming occurs when the client roams from an AP joined to one controller to an AP joined to a different controller. When the client associates to an AP joined to a new controller, the new controller exchanges mobility messages with the original controller, and the client database entry is moved to the new controller.
New security context and associations are established, if necessary, and the client database entry is updated for the new AP. This process is transparent to the user and is facilitated by the exchange of mobility packets between the WLCs. These packets are carried in EtherIP packets (IP protocol 97).

Inter-subnet roaming is somewhat similar to inter-controller roaming, with some differences. With inter-subnet roaming, the wireless LAN interfaces of the WLCs are on different subnets. In addition, inter-subnet roaming does not move the client database entry to the new controller. Instead, the original controller marks the client with an anchor entry in its local database, and the entry is copied to the new controller’s client database, where it is marked as a foreign entry. The client keeps its IP address and the entire process is transparent.

TROUBLESHOOTING CISCO WLAN SOLUTIONS

Like wired LAN solutions, WLAN solutions comprise many different elements. Therefore, in order to troubleshoot the WLAN solution, it is important to have a solid understanding of WLAN components and how they interact with each other in the overall solution. This understanding simplifies the overall WLAN solution troubleshooting process. Generally speaking, the majority of wireless LAN issues fall into one of the following problem areas:

• Wireless Station (STA) or client issues
• WLC configuration issues
• AP configuration issues
• AP and WLC registration issues
• Infrastructure issues
• Antenna and radio frequency issues

The sections that follow describe some common problems that may occur in these areas, along with suggested solutions to correct (or avoid) them.

Wireless Station (STA) Issues

Troubleshooting wireless clients (STAs) is an integral component of the overall WLAN solution troubleshooting process. This process can be used to narrow down the root of the wireless problem.
For example, if a single client is unable to connect to the WLAN but every other client is able to connect, you can eliminate the WLAN solution devices (e.g., APs and WLCs) and troubleshoot the client itself. However, if multiple or all clients are unable to access the WLAN, you can eliminate the clients and focus your efforts on WLAN solution devices instead. Basic wireless client troubleshooting should include the following tasks:

• Checking the client wireless NIC state
• Checking client settings
• Checking the state of the wireless client

Checking the client wireless network connection includes tasks such as validating that the client wireless network card is enabled and is functioning properly. Some wireless devices have a radio button that can be toggled back and forth to enable and disable the wireless network connection. It is not uncommon for this to be accidentally disabled, for example, when a laptop is removed from a carrying case. In most cases, the Operating System will indicate that the radio is disabled by displaying a warning or other error message, or by using some kind of visual indicator, such as a red do-not-enter symbol. Figure 11-3 below shows the warning message displayed on a Windows-based machine after the wireless radio is disabled:

Fig. 11-3. Client Wireless Radio Warning Indicator and Message

In addition to verifying whether the wireless network connection is enabled, it is also important to check whether the wireless network interface controller (NIC) has been installed correctly and is operating as expected (i.e., there are no error or warning messages printed by the Operating System). It is also prudent to validate that the TCP/IP stack has been installed properly, is working, and that the client is correctly configured to receive IP addressing information dynamically using the DHCP service.
Keep in mind that these checks will vary depending on platform.

After validating that the wireless NIC is functioning as it should be, the next logical step would be to check the wireless settings. These include the SSID, security or authentication configuration (if applicable), and station configuration. Once the AP has been discovered, the client must establish an association. The AP may have some specific requirements that must be satisfied before allowing the STA to join the cell. For example, the AP may request a matching SSID, a supported 802.11 standard, or some form of authentication. It is important to ensure that these parameters match on both the client and the AP; otherwise, the client will not be able to establish an association with the AP.

Checking the state of the client involves verifying whether the client is able to detect any wireless networks, based on the assumption that the wireless NIC is working as it should be. Additional client checks should include checking for signal interference and verifying whether the client is associated. Different Operating Systems and vendors have tools that can be used to troubleshoot the client state. Check the vendor or manufacturer documentation for additional information on using their utilities. As an example, Figure 11-4 below shows the Dell Wireless WLAN Card Utility that can be used to troubleshoot client state issues on a machine installed with a Dell WLAN Card:

Fig. 11-4. Client Wireless State Troubleshooting Tools

NOTE: When troubleshooting client association, you should also check the station’s status on the AP. AP troubleshooting is described later in this section.

WLC Configuration Issues

As is the case with all of the other technologies and protocols described in this guide, device misconfigurations are a common cause of problems.
Some common WLC device misconfigurations include the following:

• Mismatched Service Set Identifiers
• Security mismatches
• WLANs are disabled on the WLC
• Data rate mismatches
• Incorrect client or station filtering
• Unsupported features
• IP address assignment issues
• SSID Broadcast is disabled

NOTE: While the section that follows does include common WLC and AP misconfigurations, as well as recommendations for resolving such problems, you are not expected to perform any WLC troubleshooting in the current TSHOOT certification exam. Emphasis should instead simply be placed on understanding the potential root of the problem.

When multiple clients are unable to connect to the WLAN, a good starting point is to check the configured SSID on the WLC. Just as the SSID could be misconfigured on the client, it is possible for it also to be misconfigured on the WLC. When verifying the configured SSID, it is important to remember that the SSID is case sensitive.

Security mismatches are another common cause of WLAN problems. These parameters must match on the client and the WLC. If the authentication type is Static WEP, verify that the encryption key and key index on the WLC match those of the client. Alternatively, if the authentication type is 802.1X or WPA, ensure that the authentication type and the encryption key size match on both the client and the WLC. If these parameters are mismatched, correct them so that they match on both the WLC and the client.

When a WLAN is configured on the WLC, it is important to ensure that it is enabled. By default, the status of the WLAN is not enabled on the WLC. Instead, it must be enabled manually following its configuration. If the WLAN is disabled, clients will not be able to associate. Verify the configuration of the WLC to ensure that all configured WLANs are enabled.
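The SSID, security, and WLAN status checks described above can be summarized as a simple comparison. The following is a minimal Python sketch for study purposes only; it is not a Cisco tool or API. The WlanSettings class, its field names, and the sample values are all hypothetical, and the comparison of the WEP key material itself is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WlanSettings:
    """Hypothetical, simplified view of the WLAN settings on one device."""
    ssid: str
    auth_type: str                       # e.g. "static-wep", "802.1x", "wpa"
    key_size_bits: int
    wep_key_index: Optional[int] = None  # only meaningful for static WEP
    wlan_enabled: bool = True            # WLAN admin status (WLC side only)


def find_mismatches(client: WlanSettings, wlc: WlanSettings) -> List[str]:
    """Return human-readable findings; an empty list means no mismatch found."""
    findings = []
    if not wlc.wlan_enabled:
        findings.append("WLAN is disabled on the WLC")
    if client.ssid != wlc.ssid:  # SSIDs are case sensitive
        if client.ssid.lower() == wlc.ssid.lower():
            findings.append("SSID differs only by letter case")
        else:
            findings.append("SSID mismatch")
    if client.auth_type != wlc.auth_type:
        findings.append("authentication type mismatch")
    elif client.auth_type == "static-wep":
        if client.wep_key_index != wlc.wep_key_index:
            findings.append("WEP key index mismatch")
    if client.key_size_bits != wlc.key_size_bits:
        findings.append("encryption key size mismatch")
    return findings


# Illustrative values only.
client = WlanSettings("Corp-WLAN", "static-wep", 128, wep_key_index=1)
wlc = WlanSettings("corp-wlan", "static-wep", 128, wep_key_index=2,
                   wlan_enabled=False)
for finding in find_mismatches(client, wlc):
    print(finding)
```

In this sample, the two SSIDs differ only by letter case, which is enough to prevent association because the SSID is case sensitive.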
The Cisco Unified WLAN solution allows administrators to specify data rates for the AP radio. Data rates can be specified as either mandatory or supported. If a data rate is specified as mandatory, the client must support it; otherwise, association will fail. If a certain data rate has been configured as mandatory, verify that it is supported by the wireless client. To avoid situations such as these, it is recommended that you set the lowest data rate on the WLC to mandatory and then specify any other data rates as supported.

On the WLC, there is an option to disable clients manually, which helps to prevent rogue clients from trying to access the network. While such policies enhance WLAN security, misconfigurations can result in legitimate clients being denied access to the WLAN. If your organization’s policy requires such security, check the WLC configuration to ensure that the client that cannot connect to the WLAN is not included in the list of filtered client MACs.

Cisco WLCs include some proprietary features and functions that may not be supported by non-Cisco clients. Common proprietary features that are enabled by default include radio preambles and Management Frame Protection (MFP). The radio preamble, which is sometimes called a header, is a section of data at the head of a packet, which contains information that wireless devices need when they send and receive packets. MFP ensures the integrity of the 802.11 management frames by allowing the AP to add a Message Integrity Check Information Element (MIC IE) to each frame. Any attempt by an intruder to copy, alter, or replay the frame invalidates the MIC, which causes any receiving AP that is configured to detect MFP frames to report the discrepancy. Although enabled by default, these features are not supported by all clients. When non-Cisco clients exist, it is recommended that such features be disabled if not supported.
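The mandatory versus supported data-rate rule described earlier in this section can be expressed in a few lines. The following is a minimal Python sketch under stated assumptions: the function name and the rate lists are hypothetical examples, not output from or input to any Cisco device.

```python
from typing import Iterable, List, Tuple


def check_mandatory_rates(mandatory_mbps: Iterable[float],
                          client_mbps: Iterable[float]) -> Tuple[bool, List[float]]:
    """Return (can_associate, missing_rates) for one client.

    A client can associate only if it supports every mandatory rate;
    rates that are merely 'supported' on the AP do not block association.
    """
    missing = sorted(set(mandatory_mbps) - set(client_mbps))
    return (not missing, missing)


# 802.11b data rates, in Mbps.
DOT11B_CLIENT = [1.0, 2.0, 5.5, 11.0]

# Only the lowest rate is mandatory, as recommended: the client can associate.
print(check_mandatory_rates([1.0], DOT11B_CLIENT))

# A mandatory rate that the client lacks: association is expected to fail.
print(check_mandatory_rates([1.0, 54.0], DOT11B_CLIENT))
```

The second call illustrates why the lowest data rate is the safest one to mark as mandatory: any higher mandatory rate excludes clients that cannot transmit at it.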
IP addressing issues are among the most commonly experienced WLAN issues. The WLC can be configured as a DHCP relay agent or as a DHCP server itself. When the WLC operates as a DHCP relay agent, it forwards DHCP messages from clients to the specified DHCP server(s). When the DHCPOFFER comes back to the controller, it changes the DHCP server IP address to its virtual IP address, which is typically set to 1.1.1.1. When clients roam, the first thing they attempt to do is contact the DHCP server to renew their IP address. By using its own virtual IP address as that of the DHCP server, the WLC is able to intercept the client DHCPREQUEST packets. Given this behavior, it is important to ensure that all WLCs (if more than one exists) are configured with the same virtual IP address. Because roaming clients then believe that they are still communicating with the same DHCP server, they do not restart the entire DHCP process each time they roam between APs.

With Cisco 4400 series WLCs, by default, the Broadcast SSID parameter is disabled. This is a problem for non-Cisco clients or other devices that perform only passive scans (i.e., those that do not transmit probe requests) to locate an AP. In hybrid or in non-Cisco client environments, you should enable the Broadcast SSID parameter so that any passive clients will be able to associate. This action also allows any clients that do not have an SSID explicitly configured to associate.

AP Configuration Issues

While the Cisco Unified WLAN Solution includes both WLCs and APs, it is also possible to implement a Cisco WLAN solution with just APs running in autonomous mode. In such implementations, all APs must be configured individually, and it is important to ensure that configuration parameters are consistent between the APs. The most common issues with APs that are running in autonomous mode are service interruptions when clients are roaming.
When implementing APs in autonomous mode, it is important to ensure that all of the APs are configured with the following parameters:

• The same Service Set Identifier
• The same IP subnet
• The same Layer 2 native VLAN

When a client roams from one AP to another, the client will discard 802.11 probe responses and beacons received from APs unless they have matching SSID and encryption settings, which results in connectivity issues as the client moves from one AP to another.

When roaming, WLAN clients first perform a Layer 2 roam as they move from one AP to another within the same subnet. However, if the APs are in a different subnet, clients perform Layer 3 roaming, which requires the client to acquire a new IP address, interrupting previously connected sessions. This behavior will adversely impact wireless VoIP phones and similar services, as well as any other applications using the original IP address.

Finally, APs use the native VLAN to exchange information about roaming clients with other APs. For this reason, it is important to ensure that a consistent native VLAN is used when implementing multiple autonomous APs, as management traffic is sent and received across this VLAN.

AP and WLC Registration Issues

In order to troubleshoot LAP and WLC integration issues effectively, it is important to have a solid understanding of the interaction between the LAP and the WLC, as was described earlier in this chapter. One of the most common issues that prevent LAP registration with the WLC is forgetting to configure DHCP Option 43. DHCP Option 43 is required to provide the LAP with the IP address(es) of the WLC in the DHCPOFFER message. If this option is not specified, the LAP is unable to register with the WLC.
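To make the contents of DHCP Option 43 more concrete, the following Python sketch builds an Option 43 value from a list of WLC addresses. It rests on an assumption that is not stated in this chapter: this guide refers to an ASCII form of the option, whereas the sketch assumes the alternative hexadecimal type-length-value (TLV) encoding that Cisco documents for many Aironet LAP models, with a type byte of 0xf1 followed by a length byte and the controller addresses. Check the documentation for your AP model before relying on either form. The function name and the IP addresses are hypothetical.

```python
import ipaddress
from typing import Iterable


def option43_tlv_hex(controller_ips: Iterable[str]) -> str:
    """Build an Option 43 TLV string: type 0xf1, length, then the WLC IPs.

    Assumes the TLV encoding (not the ASCII form); illustrative only.
    """
    # Each IPv4 address contributes exactly 4 bytes to the value field.
    packed = b"".join(ipaddress.IPv4Address(ip).packed for ip in controller_ips)
    if not packed:
        raise ValueError("at least one controller address is required")
    if len(packed) > 255:
        raise ValueError("too many controller addresses for a one-byte length")
    return (bytes([0xF1, len(packed)]) + packed).hex()


# Two example WLC management addresses (hypothetical values).
print(option43_tlv_hex(["192.168.10.5", "192.168.10.20"]))
```

For the two sample addresses, the length byte is 08 because two IPv4 addresses occupy eight bytes.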
As stated earlier in this chapter, the LAP attempts to use the Layer 2 discovery method first and then reverts to the Layer 3 discovery method if the Layer 2 method is not supported or fails.

When the LAP and the WLC reside on different subnets, the LAP uses Layer 3 discovery to locate a WLC. In this mode, the LAP will broadcast a Layer 3 LWAPP Discovery message on the local subnet. If the WLC resides on a remote subnet, the DHCP relay agent is required to ensure that these messages are relayed to the WLC. When using the Cisco IOS DHCP relay agent, it is important to remember that LWAPP discovery messages are not forwarded by default when the ip helper-address <address> interface configuration command is issued. Given this, it is important to ensure that LWAPP Broadcasts, which use UDP port 12223, are forwarded by the DHCP relay agent by adding the ip forward-protocol udp 12223 global configuration command to the Cisco IOS DHCP relay agent configuration file. Without this configuration command, the LAP will not be able to communicate with the remote WLC.

Infrastructure Issues

Infrastructure issues can also cause WLAN issues. As stated earlier in this chapter, the wireless LAN relies on the wired LAN for connectivity. For this reason, it is important to ensure that switches and other intermediate devices are adequately configured to support the WLAN extension. For example, when connecting a WLC to a switch, the port should be configured as a trunk link. Verify the switch port configuration by checking the configuration or by using the show interfaces <name> switchport command.

Unlike WLC ports, the switch ports connected to LAPs should not be configured as trunk links, but as access ports. These ports should then be assigned to an active VLAN, which is typically the management VLAN. Again, verify switch configuration by checking the configuration or by using the show interfaces <name> switchport command.
Additionally, ensure that PortFast is enabled on ports that are connected to LAPs so that these ports transition to the Forwarding state immediately.

Infrastructure devices such as Multilayer switches can also be configured as DHCP servers or DHCP relay agents. When the device is configured as a DHCP server, it is important to verify that DHCP Option 43 is included in the Cisco IOS DHCP server configuration. This is implemented using the option 43 ascii “<address>” DHCP configuration command. Likewise, if the device is configured as a Cisco IOS DHCP relay agent, ensure that the ip forward-protocol udp 12223 global configuration command is used if the LAP and WLC reside on different IP subnets.

Like Cisco IP phones, Cisco APs can use an external power source to draw their power, or they can draw their power from the switch to which they are connected. This power is sent within the Ethernet cable connecting the switch and the AP using either the IEEE 802.3af-2003 standard or the Cisco Inline Power method. Because Power over Ethernet (PoE) is increasingly used in today’s converged networks, it is important to calculate accurately the amount of power that will be required by the devices that will draw power from the switch. Cisco provides an online power calculator tool (requires login) that can be used to make this determination.

An AP that is connected to a switch may be unable to draw sufficient power, perhaps because other devices connected to the switch are consuming all available power, or because the switch itself is unable to provide sufficient power to all devices due to incorrect power calculations by administrators. In some of these cases, a message similar to the following will be logged on the console:

    %CDP_PD-2-POWER_LOW: All radios disabled - LOW_POWER_CLASSIC inline

This message means that the AP has detected that the switch port (PSE) is not able to provide sufficient power and therefore has transitioned to low-power mode.
In low-power mode, the AP will disable all radios, effectively meaning that no stations or clients will be able to associate with that AP. An important fact to remember is that if the AP is connected to both an external power injector and a PoE switch port, and if the switch is not able to provide the AP with sufficient power, it will still log this message and disable the radios, even though the external power injector can provide enough power. In essence, PoE information received via Cisco Discovery Protocol (CDP) takes precedence. In such cases, the AP must be configured to ignore the CDP information and use the external power injector.

NOTE: AP configuration is beyond the scope of the TSHOOT certification exam and will not be described in any additional detail in this chapter or in the remainder of this guide.

On the switch side, you can use the show power suite of commands to verify available power for the entire chassis, or on a per-module or per-interface basis, as shown below. As an example, the show power inline <interface> command shows how much power is drawn by the device connected to the switch port. The following example shows the power used by a connected IP phone:

    Cat-6500-1#show power inline GigabitEthernet2/1
    Interface Admin  Oper       Power(Watts)          Device              Class
                                From PS    To Device
    --------- ------ ---------- ---------- ---------- ------------------- -----
    Gi2/1     auto   on         13.5       12.0       Cisco IP Phone 7945 3

    Interface  AdminPowerMax
                 (Watts)
    ---------- ---------------
    Gi2/1      15.4

The following example shows the same output for a switch port connected to an AP.
In this case, the AP is using an external power injector and is not drawing power from the switch, implying the AP has been appropriately configured to use the external power source:

    Cat-6500-1#show power inline GigabitEthernet2/2
    Interface Admin  Oper       Power(Watts)          Device              Class
                                From PS    To Device
    --------- ------ ---------- ---------- ---------- ------------------- -----
    Gi2/2     auto   off        0          0          cisco AIR-LAP1252AG n/a

    Interface  AdminPowerMax
                 (Watts)
    ---------- ---------------
    Gi2/2      15.4

In addition to the previously described infrastructure checks and verifications, it is important to perform basic additional infrastructure checks, such as verifying Layers 1 and 2. When troubleshooting Layer 1 and Layer 2 issues, it is important to understand that because wireless networks operate over a shared medium, more so than wired networks, it is not uncommon to see Cyclic Redundancy Check (CRC) or Physical Layer Convergence Protocol (PLCP) errors. However, while these errors are normal, an excessive number of them may indicate wireless network issues, which include the following:

• Packet collisions due to densely populated clients
• Overlapping channels
• High multipath conditions due to bounced signals
• Other signals in the 2.4-GHz band

Recall from the SWITCH guide that radio interfaces (WLANs in general) operate in half-duplex mode because a single frequency is used to transmit and receive data. Therefore, keep in mind that while the 802.11 standard does have some mechanisms for avoiding collisions (i.e., CSMA/CA), it is still possible for collisions to occur in environments with a dense population of clients (STAs). This can adversely impact WLAN performance and result in intermittent connectivity issues.
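The effect of client density on collisions can be illustrated with a deliberately simplified model. The Python sketch below is a toy calculation, not the 802.11 algorithm: it assumes that every station is ready to transmit at the same moment, that each one independently picks a random backoff slot from a fixed window, and that a collision occurs whenever the lowest slot is picked by two or more stations. The function name and the 32-slot window are assumptions chosen for illustration.

```python
def collision_probability(stations: int, slots: int = 32) -> float:
    """Probability that the lowest backoff slot is picked by 2+ stations."""
    if stations < 1 or slots < 1:
        raise ValueError("stations and slots must be positive")
    # P(exactly one station holds the lowest slot k):
    #   choose the winning station (stations ways), it picks slot k (1/slots),
    #   and every other station picks a slot strictly greater than k.
    p_unique_winner = sum(
        stations * (1 / slots) * ((slots - 1 - k) / slots) ** (stations - 1)
        for k in range(slots)
    )
    return 1.0 - p_unique_winner


# More contending stations in the same window means more likely collisions.
for n in (2, 10, 40):
    print(n, round(collision_probability(n), 3))
```

Even in this simplified model, the collision probability rises steadily as more stations contend for the same medium, which is consistent with the performance problems described above for densely populated cells.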
In cases where you do observe a large number of CRC errors, consider checking for possible radio interference, antennas, and cabling, as well as the line of sight (LOS) between the transmitter and the receiver, to ensure that the LOS is clear of possible interfering objects.

NOTE: WLAN problems caused by overlapping channels, multipath conditions, and the presence of other signals in the 2.4-GHz band are described in the following sections.

In addition to checking for errors, it is also important to remember basic Ethernet fundamentals. For example, if the WLAN is experiencing intermittent connectivity or connectivity with errors, the cable length may be greater than the recommended Ethernet segment length. This is applicable not only to APs but also to antenna cabling. When implementing a WLAN solution, cable runs should be kept as short as possible to allow for optimum efficiency and to prevent loss, which is likely if the cable runs are long. Instead of using standard cabling, such as traditional coaxial cable, to connect antennas, consider using Cisco antenna cables. While they may be slightly more expensive than other standard cables, they are recommended for optimum efficiency of the overall WLAN solution.

Finally, if the LAP and WLC are located on remote subnets, ensure that no ACLs are blocking communication between the two devices. In addition to these checks, ensure that configurations on the infrastructure devices are correct. For example, if using the Cisco IOS DHCP relay agent to forward LWAPP Broadcasts to the WLC, ensure that the ip forward-protocol udp 12223 global configuration command is included in the configuration, in addition to the ip helper-address <address> interface configuration command.
Similarly, when using Cisco IOS DHCP servers, ensure that DHCP Option 43 is also included in the configuration.

Antenna and Radio Frequency Issues

Antennas are an integral component of wireless implementations. Antennas provide the WLAN system with three fundamental properties, which are gain, direction, and polarization. Gain is a measure of increase in power and is used to describe the amount of increase in energy that an antenna adds to a radio frequency (RF) signal. Direction is the shape of the transmission pattern. Polarization is the physical orientation of the element on the antenna that actually emits the RF energy. Understanding basic antenna functionality and operation is fundamental to understanding the overall wireless solution. In addition, a solid understanding of these principles is necessary for supporting and troubleshooting wireless problems.

Cisco wireless equipment supports different styles of antenna, each with different coverage capabilities. The supported antenna types include omnidirectional and directional antennas. Omnidirectional antennas are designed to provide a 360-degree radiation pattern and are commonly used when coverage in all directions from the antenna is required. Omnidirectional antenna operation is illustrated in Figure 11-5 below. The coverage provided by this type of antenna is shown in gray:

Fig. 11-5. Omnidirectional Antenna Coverage

Directional antennas come in different styles and shapes. Directional antenna types include yagi antennas, patch antennas, and parabolic dishes. Yagi antennas are simply antenna types that radiate only in a specific direction. Figure 11-6 below illustrates basic yagi antenna operation. The coverage is shown in gray:

Fig. 11-6. Directional Yagi Antenna Coverage

Directional patch antennas are simply a type of flat antenna.
Like directional yagi antennas, they provide coverage in a specific direction. Figure 11-7 below shows a mounted patch antenna:

Fig. 11-7. Mounted Directional Patch Antenna

Finally, parabolic antennas are simply antennas that look like satellite dishes, with which we all should be familiar. These antennas are also commonly referred to simply as dish antennas. Parabolic antennas have a very narrow RF energy path. These antennas are typically used only in outdoor wireless implementations.

Having discussed antenna basics, the remainder of this section will describe some of the more common sources of problems, which include, but are not limited to, the following:

• Radio power optimization
• Radio interference
• Electromagnetic Interference
• AP channel interference
• Multipath
• Antenna power issues

In some instances, when the AP and the clients that are associated with it are within close proximity, the clients may be disconnected from the AP. Although rare, this can result in poor WLAN performance and intermittent connectivity for clients associated with the AP(s). The recommended solution for such issues is keeping clients away from the AP. This can be achieved by installing the AP in locations that, while still accessible to clients, are not within close proximity to the clients. For example, instead of placing an AP on a conference room table, the AP should be placed or mounted on a wall or ceiling in the conference room.

In some cases, it may not be possible to prevent clients from being within such close proximity to the AP. For example, it may not be possible to mount the AP in a factory. If the AP and clients will be in such close proximity, you can reduce the power of the AP to prevent clients that are too close to the AP from being disconnected.

Radio interference is a common cause of WLAN problems due to the shared media.
Proactively, such issues can be avoided by performing site surveys prior to implementing the WLAN solution. Radio interference issues are a common phenomenon because a license is not required to operate radio equipment in the 2.4-GHz band, which is the same band in which Cisco Aironet WLAN equipment operates. It is therefore possible for other devices, such as microwave ovens or wireless phones, to be using the same band, resulting in interference. You can use a spectrum analyzer to determine the presence of any other activity on your frequency. In the event that there is too much interference, consider changing frequencies, if possible.

Electromagnetic Interference (EMI), also referred to as Radio Frequency Interference (RFI), is a disturbance that affects an electrical circuit due to either electromagnetic induction or electromagnetic radiation emitted from an external source. While EMI does not necessarily affect signal transmissions, per se, it can affect the components of the transmitter, resulting in poor WLAN performance and intermittent connectivity issues. To avoid the potential problems caused by EMI, you should ensure that APs are placed away from any potential EMI sources, such as fluorescent lights and high-voltage power lines. In some environments (e.g., factories), if you cannot completely remove the AP from sources of EMI, such as power lines, you could alternatively supply conditioned power to the WLAN equipment in order to lessen the effects of EMI generated on those power circuits. However, the recommended solution would still be to isolate the equipment from such sources.

Channel interference, which is a direct result of a poor implementation, is also a common cause of WLAN issues. As was stated earlier in this chapter, the 2.4-GHz band consists of 14 channels, each 22 MHz wide. In North America, the FCC allows channels 1 through 11.
Most of Europe can use channels 1 through 13. Japan allows all 14 channels, although channel 14 is limited to 802.11b. Taking this into consideration, when installing APs, you should ensure that adjacent APs use non-overlapping channels. Within the 2.4-GHz range, there are three channels that do not overlap. These channels are 1, 6, and 11. Therefore, use these channels alternately when deploying APs in an ESS. Figure 11-8 below illustrates a recommended AP deployment using these non-overlapping channels:

[Figure: AP #1 on channel 1, AP #2 on channel 6, and AP #3 on channel 11, all connected to the distribution system]

Fig. 11-8. Implementing APs Using Non-Overlapping Channels

Referencing Figure 11-8, three APs deployed within close proximity or with overlapping coverage areas are configured to use non-overlapping channels to avoid RFI issues, which may lead to connectivity issues and poor throughput. If an additional AP is added, say AP #4, then the AP can be configured to use channel 1. If yet another AP is added, say AP #5, this AP would be configured to use channel 6, and so forth.

Multipath is a common cause of WLAN problems due to the nature of the medium used. This situation occurs when RF signals take different paths from a source to a destination. As RF signals travel farther from the transmitter, they become wider. This increase in width increases the likelihood of the RF signals running into objects that reflect, refract, diffract, or interfere with the signal, such as furniture, walls, or coated glass.

When the RF signal is reflected off an object, multiple, duplicate wavefronts are created and propagated, resulting in multiple wavefronts being received by the receiver. The WLAN multipath concept is illustrated in Figure 11-9 below:

Fig. 11-9. Understanding WLAN Multipath

Referencing Figure 11-9, the AP transmits a signal. As the signal travels further, it widens. Part of the signal goes straight to the destination, while other parts bounce off an obstruction, such as the ceiling or any other reflector object, such as a steel cabinet, and then go on to the destination. Because the reflected signals travel a longer path, they arrive at the same destination with some delay. This results in the client receiving multiple wavefronts. When these different waveforms combine, they cause a distortion of the waveform, resulting in poor signal quality, even though the actual signal strength itself may be strong. This delay causes the information symbols represented in 802.11 signals to overlap, which then affects the decoding capability of the client (receiver) and results in poor performance and connectivity issues. The recommended solution is to implement diversity.

Diversity is the use of two antennas for each radio. Not only does this increase the probability of receiving better signals from either one of the antennas, it also allows the radio to compensate for errors due to RFI and provides relief to a wireless network in a multipath scenario. With diversity, only a single antenna (the best antenna for transmitting to the receiver) is used. The antennas are not used at the same time, to avoid introducing multipath issues themselves. Cisco APs use antenna diversity by default (i.e., they use dual antennas). Although not recommended, this default behavior can be modified because only a single antenna is required to provide radio operations.

NOTE: An alternative to implementing the dual antennas is to implement the 802.11a standard, which provides higher data rates than DSSS and minimizes the effects of multipath propagation on signal quality and throughput.
However, keep in mind that 802.11a is not compatible with the more commonly used 802.11b and 802.11g standards, and typically costs more to implement.

The final issue discussed in this section pertains to antenna power issues. Antenna gain ratings are measured in decibels (dB), which express a ratio between two values. An antenna rating is typically expressed relative to the gain of an isotropic (dBi) or dipole (dBd) antenna. The isotropic antenna is a theoretical antenna that transmits equal power density in all directions. These antennas are used only as theoretical (mathematical) references and do not exist in the real world; however, because the U.S. FCC uses dBi for its calculations, this same standard is also used by most wireless equipment manufacturers and vendors, such as Cisco. Dipole antennas are more like real-world antennas. While some antennas are rated in dBd, most ratings use dBi because all FCC calculations are based on the dBi measurement.

Antenna power is a major factor that should be taken into consideration when designing the WLAN, as incorrect calculations may result in poor WLAN performance, with issues such as intermittent connectivity or even a complete loss of connectivity in some areas.

When you are determining which antenna to use, or when you are troubleshooting potential RF power issues, keep in mind that as the gain of an antenna increases so do the signal strength and directivity; however, the tradeoff is that the breadth of the antenna's coverage area is diminished. Directivity measures the power density an antenna radiates in the direction of its strongest emission. To clarify this point further, consider Figure 11-10 below, which shows the coverage provided by a low-gain antenna (LGA):

[Figure: an AP whose antenna coverage area includes STA 2, STA 3, and STA 4 but not STA 1; the horizontal axis is Distance (Range)]

Fig. 11-10. Understanding Low Gain Antennas

Referencing Figure 11-10, the LGA is able to provide a broad coverage area, which includes STA 2, STA 3, and STA 4. However, because it is low gain, the antenna's coverage distance is limited and STA 1 is not included in this range. If the gain were increased, then the antenna's signal strength and directivity would be increased, allowing it to reach greater distances, as illustrated in Figure 11-11 below:

[Figure: an AP whose antenna coverage area now reaches STA 1 but no longer includes STA 3; the horizontal axis is Distance (Range)]

Fig. 11-11. Understanding High Gain Antennas

Referencing Figure 11-11, the LGA shown in Figure 11-10 has been replaced by a high-gain antenna (HGA), or the gain on the LGA was simply increased. Either way, the signal is stronger, more directional, and traverses a greater distance, which means that STA 1 now resides within the coverage area. However, this comes at the expense of the area that is actually covered by the AP. Previously, STA 3 was comfortably within the coverage area, while STA 1 was not; now STA 3 is no longer within the coverage area, while STA 1 is.

The recommended solution in this case would be to reduce the adjusted gain on the AP and integrate another AP into the WLAN solution, keeping in mind standard recommended configuration fundamentals, such as using non-overlapping channels and ensuring that both APs are configured the same (e.g., SSIDs).

In summation, you should be aware of this tradeoff when deciding on antennas or adjusting the gain values for antennas used in the WLAN solution. Careful consideration must be taken before increasing or decreasing gain because this can result in issues such as a shorter usable range, since path loss (the attenuation of the signal as it travels) limits the distance over which the signal can be received, as well as a reduced coverage area.

CHAPTER SUMMARY

The following section is a summary of the major points you should be aware of in this chapter.
• Wireless networks use radio waves to transmit data and connect devices
• WLANs are meant to augment, not replace, wired LAN infrastructure

IEEE 802.11 Components

• IEEE 802.11 components include the following:
1. Client or Station (STA)
2. Access Point (AP)
3. Independent Basic Service Set (IBSS)
4. Basic Service Set (BSS)
5. Extended Service Set (ESS)
6. Distribution System (DS)
• The client is any appliance that interfaces with the wireless medium as an end user device
• The wireless AP functions as a bridge between the STAs and the network backbone
• An IBSS is a wireless network consisting of at least two STAs
• The BSS is a cellular architecture which divides the wireless network (system) into cells
• The ESS is comprised of overlapping BSS sets (cells), usually connected by the DS
• The Distribution System (DS) allows for the interconnection of the APs of multiple cells

IEEE 802.11 Frames

• The 802.11 standard uses the following three types of frames:
1. Control Frames
2. Management Frames
3. Data Frames
• The 802.11 standard uses control frames to control device access to the wireless medium
• Management frames enable stations to establish and maintain communications
• Data frames are sent by any STA and contain higher layer protocol information or data

IEEE 802.11 Standards

• At the physical (PHY) layer, IEEE 802.11 defines a series of encoding and transmission schemes for wireless communications, the most common of which are the Frequency Hopping Spread Spectrum (FHSS), Direct Sequence Spread Spectrum (DSSS), and Orthogonal Frequency Division Multiplexing (OFDM) transmission schemes. Although Infra Red (IR) also exists at this layer, very little development of this standard has occurred due to line-of-sight limitations. The 802.11 standards described in this section are as follows:
1. IEEE 802.11 (original)
2. IEEE 802.11b
3. IEEE 802.11a
4. IEEE 802.11g
5. IEEE 802.11n
• The original IEEE 802.11 standard defined WLANs that provided up to 2 Mbps throughput
• The original 802.11 standard is now considered obsolete because the throughput is too slow
• 802.11b is an extension to 802.11 that operates in the same unregulated 2.4-GHz band
• Devices operating on 802.11b use DSSS modulation for higher speeds
• The 2.4-GHz band consists of 14 channels, each 22 MHz wide
• In North America, the FCC allows channels 1 through 11
• Most of Europe can use channels 1 through 13. Japan permits channels 1 through 13 and, for 802.11b only, channel 14
• IEEE 802.11a is an extension to 802.11 that provides up to 54 Mbps throughput
• 802.11a uses OFDM and does away with spread spectrum
• 802.11a is not compatible with 802.11b or 802.11g and therefore is seldom used
• 802.11a equipment operates at 5 GHz
• The 802.11g standard also works in the same 2.4-GHz range as 802.11b
• IEEE 802.11g operates at a bit rate as high as 54 Mbps, but uses the S-Band ISM and OFDM
• 802.11g is compatible with 802.11b and can operate at the 802.11b bit rates and use DSSS
• IEEE 802.11n improves on 802.11a and 802.11g maximum data rates
• The 802.11n standard includes several enhancements to the other 802.11 standards

The Cisco WLAN Solution

• The Cisco Wireless LAN solution consists of WLCs and their associated LAPs
• WLCs are responsible for system-wide wireless LAN functions, such as the following:
1. Integrated Intrusion Prevention System (IPS)
2. Zero-Touch Deployment of Lightweight Access Points (LAPs)
3. Real-time Radio Frequency (RF) management
4. Wireless LAN Redundancy
5. Dynamic Channel Assignment for each LAP
6. Dynamic Client Load Balancing across LAPs
7. Dynamic LAP Transmit Power Optimization
8. Wireless LAN Security Management
• WLCs communicate with Controller-based APs over any Layer 2 or Layer 3 infrastructure
• WLCs communicate with LAPs using LWAPP or CAPWAP
• LWAPP is an IETF draft protocol
• CAPWAP, which is based on LWAPP, is a standard, interoperable protocol
• The following sequence of events must occur in order for an LAP to register to a WLC:
1. The LAP issues a DHCP Discovery Request to get an IP address (if no static address is used)
2. The LAP sends a Layer 2 LWAPP discovery request message to the WLC
3. If Layer 2 discovery fails or is not supported, then the Layer 3 discovery method is used
4. Any available WLC responds with an LWAPP Discovery Response
5. The LAP selects the WLC to join, which is typically the first WLC to respond to the LAP
6. The LAP then sends an LWAPP Join Request to the WLC
7. The WLC validates the LAP and then sends an LWAPP Join Response to the LAP
8. The LAP validates the WLC, which then completes the Discovery and Join process
9. The LAP registers with the WLC and can begin accepting client associations
• The Cisco WLAN supports the following three types of roaming:
1. Intra-controller Roaming
2. Inter-controller Roaming
3. Inter-Subnet Roaming

Troubleshooting Cisco WLAN Solutions

• Generally speaking, Wireless LAN issues fall into one of the following problem areas:
1. Wireless Station (STA) or Client Issues
2. WLC Configuration Issues
3. AP Configuration Issues
4. AP and WLC Registration Issues
5. Infrastructure Issues
6. Antenna and Radio Frequency Issues
• Wireless client troubleshooting should include the following tasks:
1. Checking the client wireless NIC state
2. Checking client settings
3. Checking the state of the wireless client
• Some common WLC device misconfigurations include the following:
1. Mismatched Service Set Identifiers
2. Security Mismatches
3. WLANs are disabled on the WLC
4. Data Rate Mismatches
5. Incorrect Client or Station Filtering
6. Unsupported Features
7. IP Address Assignment Issues
8. SSID Broadcast is Disabled
• For APs operating in autonomous mode, ensure that the following parameters are the same:
1. The same Service Set Identifier
2. The same IP subnet
3. The same Layer 2 native VLAN
• When using DHCP for AP address allocation, ensure that DHCP Option 43 is configured
• When using Cisco IOS DHCP relay agent, forward UDP Broadcasts for port 12223
• When using PoE, ensure that there is enough available power to power up all the APs
• Excessive CRC and PLCP errors may point to one or more of the following issues:
1. Packet collisions due to densely populated clients
2. Overlapping channels
3. High multipath conditions due to bounced signals
4. Other signals in the 2.4-GHz band
• When troubleshooting Layer 1 and Layer 2 issues, check cabling for both APs and antennas
• Common sources of antenna and RF problems include, but are not limited to, the following:
1. Radio Power Optimization
2. Radio Interference
3. Electromagnetic Interference
4. AP Channel Interference
5. Multipath
6. Antenna Power Issues

CHAPTER 12

Troubleshooting Cisco VoIP and Video Solutions

Converged networks are increasingly common. Present-day networks support integrated voice, video, and data traffic. Carrying voice, video, and data traffic over a single transport infrastructure requires a properly designed Quality of Service (QoS) implementation to ensure the required level of service for all three traffic types. In addition to understanding how to configure Cisco Catalyst switches to support these services, you are also expected to understand how to troubleshoot switch configurations that may affect these services.
The TSHOOT certification exam objectives covered in this chapter are as follows:
• Troubleshoot switch support of advanced services (i.e., Wireless, VoIP, and Video)
• Troubleshoot a VoIP support solution
• Troubleshoot a video support solution

Cisco IP Telephony solutions are an integral component of Cisco Unified Communications. The Cisco Unified Communications solution allows for the integration of voice, video, data, and mobile applications on fixed and mobile networks. Cisco IP Telephony solutions are comprised of call processing solutions, such as Cisco Unified Communications Manager and IP phones. Cisco Unified Video Advantage enhances the existing Cisco IP Telephony solution by providing video telephony functionality to certain Cisco Unified IP phones. The Cisco Unified Videoconferencing solution allows for a wide range of customized as well as fully converged voice, video, and data solutions.

This chapter will be divided into the following sections:
• Cisco IP Telephony Fundamentals
• The Need for LAN and WAN Quality of Service
• LAN and WAN IPT QoS Implementation
• Cisco IP Video Fundamentals
• LAN and WAN Video QoS Implementation
• Troubleshooting Converged Networks

CISCO IP TELEPHONY FUNDAMENTALS

IP Telephony, also referred to as Voice over IP (VoIP), is a generic term that is used to describe the transport of traditional communications services, such as voice and fax, over the Internet Protocol (IP). Some common elements found in a typical Cisco IP Telephony (IPT) solution include the following:
• One or more call agents
• IP phones
• Application servers
• Voice gateways
• Voice gatekeepers

Call agents, such as Cisco Unified Communications Manager (CUCM), are responsible for ordering and directing each step of call completion for the endpoints, which may include IP phones and analog and digital ports on voice gateways.
CUCM typically communicates with Cisco IP phones using the Skinny Call Control Protocol (SCCP), sometimes simply referred to as Skinny. SCCP is a proprietary network terminal control protocol that is built on a client/server model. Using Skinny, end stations, which may be IP phones or even gateway ports, communicate with the call agent (CUCM) and receive call setup and teardown instructions, among other things, from the call agent.

In addition to using Skinny, the call agent can communicate with endpoints using the Media Gateway Control Protocol (MGCP). Like SCCP, MGCP also uses a client/server model where the call agent is responsible for ordering and directing each step of call completion for the endpoints. However, unlike Skinny, MGCP is an open standard that is supported by many different vendors.

In addition to controlling and directing endpoints, call agents also provide other services and functions, which include bandwidth management, address translation, and Call Admission Control (CAC). CAC is used to protect the quality of voice calls by preventing call completion if there are not enough resources available to support the call(s). It is not the same thing as QoS, which is used to protect real-time traffic, such as voice and video, from data traffic that is also contending for the same network resources.

NOTE: You are not required to go into any further detail on SCCP or MGCP in the TSHOOT certification exam. Additionally, while QoS is a core requirement of the TSHOOT exam, you are not required to go into detail on Call Admission Control (CAC). SCCP, MGCP, and CAC are not described in any additional detail in the remainder of this final chapter.

An IP phone is simply a telephony endpoint that allows telephone calls to be made over an IP network connection instead of a traditional analog connection.
Cisco IP Telephony implementations typically include one or more application servers that are used to provide additional services, such as voice mail (Cisco Unity, Unity Connection, and Unity Express); unified messaging (Cisco Unity Unified Messaging); call routing and comprehensive contact management capabilities (Cisco Unified Contact Center Enterprise and Cisco Unified Contact Center Express); and user availability and communications capabilities information (Cisco Unified Presence).

Voice gateways are an integral part of the VoIP network. In their most simple form, gateways are used to connect dissimilar networks, such as the IP network to the traditional Public Switched Telephone Network (PSTN). In essence, the primary function of a gateway is to convert data traveling through it to a format that the other side understands. However, Cisco IOS voice gateways can perform additional tasks and functions, such as translations, CAC, and Quality of Service (QoS), among other things.

The call agent can communicate with Cisco IOS voice gateways using MGCP, SCCP, H.323, or the Session Initiation Protocol (SIP). As is the case with MGCP and SCCP, you are not required to go into any detail on either H.323 or SIP in the current TSHOOT certification exam.

Gatekeepers are optional H.323 IP Telephony components that, when used, provide advanced services, such as address translation, CAC, bandwidth control, and zone management. A zone is a collection of all endpoints that are managed by the gatekeeper. A simple common example of an endpoint would be a voice gateway. Figure 12-1 below shows the components described in this section and how they would be integrated into an IP Telephony solution:

[Figure: Call Agent #1 and #2, Gatekeeper #1 and #2, Unity Server #1 and #2, two multilayer switches, Gateway #1 and Gateway #2, two Cisco IP Phone 7970 Series phones, and the PSTN]

Fig. 12-1. A Basic IP Telephony Implementation

Figure 12-1 shows the different VoIP components described in the previous section. It should be noted that this diagram is reflective of a single site. In the event that there are multiple sites, sometimes the voice gateways may also be used to connect to the WAN, allowing remote IP phones to communicate with the local call agent. Alternatively, additional dedicated routers can also be used to connect to the WAN, providing extra network fault tolerance.

Prior to implementing an IPT or VoIP solution, it is imperative that the network (i.e., both the LAN and the WAN) be adequately prepared or configured to support this traffic. The sections that follow describe some core requirements, which include the following:
• Dynamic Host Configuration Protocol services
• Network Time Protocol services
• Trivial File Transfer Protocol services
• Cisco Discovery Protocol
• Voice or auxiliary VLAN
• Power over Ethernet
• LAN and WAN Quality of Service

Dynamic Host Configuration Protocol Services

Like most other endpoints, Cisco IP phones require the services of a Dynamic Host Configuration Protocol (DHCP) server to obtain the addressing information necessary for them to access network services. However, in addition to basic addressing services, Cisco IP phones also need DHCP servers to provide them with address information for the TFTP server from which they will download their configuration file.

Within the DHCP server configuration, the IP address of the TFTP server should be configured as either DHCP Option 150 (when specifying an IP address) or DHCP Option 66 (when specifying an FQDN). However, it should be noted that you can also specify an IP address when you use DHCP Option 66.
When configuring the DHCP pool for the IP phones, it is important to remember that these two options should never be configured together for the same pool. These options are considered mutually exclusive (i.e., use one or the other). Using both of these options could result in IP phones using the incorrect or less preferred TFTP server.

NOTE: If DHCP Option 66 is used and a name is specified, a Domain Name System (DNS) server must be available to resolve the name to an IP address; otherwise, the IP phone will not be able to contact the TFTP server to retrieve its configuration file. Because using Option 66 adds a dependency on a DNS server, this method is seldom used; DHCP Option 150 is most commonly used.

The following configuration example illustrates how to configure a Cisco IOS DHCP server for IP phones and specify the IP address of the TFTP server using DHCP Option 150:

R1(config)#ip dhcp pool DEFAULT-IPT-POOL
R1(dhcp-config)#network 10.0.0.0 /24
R1(dhcp-config)#default-router 10.0.0.1
R1(dhcp-config)#lease 8 0 0
R1(dhcp-config)#option 150 ip 10.0.0.252
R1(dhcp-config)#exit

The following configuration example illustrates how to configure a Cisco IOS DHCP server for IP phones and specify the name of the TFTP server using DHCP Option 66. Because a name is specified, the DHCP pool is also configured with the IP addresses of two DNS servers.
This will allow the IP phones to communicate with the named TFTP server:

R1(config)#ip dhcp pool DEFAULT-IPT-POOL
R1(dhcp-config)#network 10.0.0.0 /24
R1(dhcp-config)#default-router 10.0.0.1
R1(dhcp-config)#lease 8 0 0
R1(dhcp-config)#option 66 ascii cucmpub.howtonetwork.net
R1(dhcp-config)#dns-server 172.16.1.253 172.16.1.254
R1(dhcp-config)#exit

NOTE: Additional information on Cisco IOS DHCP server and Cisco IOS DHCP relay agent configuration can be found in the “Troubleshooting Cisco IOS DHCP and NAT” chapter in this guide or within the “Branch Office and Teleworker Technologies” chapter in the ROUTE guide, which is currently available online.

Network Time Protocol Services

Network Time Protocol (NTP) is an integral component of IPT implementation. NTP services are applicable to both the Cisco IP phones and the CUCM. For Cisco IP phones, the NTP server can be used to ensure that they display the correct time. However, this service may also be used for additional functions, such as restricting calls based on time and ensuring accurate reporting. This functionality is configured in the DHCP server, which provides this information to the IP phones in conjunction with other addressing parameters.

In Cisco IOS software, one or more NTP servers can be specified for the DHCP pool using the option 42 <ip|ascii> [address|name] DHCP pool configuration command. Again, keep in mind that if a DNS name is provided, then the pool must also be configured with at least one DNS server.
The following example illustrates how to specify an NTP server for Cisco IP phones when configuring a DHCP pool and using the Cisco IOS DHCP server feature:

R1(config)#ip dhcp pool DEFAULT-IPT-POOL
R1(dhcp-config)#network 10.0.0.0 /24
R1(dhcp-config)#default-router 10.0.0.1
R1(dhcp-config)#lease 8 0 0
R1(dhcp-config)#option 150 ip 10.0.0.252
R1(dhcp-config)#option 42 ip 172.16.1.1
R1(dhcp-config)#exit

In addition to Cisco IP phones, CUCM servers also use the services of an NTP server. During installation of the publisher, Cisco Unified CallManager prompts the administrator to specify the external NTP server. If the server is not reachable, the installation fails. Additionally, when installing additional servers as part of a cluster, if the first node is not synchronized with an NTP server, the installation of the subsequent node will fail.

NOTE: The CUCM server can be configured to synchronize with a Cisco IOS router or switch configured as an NTP server. Cisco IOS software NTP server configuration is described in the “Network Monitoring and Maintenance” chapter in this guide.

TFTP Services

TFTP services are a critical component of the overall Cisco IP Telephony solution. Cisco IP phones receive their configuration parameters from the specified TFTP server. In most instances, the CUCM server is the TFTP server. However, Cisco IOS routers running Cisco Unified Communications Manager Express (CUCME) can also be configured as TFTP servers. The phone receives TFTP server information via either DHCP Option 150 or Option 66, as described previously in the DHCP Services section.

Cisco IOS TFTP server functionality is disabled by default but can be enabled using the tftp-server <location>:<filename> global configuration command.
The following configuration example illustrates how to configure a Cisco IOS software router running CUCME to function as a TFTP server for IP phones. This configuration allows Cisco IP phones that use this device as the TFTP server to download the correct firmware files:

R1(config)#$ flash:PHONE/7940-7960/P00308000500.bin alias P00308000500.bin
R1(config)#$h:PHONE/7940-7960/P00308000500.loads alias P00308000500.loads
R1(config)#$ flash:PHONE/7940-7960/P00308000500.sb2 alias P00308000500.sb2
R1(config)#$ flash:PHONE/7940-7960/P00308000500.sbn alias P00308000500.sbn

NOTE: The $ sign in the configuration above appears because each command is too long for a single line, so the beginning of the command is truncated in the display. The configuration also assumes that the files specified exist in the specified location (i.e., in Flash memory). Make sure that the relevant files do exist prior to implementing such configurations.

Cisco Discovery Protocol

It is common (recommended) practice to disable Cisco Discovery Protocol (CDP) for security purposes, as it prevents any adjacent devices from gaining information about the router or switch. However, CDP is an integral component of the Cisco IP Telephony solution. CDP is used to inform Cisco IP phones of the voice or auxiliary VLAN. Additionally, CDP is also an integral component of inline power. If you must disable CDP, do it on a per-port basis, ensuring that the switch ports connected to Cisco IP phones have CDP enabled. If CDP is disabled on these ports, the IP phone cannot exchange messages with the switch about the voice VLAN or power requirements.

Voice VLAN(s)

When connecting Cisco IP phones to the switch, it is recommended that the switch interface be configured as a Multi-VLAN Access Port (MVAP), which allows the Cisco IP phone and connected device (e.g., a laptop or workstation) to use different VLANs.
This configuration will allow voice traffic to receive priority over the data traffic sent by the connected device. While not mandatory, the use of a separate VLAN for the voice traffic ensures that this delay-sensitive traffic can be prioritized over normal user data traffic.

NOTE: When using an MVAP, from the perspective of the attached device, it is simply connected to a switch. In other words, the connected device (e.g., the workstation or laptop) is completely unaware of the fact that it is actually connected to an IP phone.

The voice or auxiliary VLAN is configured in the same manner as any other VLAN. All normal VLAN rules, such as Spanning Tree priorities and parameters, apply to this VLAN in the same manner as they would for any standard data VLAN.

On an MVAP, a native VLAN for data traffic for the workstation connected to the IP phone is identified by the port VLAN identifier (PVID), which is specified using the switchport access vlan <vlan> interface configuration command. An auxiliary VLAN for voice service is identified by the voice VLAN identifier (VVID), which is specified using the switchport voice vlan <vlan> interface configuration command. Following this configuration, the switch communicates the VVID to the IP phone using CDP.

Frames sent by the PC connected to the Cisco IP phone will be sent in the native VLAN (PVID). These frames will be untagged. Frames or packets sent by the Cisco IP phone will use the auxiliary VLAN (VVID). These frames will include the 802.1Q tag. Within this tag, the User Priority field contains the Quality of Service (QoS) information. QoS requirements for VoIP are described later in this section.

Power over Ethernet

Cisco IP phones can use an external power supply to draw their power or they can draw their power from the switch to which they are connected, with the latter being the most commonly used method.
When power is drawn from the switch, the power is sent within the Ethernet cable connecting the switch and the IP phone. The following two methods provide inline power:
1. IEEE 802.3af-2003
2. Cisco inline power

The IEEE 802.3af-2003 Power over Ethernet (PoE) standard defines terminology to describe a port that acts as power sourcing equipment (PSE) for a powered device (PD); defines how a powered device is detected; and defines two methods of delivering PoE to the discovered PD.

Cisco inline power (PoE) is a proprietary approach. The IEEE 802.3af-2003 standard is actually based on this method of PoE, which was available before PoE was standardized. Cisco has also added power management extensions, negotiated using CDP, for Cisco IEEE 802.3af-compliant devices to optimize PSE power management further. Cisco Catalyst switches support both inline power (ILP) and IEEE 802.3af-2003.

When choosing to use PoE, it is important to calculate accurately the amount of power that will be required. You can use the Cisco Power Calculator available on the Cisco Web site to perform such calculations. Incorrect power calculations can adversely impact the IP Telephony implementation. When calculating total power requirements for the IP Telephony solution, you should also factor in other solutions or devices that will draw power from the same switches, such as wireless Access Points.

You can use the show power suite of commands to verify PoE functionality.
For example, you can use the show power inline command to verify inline power status as follows:

Cat-6500-1#show power inline
Interface Admin  Oper       Power(Watts)          Device              Class
                            From PS    To Device
--------- ------ ---------- ---------- ---------- ------------------- -----
Gi1/1     auto   off        0          0          n/a                 n/a
Gi1/2     auto   off        0          0          Cisco AIR-LAP1252AG n/a
Gi1/3     auto   off        0          0          Cisco AIR-LAP1252AG n/a
Gi1/4     auto   off        0          0          n/a                 n/a
Gi1/5     auto   on         11.8       10.5       Cisco IP Phone 7937 3
Gi1/6     auto   on         13.5       12.0       Cisco IP Phone 7945 3
Gi1/7     auto   off        0          0          n/a                 n/a
Gi1/8     auto   on         7.1        6.3        Cisco IP Phone 7942 2
Gi1/9     auto   on         14.5       12.9       Cisco IP Phone 7961 3
... [Truncated Output]

THE NEED FOR LAN AND WAN QUALITY OF SERVICE

Quality of Service (QoS) is a critical component of any IP Telephony solution. Converged networks must provide secure, predictable, measurable, and sometimes guaranteed services. In order to ensure successful end-to-end business solutions, QoS is required to manage network resources. Several reasons why QoS is required when integrating real-time, delay-sensitive traffic, such as voice and video, into the network include the following:
• Delay issues
• Bandwidth issues
• Jitter issues
• Packet loss issues

All packets in a network experience some kind of delay from the time the packet is first sent to when it arrives at its intended destination. This total delay, from start to finish, is referred to as latency. There are several types of delay that may be experienced by packets or frames. Some common causes of delay include, but are not limited to, the following:
• Serialization delay
• Queuing delay
• Processing delay
• Forwarding delay

Serialization delay refers to the amount of time that it takes to send bits serially (i.e., one bit at a time) across the wire. Queuing delay is the delay experienced when packets wait for other packets to be sent.
Processing delay is the time taken by the digital signal processor (DSP) to compress a block of pulse-code modulation (PCM) samples. This is also referred to as coder delay. Finally, forwarding delay is the processing time from when a frame is received to when the packet has been placed in the output queue.

NOTE: You are not required to go into any detail on DSPs or PCM sampling in the TSHOOT certification exam. These will not be described in any further detail in the remainder of this chapter.

While there are numerous types of delay, all delay types fall into one of two categories: fixed delay and variable delay. Fixed delay components add directly to the overall delay on the connection. Examples of fixed delay include serialization delay, processing delay, and packetization delay. Packetization delay is the time taken to fill a packet payload with encoded or compressed speech.

Variable delays, on the other hand, arise from queuing delays in the egress trunk buffers on the serial port connected to the WAN. These buffers create variable delays, called jitter, across the network. Variable delays are handled through the de-jitter buffer at the receiving router or gateway. Jitter is described in additional detail later in this section.

Generally speaking, bandwidth refers to the number of bits per second (bps) that are expected to be delivered successfully across some medium. Based on this definition, bandwidth is equal to the physical link speed or clock rate of the interface. In switching terms, however, the term bandwidth refers to the capacity of the switch fabric. Therefore, the bandwidth considerations for WAN connections, for example, are not necessarily the same as those for LAN connections.

Jitter is the variation in delay between consecutive packets and is caused by variable queuing delays. For this reason, jitter is often referred to as delay variation.
While such variations may be acceptable for applications and data traffic, they can severely impact isochronous traffic, such as digitized voice, which requires that packets are transmitted in a consistent, uniform manner. The varying arrival time of the packets can cause gaps in the recreation and playback of the original voice signal. This is both undesirable and annoying to the listener. Jitter can be mitigated using de-jitter buffers.

Packet loss occurs when one or more packets traversing the network fail to reach their intended destination. This may occur for several reasons, such as bit errors, lack of space in queues, and, most commonly, congestion. While this does not generally affect connection-oriented protocols, such as TCP, packet loss can cause major issues for real-time traffic, such as voice and streaming video traffic. Packet loss can be mitigated using congestion management and/or congestion avoidance mechanisms. These are described later in this chapter.

LAN AND WAN IPT QOS IMPLEMENTATION

In order to understand QoS implementation, it is important to have an understanding of the three different QoS models and how they are applicable when designing and implementing a QoS solution. The three Quality of Service models are as follows:

1. Best Effort Delivery (Default)
2. Integrated Services
3. Differentiated Services

As the name implies, the Best-Effort (BE) delivery model does not guarantee any level of service and instead internetwork devices simply make their ‘best effort’ to deliver packets as quickly as possible. The best-effort delivery model scales well but provides no difference in service for different traffic classes. In other words, when this model is used (which is the default), voice, video, and data traffic are all treated as one and the same. This model requires no QoS implementation within the internetwork and is not recommended.
Integrated Services (IntServ) performs admission control for each flow request. IntServ provides a way to deliver end-to-end QoS for real-time applications by explicitly managing network resources to provide QoS to specific user packet streams (flows). RFC 1633 defines two components to provide guarantees per flow: resource reservation and admission control. IntServ uses Resource Reservation Protocol (RSVP) to signal the internetworking devices about how much bandwidth and delay a particular flow requires. Intelligent queuing mechanisms can be used with RSVP to provide the following two kinds of services:

• Guaranteed rate service
• Controlled load service

Guaranteed rate service allows applications to reserve bandwidth to meet their specific requirements. In Cisco IOS software, guaranteed rate service can be provided using Weighted Fair Queuing (WFQ) in conjunction with RSVP. Controlled load service allows applications to have low delay and high throughput, even during times of congestion. This service can be provided using RSVP and Weighted Random Early Detection (WRED). Finally, admission control is used to decide when a reservation request should be rejected.

NOTE: WFQ and WRED are described in additional detail later in this chapter.

The primary issue with IntServ is that it scales very poorly, especially when many sources are attempting to reserve end-to-end bandwidth for each of their particular flows.

Unlike IntServ, the Differentiated Services (DiffServ) model requires no advance reservations and therefore scales very well. DiffServ defines the concept of service classes. DiffServ also allows each internetwork device to handle these packets on an individual (per hop) basis. This is referred to as per-hop behavior (PHB). DiffServ markings apply at Layer 3. Layer 2 frames use Class of Service (CoS) bits that are contained within the 802.1Q or ISL-encapsulated frame.
In addition to IP Precedence (IP Prec), DiffServ defines a new Differentiated Services Code Point (DSCP) field in the IP packet header by redefining the Type of Service (ToS) byte and creating a replacement for the IP Precedence field with a new 6-bit field called the Differentiated Services (DS) field. The last 2 bits of the ToS byte can now be used to perform flow control, and are referred to as the Explicit Congestion Notification (ECN) bits.

DiffServ defines three sets of PHBs: Class Selector (CS), Assured Forwarding (AF), and Expedited Forwarding (EF). The Class Selectors are DSCP values that are compatible with IP Prec values. The Assured Forwarding (AF) PHB set is used for two functions: queuing and congestion avoidance. Queuing places the packets into the different software queues based on the QoS labels. The last PHB set is the Expedited Forwarding (EF) set. This uses a single DSCP value (EF) to represent it. EF packets are given premium service (above all other classes). This is the default value assigned to voice media packets in Cisco IP Telephony solutions.

NOTE: Additional detailed information on the three QoS models described in this section can be found in the current SWITCH guide, which is available online at www.howtonetwork.net.

LAN QoS Solutions for IP Telephony Implementations

The primary purpose of LAN QoS is buffer management. Switches require buffering to avoid buffer overflows. Buffer overflows occur when multiple ingress ports are contending for the same egress port. Catalyst switch QoS is primarily based on the Layer 2 markings that are contained within a frame (i.e., the CoS value). However, it can also be based on the Layer 3 markings contained within a packet (i.e., the IP Precedence or DSCP values). IP Precedence and DSCP were described in the previous section.
The CoS value is carried in the 802.1Q tag of 802.1Q frames and in the ISL header of ISL-encapsulated frames. On Cisco Catalyst switches, the configuration of a voice VLAN ID (VVID) configures the switch to send CDP packets to the Cisco IP phone, instructing the phone to send voice traffic in 802.1Q frames, tagged with the specified VVID and a Layer 2 CoS value of 5.

However, it should be noted that simply configuring a voice VLAN does not mean that these markings are automatically honored. Instead, QoS must be enabled manually on the switch, and the switch port(s) must be configured to trust incoming QoS markings. In Cisco IOS Catalyst switches, QoS is enabled globally using the mls qos global configuration command. Following this, the mls qos trust interface configuration command must be used to configure the port to trust incoming CoS or DSCP markings. Alternatively, the switch port can simply be configured to trust frames/packets received from the attached Cisco IP phone.

In addition to trusting packets/frames received from the phone, Cisco IOS software also allows administrators to trust or re-mark frames/packets received from the device that is also attached to the IP phone using the switchport priority extend suite of commands.

LAN QoS solutions for IP Telephony integration are typically performed in the ingress direction when configuring Catalyst switches. Ingress QoS mechanisms are applied to frames and packets received by the switch in the inbound direction. Catalyst switches support the following ingress QoS mechanisms:

• Traffic classification
• Traffic policing
• Marking
• Congestion management and avoidance

Classification is used to differentiate one stream of traffic from another so that different service levels can be applied to different streams of traffic. Frames can be classified based on the incoming CoS, DSCP, or even Access Control List (ACL) configuration.
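The voice VLAN and trust commands described in this section could be combined on an access port roughly as follows. This is a minimal sketch rather than a definitive configuration: the switch name, the interface (FastEthernet0/1), and the VLAN numbers (data VLAN 10, voice VLAN 110) are assumptions chosen purely for illustration, and the commands shown are those available on Catalyst switches that use the mls qos command set.

```
! Assumed values for illustration: FastEthernet0/1, data VLAN 10, voice VLAN 110
Switch(config)#mls qos
Switch(config)#interface FastEthernet0/1
Switch(config-if)#switchport mode access
Switch(config-if)#switchport access vlan 10
Switch(config-if)#switchport voice vlan 110
! Trust the incoming CoS value, but only when a Cisco IP phone is detected via CDP
Switch(config-if)#mls qos trust cos
Switch(config-if)#mls qos trust device cisco-phone
! Re-mark frames from the PC attached to the phone with CoS 0
Switch(config-if)#switchport priority extend cos 0
Switch(config-if)#end
Switch#show mls qos interface FastEthernet0/1
```

The final show command displays the trust state of the port, which is a quick way to confirm that the trust setting has actually taken effect before investigating voice quality complaints any further.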
When the switch receives a frame or packet with an already existing QoS value, it must decide whether to trust the received QoS value. This is determined using the port trust setting. Trust settings are configured at the trust boundary, which is the perimeter of the network, such as the access port to which a workstation or IP Phone is connected. The traffic that is received from beyond the perimeter is considered untrusted, unless it is explicitly trusted using the switchport priority extend trust interface configuration command on the switch. The trust boundary itself is configured using the mls qos trust [cos|device cisco-phone|dscp|ip-precedence] interface configuration command.

Policing is a process that is used to limit traffic to a prescribed rate. Policing is used to compare the ingress traffic rate to a configured policer. The policer is configured with a rate and a burst. The rate defines the amount of traffic that is sent per given interval. When that specified amount has been sent, no more traffic is sent for that given interval. The burst defines the amount of traffic that can be held in readiness for being sent. Traffic in excess of the burst can be either dropped or have its priority setting reduced.

Traffic that conforms to the policing configuration is considered in profile and will be forwarded, as configured, by the switch. However, traffic that does not conform to the policing configuration is considered out of profile. Out of profile traffic can either be dropped or marked down (i.e., re-marked) with a lower QoS value.

Marking involves setting QoS bits inside the Layer 2 or Layer 3 headers, which allows the other internetwork devices to classify based on the marked values. Marking is typically used in conjunction with traffic policing. For example, if the traffic is in profile, the switch will typically allow the packets to be passed through (i.e., will not change or reset the QoS settings in the packets).
However, if the traffic is out of profile, the switch may be configured to mark down this traffic with a lower QoS value. Marking can also be used in conjunction with classification. For example, you could issue the switchport priority extend cos 3 command to mark all frames/packets received from the device connected to the IP phone LAN port with a CoS value of 3, effectively classifying this traffic.

Congestion management and avoidance comprises three elements: queuing, dropping, and scheduling. Queuing is used to place packets into different software queues based on the QoS labels. After the traffic is classified and marked with QoS labels, it is assigned into different queues based on the QoS labels.

Once the packets have been placed into the appropriate queue based on their QoS values, dropping is used to manage queues. Dropping provides drop priorities for different classes of traffic. Queues have drop thresholds that are used to indicate which packets can be dropped once the queue has filled beyond a certain threshold. After ingress packets are placed into the queue, a congestion avoidance mechanism will use a CoS-to-threshold map to determine what frames are eligible to be dropped when a threshold is breached. This prevents the queues from filling up.

Catalyst switches typically have two ingress queues, one of which either is a priority queue or can be configured as a priority queue. The ingress frames and packets received by the switch are placed in a queue based on the ingress (received) CoS value. Voice traffic, for example, that is received with CoS 5 or DSCP EF will be placed into the priority queue, while regular data traffic will be placed into the normal queue.

Scheduling refers to how the queues are serviced or emptied. If a priority queue is configured, it only makes sense that this be serviced (emptied) before the normal queue.
In other words, the packets in the priority queue should be sent before the packets in the normal queue. Catalyst switches use Shaped Round Robin (SRR) for ingress scheduling. SRR is beyond the scope of the TSHOOT certification exam and will not be described in any further detail in this chapter.

WAN QoS Solutions for IP Telephony Implementations

The primary purpose of WAN QoS mechanisms is to make better use of bandwidth. However, in converged networks, consideration should also be given to protecting voice traffic and ensuring that voice quality is not impacted by other traffic types. Voice quality problems are one of the most common issues experienced in converged networks. These problems are typically attributed to packet drops, queuing delays, and link congestion. While implementing QoS will not guarantee that there will never be any VoIP quality issues, it does allow you to optimize voice quality in networks by protecting this type of traffic.

In Cisco IOS software, routers can be configured to use strict priority queuing or Weighted Random Early Detection (WRED) mechanisms when implementing VoIP QoS on the WAN. Strict priority queuing can be accomplished with Class-Based Weighted Fair Queuing (CBWFQ) by using the IP RTP Priority, Frame Relay IP RTP Priority, or Low Latency Queuing (LLQ) features.

The IP RTP Priority feature provides a strict priority queuing scheme for delay-sensitive data, such as voice. Using this method, voice traffic is identified by its Real-Time Transport Protocol (RTP) port numbers and classified into a priority queue. The result is that voice is serviced as strict priority in preference to other non-voice traffic types.

The Cisco IOS Frame Relay IP RTP Priority feature performs a similar function to the IP RTP Priority feature but is specific to Frame Relay.
This feature provides a strict priority queuing scheme on a Frame Relay Permanent Virtual Circuit/Data Link Connection Identifier (PVC/DLCI) and is used in conjunction with Frame Relay map classes.

Low Latency Queuing (LLQ) provides strict priority queuing in conjunction with CBWFQ. LLQ configures the priority status for a class within CBWFQ, in which voice packets receive priority over all other traffic. This strict priority mechanism, like the others described above, reduces jitter in voice conversations, improving overall voice quality. It should be noted that of the strict priority mechanisms described, the Cisco Enterprise Solutions Engineering (ESE) group considers Low Latency Queuing (LLQ) not only a recommendation but also best practice when implementing voice QoS solutions on the WAN.

Weighted Random Early Detection is a queue management mechanism that provides differentiated performance characteristics for different classes of service. This allows for the preferential handling of voice traffic under congestion conditions. WRED drops lower-priority traffic more aggressively than higher-priority traffic as the interface’s output queue begins to become congested.

In Cisco IOS software, WRED can be implemented in conjunction with Explicit Congestion Notification (ECN). The ECN bits are the last 2 bits of the ToS byte. These can now be used to perform flow control by allowing routers and end hosts to use this marking as a signal that the network is congested in order to slow down the sending of packets.

Configuring WAN QoS in Cisco IOS Software Routers

While LAN QoS configuration is described and illustrated in detail in the SWITCH study guide, WAN QoS configuration has yet to be described or illustrated. While you are not required to go into detail on any advanced WAN QoS configurations, you should have a basic understanding of how to configure and verify the mechanisms described in the previous section.
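As a point of reference for the WRED and ECN discussion above, the following is a minimal sketch of how DSCP-based WRED with ECN marking might be enabled for the default class. The policy map name TSHOOT-WRED and the Serial0/0 interface are assumptions made for illustration only, and the policy-map and service-policy commands used here belong to the Modular QoS CLI, which is described later in this section. Command availability varies by IOS release and platform, so treat this as an outline and not as a validated configuration.

```
! Hypothetical policy map name and WAN interface, for illustration only
R1(config)#policy-map TSHOOT-WRED
R1(config-pmap)#class class-default
R1(config-pmap-c)#fair-queue
! Drop (or, with ECN, mark) packets based on their DSCP value as the queue fills
R1(config-pmap-c)#random-detect dscp-based
R1(config-pmap-c)#random-detect ecn
R1(config-pmap-c)#exit
R1(config-pmap)#exit
R1(config)#interface Serial0/0
R1(config-if)#service-policy output TSHOOT-WRED
R1(config-if)#exit
```

With ECN enabled, packets from ECN-capable endpoints can be marked instead of dropped once the minimum threshold is exceeded, which signals the end hosts to slow down without the cost of a retransmission.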
In Cisco IOS software, the IP RTP Priority feature is configured under the WAN interface using the ip rtp priority <lower bound UDP port number> <port range> <bandwidth> interface configuration command. When using this command, you can prioritize and allocate bandwidth to RTP UDP port numbers for voice, video, and whiteboard traffic using the ranges listed in Table 12-1 below:

Table 12-1. RTP UDP Port Ranges

Application   Starting RTP UDP Port Number   Ending RTP UDP Port Number
Voice         16,384                         32,767
Whiteboard    32,768                         49,151
Video         49,152                         65,535

The following configuration example illustrates how to configure the IP RTP Priority feature for the entire voice RTP UDP port range while specifying a total maximum bandwidth of 512 Kbps for this type of traffic. The configuration is applied under a WAN interface:

R1(config)#interface Serial0/0
R1(config-if)#ip rtp priority 16384 16383 512
R1(config-if)#exit

This configuration can be validated by viewing the router configuration.

The Frame Relay IP RTP Priority feature is specified in a Frame Relay map class, which is then applied to one or more PVCs/DLCIs. The following configuration example illustrates how to configure a Frame Relay map class, enable the Frame Relay IP RTP Priority feature, and then apply the configuration to Frame Relay DLCI 100, which is configured under the router Serial0/0 interface:

R2(config)#map-class frame-relay TSHOOT
R2(config-map-class)#frame-relay ip rtp priority 16384 16383 512
R2(config-map-class)#exit
R2(config)#interface Serial0/0
R2(config-if)#encapsulation frame-relay
R2(config-if)#frame-relay interface-dlci 100
R2(config-fr-dlci)#class TSHOOT
R2(config-fr-dlci)#exit

Again, this configuration can be validated by checking the router configuration.

LLQ, as was stated earlier in this chapter, is configured in conjunction with CBWFQ. CBWFQ is implemented using the Cisco IOS Modular QoS CLI (MQC).
MQC configuration in Cisco IOS software requires three steps to be performed. These steps are listed and described below:

1. Define a class map, which is used to identify the ‘interesting traffic,’ using the class-map [match-any|match-all] <name> global configuration command. In class-map configuration mode, match packets against an ACL, protocol, or other QoS parameters, such as DSCP values, for example. To ensure successful matches, it is important that traffic is marked and classified at the network edge. The match-any and match-all keywords allow you to specify whether to match any one of the parameters matched in the class map (if more than one is specified) or match all parameters in the class map. Using the match-all keyword means that all specified conditions must be met for an actual match to be made.

2. Define a policy map, which is used to determine what to do with the different class maps once the traffic has been matched. The policy map can be used to implement LLQ (strict priority queuing), WRED, and other actions, such as re-marking packets before they are transmitted across the WAN. The policy map is configured using the policy-map <name> global configuration command. In policy-map configuration mode, explicitly configured class maps are matched using the class <class-map-name> command. The default class, which is used for all other traffic types that are not included in the explicitly configured class maps, is configured using the class class-default policy-map configuration command.

3. Apply the policy map to interfaces, subinterfaces, PVCs, or DLCIs using the service-policy output <policy-map-name> configuration command. Although the policy map can also be applied in the inbound direction, this configuration is beyond the scope of the TSHOOT certification exam and will not be discussed in this chapter.
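The difference between the match-any and match-all keywords described in step 1 can be sketched with two class maps that contain identical match statements. The ACL number (101) and the class-map names VOICE-ALL and VOICE-ANY are hypothetical and are used only to illustrate the point; the UDP port range is the voice range from Table 12-1.

```
! Hypothetical ACL number and class-map names, for illustration only
R1(config)#access-list 101 permit udp any any range 16384 32767
! match-all: a packet must be in the voice UDP port range AND be marked EF
R1(config)#class-map match-all VOICE-ALL
R1(config-cmap)#match access-group 101
R1(config-cmap)#match ip dscp ef
R1(config-cmap)#exit
! match-any: a packet matches if EITHER condition is true
R1(config)#class-map match-any VOICE-ANY
R1(config-cmap)#match access-group 101
R1(config-cmap)#match ip dscp ef
R1(config-cmap)#exit
```

When troubleshooting, a class that unexpectedly shows no matched packets is often a match-all class in which one of the conditions is never satisfied, for example because the traffic was not marked at the network edge.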
The following configuration example illustrates how to implement strict priority queuing (LLQ) in conjunction with CBWFQ on a router. The router is configured to match VoIP media traffic (RTP) and assign it to the priority queue using LLQ. This traffic is allocated 128 Kbps of the available bandwidth and is assigned a DSCP value of EF prior to being transmitted across the WAN. The router is also configured to match video traffic (RTP) and assign this traffic at least 256 Kbps of the available bandwidth. This traffic is assigned a DSCP value of AF41 prior to being transmitted across the WAN. All other traffic (i.e., the default class) is assigned a default DSCP value of 0 prior to being transmitted across the WAN via Serial0/0:

R1(config)#class-map match-any TSHOOT-VOICE
R1(config-cmap)#match protocol rtp audio
R1(config-cmap)#exit
R1(config)#class-map match-any TSHOOT-VIDEO
R1(config-cmap)#match protocol rtp video
R1(config-cmap)#exit
R1(config)#policy-map TSHOOT-MULTIMEDIA-QOS
R1(config-pmap)#class TSHOOT-VOICE
R1(config-pmap-c)#priority 128
R1(config-pmap-c)#set ip dscp ef
R1(config-pmap-c)#exit
R1(config-pmap)#class TSHOOT-VIDEO
R1(config-pmap-c)#bandwidth 256
R1(config-pmap-c)#set ip dscp af41
R1(config-pmap-c)#exit
R1(config-pmap)#class class-default
R1(config-pmap-c)#set ip dscp default
R1(config-pmap-c)#exit
R1(config-pmap)#exit
R1(config)#interface Serial0/0
R1(config-if)#service-policy output TSHOOT-MULTIMEDIA-QOS
R1(config-if)#exit

Following this configuration, you can use the show policy-map interface