Cisco CCNP TSHOOT Simplified
Your Complete Guide to Passing the CCNP TSHOOT Exam
Paul Browning (LLB Hons) CCNP, MCSE
Farai Tafa dual CCIE
This study guide and/or material is not sponsored by, endorsed by or affiliated with Cisco Systems,
Inc. Cisco®, Cisco Systems®, CCDA™, CCNA™, CCDP™, CCNP™, CCIE™, CCSI™, the Cisco Systems
logo and the CCIE logo are trademarks or registered trademarks of Cisco Systems, Inc in the United
States and certain other countries. All other trademarks are trademarks of their respective owners.
Copyright Notice
Copyright © 2011 Paul Browning all rights reserved. No portion of this book may be reproduced
mechanically, electronically or by any other means, including photocopying without written permission of the publisher.
ISBN: 978-0-9557815-8-2
Published by:
Reality Press Ltd.
Midsummer Court
314 Midsummer Blvd.
Milton Keynes
MK9 2UB
help@reality-press.com
LEGAL NOTICE
The advice in this book is designed to help you achieve the standard of Cisco Certified Network Engineer which is Cisco’s foundation internetworking examination. A CCNA is able to carry out basic
router and switch installations and troubleshooting. Before you carry out more complex operations
it is advisable to seek the advice of experts or Cisco Systems, Inc.
The practical scenarios in this book are meant to illustrate a technical point only and should be used
on your privately owned equipment only and never on a live network.
CISCO CCNP TSHOOT SIMPLIFIED
INTRODUCTION
Firstly, we want to say congratulations for investing in yourself and your future. Actions speak far
louder than words and you have already taken a very important step towards your future as a Cisco
network engineer.
The new CCNP track has been developed based on continuing feedback from Cisco customers
who inform Cisco about what skills and abilities they want to see in their engineers. Over the past
few years, Cisco exams have become increasingly difficult and, of course, your certification expires
every three years, so many engineers who have not kept themselves up to date have struggled to
maintain their certification.
The objective for us, as with all of our Cisco Simplified manuals, is to help you do two things. First
and foremost, our goal is to equip you with the skills, knowledge, and ability to carry out the day-to-day role of a Cisco network engineer. We don’t want you to be a walking manual, but we do want
you to know how to do the stuff that we consider the “bread and butter” jobs a CCNP engineer
would need to carry out.
Secondly, of course, we want you to pass your Cisco exams. The mistake many Cisco students
make is to do whatever it takes to pass the exam. Even if that approach did work, people taking this
tack often sell themselves short. The reason is that most job interviews nowadays consist of both a
hands-on and a theoretical test. If a student doesn’t have a grasp of how the technology works, he
or she has no hope of success in the real world.
These are the current exams you need to pass in order to become a CCNP:
• 642-902 ROUTE: Implementing Cisco IP Routing
• 642-813 SWITCH: Implementing Cisco IP Switched Networks
• 642-832 TSHOOT: Troubleshooting and Maintaining Cisco IP Networks
Each exam features theoretical questions as well as multiple hands-on labs where you could be
asked to configure or troubleshoot any of the technologies in the syllabus. In addition, you have
only 120 minutes to complete all of the tasks and answer all of the questions.
Each chapter is broken down into an overview and then the main theory discussion before moving
on to a review section covering the main learning points. Be patient with yourself because there is
a lot to learn. If you put about two hours aside every day to study, you should be ready to attempt
the exam in approximately 60 days from the day you start. If you take days off or holidays, then of
course it will take much longer.
Almost every topic is applied to how you would use the knowledge in real life, which is an area
you will find missing from almost every other Cisco textbook. We design, install, and troubleshoot
Cisco networks on a daily basis and have been doing so for many years. We don’t fill your head full
of useless jargon and fluff just to boast about how much we know. Although we do spend a little
time teaching, for the most part, we are Cisco consultants out in the field.
If you are a member of www.howtonetwork.net, then please use the tools, such as the flash cards,
practice exams, and videos, found on the site to help you through the exam. You will find them to
be vital study tools. Please use the CCNP TSHOOT discussion forum if you have any questions you
need help with. Farai and I monitor the forum on a daily basis.
Lastly, make sure you register your book at the link below in order to receive free updates on it for
life.
http://www.howtonetwork.net/public/2653.cfm
Best of luck with your studies. See you at the top.
Paul Browning
Farai Tafa
ABOUT THE AUTHORS
Paul Browning
Paul Browning is the author of CCNA Simplified, which is one of the
industry’s leading CCNA study guides. Paul previously worked for
Cisco TAC but left in 2002 to start his own Cisco training company
in the UK. Paul has taught over 2,000 Cisco engineers with both his
classroom-based courses and his online Cisco training site, www.howtonetwork.net.
Paul lives in the UK with his wife and daughter.
Farai Tafa
Farai Tafa is a Dual CCIE in both Routing and Switching and Service
Provider. Farai currently works for one of the world’s largest telecoms companies as a network engineer. He has also written workbooks for the CCNA, CCNP, and Cisco Security exams. Farai lives in
Washington, D.C. with his wife and daughter.
TABLE OF CONTENTS
PART I: THEORY
Chapter 1: Network Monitoring and Maintenance
Network Maintenance Fundamentals Overview
Network Maintenance Tasks
An Overview of Network Management Models
IOS Maintenance and Monitoring Tools
Additional Maintenance and Monitoring Tools
Chapter Summary
Chapter 2: Troubleshooting Methodologies and Tools
Troubleshooting and the Troubleshooting Flow
Communication and Troubleshooting
Integrating Maintenance and Troubleshooting
Troubleshooting Methodologies
The Cisco IOS Generic Troubleshooting Toolkit
Additional Troubleshooting Tools
Chapter Summary
Chapter 3: Troubleshooting Switches at Layers 1 and 2
Troubleshooting at the Physical Layer
VLAN, VTP, and Trunking Overview
Troubleshooting VLANs
Using the ‘show vlan’ Command
Spanning Tree Protocol Overview
Troubleshooting Spanning Tree Protocol
Using the ‘show spanning-tree’ Command
Chapter Summary
Chapter 4: Troubleshooting Catalyst Switch Layer 3 Protocols, Supervisor Redundancy, and Performance Issues
Catalyst Switch VLAN Interfaces Overview
Catalyst Switch MLS Overview
Troubleshooting Multilayer Switching
Understanding and Troubleshooting HSRP
Understanding and Troubleshooting VRRP
Understanding and Troubleshooting GLBP
Troubleshooting Switch Supervisor Redundancy
Troubleshooting Switch Performance Issues
Chapter Summary
Chapter 5: Troubleshooting EIGRP
Enhanced Interior Gateway Routing Protocol Overview
Troubleshooting Neighbor Relationships
Troubleshooting Route Installation
Troubleshooting Route Advertisement
Troubleshooting Stub Routing Issues
Troubleshooting SIA Issues
Troubleshooting Route Redistribution Issues
Debugging EIGRP Routing Issues
Chapter Summary
Chapter 6: Troubleshooting OSPF
Open Shortest Path First Protocol Overview
Troubleshooting Neighbor Relationships
Troubleshooting Route Advertisement
Troubleshooting Route Redistribution Issues
Troubleshooting Route Summarization
Debugging OSPF Routing Issues
Chapter Summary
Chapter 7: Troubleshooting BGP
Border Gateway Protocol Overview
Troubleshooting Neighbor Relationships
Troubleshooting Route Advertisement
Troubleshooting Route Redistribution Issues
Debugging BGP Routing Issues
Chapter Summary
Chapter 8: Troubleshooting Cisco IOS Security Features
Cisco IOS Security Fundamentals
Management Plane Security and Troubleshooting
Control Plane Security and Troubleshooting
Forwarding Plane Security and Troubleshooting
Cisco IOS Firewall Fundamentals
Chapter Summary
Chapter 9: Troubleshooting Cisco IOS DHCP and NAT
Understanding DHCP
Troubleshooting DHCP
Understanding NAT
Troubleshooting NAT
Chapter Summary
Chapter 10: Troubleshooting IPv6 Routing & Interoperability
IP version 6 Protocol Overview and Fundamentals
Understanding and Troubleshooting EIGRPv6
Understanding and Troubleshooting RIPng
Understanding and Troubleshooting OSPFv3
Troubleshooting IPv6 Route Redistribution
IPv4 and IPv6 Interoperability
Troubleshooting IPv4 and IPv6 Interoperability
Chapter Summary
Chapter 11: Troubleshooting Cisco Wireless LAN Solutions
Wireless Local Area Network Overview
The Cisco WLAN Solution
Troubleshooting Cisco WLAN Solutions
Chapter Summary
Chapter 12: Troubleshooting Cisco VoIP and Video Solutions
Cisco IP Telephony Fundamentals
The Need for LAN and WAN Quality of Service
LAN and WAN IPT QoS Implementation
Cisco IP Video Fundamentals
LAN and WAN Video QoS Implementation
Troubleshooting Converged Networks
Chapter Summary
Chapter 13: Troubleshooting Branch Office Solutions
Cable and DSL Broadband Access Technologies
Site-to-Site VPN Technologies
Remote Access VPN Technologies
Troubleshooting Broadband Technologies
Troubleshooting Site-to-Site VPNs
Troubleshooting Remote Access VPNs
Chapter Summary
PART II: LABS
Lab 1: TSHOOT–Multi-Technology Troubleshoot
Lab 2: TSHOOT–Multi-Technology Troubleshoot
Lab 3: TSHOOT–Multi-Technology Troubleshoot
Lab 4: TSHOOT–Multi-Technology Troubleshoot
Lab 5: TSHOOT–Multi-Technology Troubleshoot
PART I
Theory
CHAPTER 1
Network Monitoring and Maintenance
Welcome to the CCNP TSHOOT certification exam study guide. The TSHOOT is the third
exam required to complete the updated CCNP certification. The objective of this course is to
provide you with a solid fundamental understanding of network maintenance and troubleshooting
methodology. In addition, this course will also discuss core technology troubleshooting in Cisco
IOS software for all CCNP Layer 2 and Layer 3 technologies and protocols. The core TSHOOT
exam objectives covered in this chapter are as follows:
• Maintain and monitor network performance
• Develop a plan to monitor and manage a network
• Perform network monitoring using IOS tools
• Perform routine Cisco IOS device maintenance
As a CCNP network engineer, it is important to understand how the relevant protocols
and technologies work, how they are applied or configured in Cisco IOS software, and how to troubleshoot them. It is equally imperative to understand how to maintain internetworks. Well-maintained networks typically have fewer problems than those that are neglected, which also makes them easier to support
and troubleshoot. This
chapter will be divided into the following sections:
• Network Maintenance Fundamentals Overview
• Network Maintenance Tasks
• An Overview of Network Management Models
• IOS Maintenance and Monitoring Tools
• Additional Maintenance and Monitoring Tools
NETWORK MAINTENANCE FUNDAMENTALS OVERVIEW
Network maintenance is an integral component of a network management methodology. While in
general network maintenance is assumed to be concerned primarily with repairs and upgrades, it
should be noted that a comprehensive network maintenance solution also includes corrective and
preventive measures, which allow for network optimization as well as the upkeep of network documentation, among other tasks, which are listed later in this chapter.
Network maintenance activities can be performed in a structured (scheduled) manner or on an as-needed (ad-hoc) basis. A structured or scheduled approach is based on a predefined plan and is the
recommended method for performing any network maintenance tasks. An example of structured
maintenance might be a scheduled change window to upgrade the software versions on internetwork devices, such as routers and switches.
As-needed, or ad-hoc, maintenance activities are those that are performed when any issues arise.
These maintenance activities, also referred to as interrupt-driven tasks, are unplanned tasks and
they typically cannot be predicted. An example of such a task would be replacing a failed Supervisor module or line card in a Cisco Catalyst 6500 series switch. Such maintenance activities are part
of the day-to-day management of the network and cannot be eliminated; however, a structured
maintenance approach can be used to mitigate their occurrences and their overall impact on the
network, as well as the organization in general.
A structured maintenance approach leverages proactive monitoring to detect and remedy any potential problems within the network itself or with internetwork devices. A proactive monitoring
solution not only provides the organization with the ability to identify and remedy problems before
they actually impact the production environment, but also allows for capacity planning, which
includes network upgrades, expansions, or enhancements. This allows network maintenance activities to be planned, scheduled, and implemented in a controlled manner, greatly increasing the
overall probability of the success of those activities.
Without a structured maintenance approach in place, the majority of network maintenance tasks
are performed in a reactive manner, increasing the number of resources required to maintain and
support the network on a day-to-day basis, as well as increasing the likelihood of significant and
possibly costly business impact (e.g., a network outage) at any given time.
NETWORK MAINTENANCE TASKS
Network maintenance tasks are those that network administrators perform on a day-to-day basis,
allowing for the upkeep of the network. Some of the more common network maintenance tasks
include, but are not limited to, the following general activities:
• Installing, replacing, or upgrading both hardware and software
• Monitoring, tuning, and optimizing the network
• Documenting the network and maintaining network documentation
• Securing the network from both internal and external threats
• Planning for network upgrades, expansions, or enhancements
• Scheduling backups and restoring services or the network from backups
• Ensuring compliance with legal regulations and corporate policies
• Troubleshooting problem reports
• Maintaining and updating device configurations
Installing, Replacing, or Upgrading Both Hardware and Software
Hardware and software installation, replacement, and upgrades are very common network maintenance tasks. In a Cisco internetwork, this may include replacing older or failed hardware, such as
switch linecards and supervisor modules in Catalyst 4500 and 6500 series switches, for example, as
well as upgrading the Cisco IOS images to current revision or patch levels for routers and switches
alike.
Monitoring, Tuning, and Optimizing the Network
One of the core facilitators of an effective network maintenance solution or strategy is proactive
monitoring. Proactive monitoring allows potential problems to be detected and remedied before
they cause an outage or affect operation. Event logging and network monitoring can be used to
respond to network or system alerts before they escalate and can be used to do the following:
• Verify the performance of the network and all internetwork devices in the network
• Baseline the performance of the network itself
• Understand the amount and direction of traffic flows in the network
• Identify and troubleshoot potential network issues
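As a rough illustration of baselining, the following Python sketch summarizes historical utilization readings and flags a new reading that strays far from them. The sample data, the function names, and the three-standard-deviation tolerance are assumptions made for this example only; they are not taken from any Cisco tool.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Return (mean, standard deviation) of historical utilization samples."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, tolerance=3.0):
    """Flag a reading more than `tolerance` standard deviations from the mean."""
    average, spread = baseline
    return abs(value - average) > tolerance * spread


# Hypothetical link utilization readings (percent), e.g. one per hour.
history = [40, 42, 38, 41, 39, 40, 43, 37]
baseline = build_baseline(history)

print(is_anomalous(41, baseline))  # False: within the normal range
print(is_anomalous(85, baseline))  # True: far above the baseline
```

In practice, the readings would come from a monitoring system rather than a hard-coded list, and the tolerance would be tuned to the network in question.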
Documenting the Network and Maintaining Network Documentation
While most network engineers consider documentation a rather mundane and even lowly task, it
is important to understand that documentation is a critical component of network maintenance,
as well as troubleshooting and support. It is important to understand that different organizations
have different standards for acceptable levels of documentation. Following are several guidelines or
recommendations that you should adhere to when documenting the network:
• Determine the scope of responsibility
• Understand the objective
• Maintain documentation consistency
• Ensure that the documentation is easily accessible
• Maintain the documentation
The first guideline is ensuring that you understand your scope of responsibility. That is, it is important to understand exactly what you are responsible for. For example, you may be working
in an organization that has a voice, security, storage, and network team all under the Information
Technology (IT) department umbrella. Rather than attempting to create documentation for all the
teams, you should ensure that you document only those networks and devices that are within your
administrative responsibility.
It is important to understand what the documentation will be used for. A common mistake that is
made by network engineers is including either too little or too much information within the documentation. Take time to understand fully what the document you are creating will be used for, and
take into consideration the audience the document is targeted to and what information would be
useful or excessive for that particular group. Over-documentation makes documented information hard to understand. On the other hand, under-documentation makes network support and
troubleshooting difficult to perform.
Consistency when creating network documentation is a key component that should be adhered to
as much as possible. In most organizations, design and documentation templates are available for
reference when creating new documentation. Maintaining consistency increases the usability of
those documents and makes them easier to understand for everyone else.
No matter how great the documentation is, it helps no one if those who may need it for troubleshooting or support functions cannot access it. Where possible, documentation should be stored in a location that is readily and easily accessible to all who may use it, such
as on a secure network share. In some cases, depending on the organization, it may be
necessary for documentation to be stored in a secured, offsite location for disaster recovery and
business continuity purposes.
Finally, once the documentation has been created, it is important to ensure that it is always maintained and up to date. Network diagrams from years ago may contain misleading and incorrect
information that may hamper troubleshooting efforts. Network documentation should be
considered living documentation that changes at the same rate as the network. Following the completion of each network project, existing documentation should be updated to reflect the changes
that were made to the network.
Although there are no standards that determine what information should and should not be included in network documentation, most organizations and businesses have their own standards
for what should be included in the network documentation. It is important to adhere to these standards and guidelines when creating documentation. From a best practices perspective, network
documentation should include the following information, at a minimum:
• Information about the interconnects between devices for LAN and WAN connections
• IP addressing and VLAN information
• A physical topology diagram of the network
• A logical topology diagram of the network
• An inventory of all internetwork devices, components, and modules
• A revision control section detailing changes to the topology
• Configuration information
• Any original or additional design documentation and notes
• Data or traffic flow patterns
Securing the Network from Both Internal and External Threats
Network security is an integral component of network operation and maintenance. It is also very
important that consideration be given to both internal and external threats. While most organizations have a dedicated security team, monitoring and a structured maintenance approach can also
be used to discover vulnerabilities or potential security threats, which can then allow the appropriate action to be taken before an incident occurs.
Planning for Network Upgrades, Expansions, or Enhancements
Using network monitoring, you can identify potential issues before they arise, as well as plan for
possible network upgrades or expansions (i.e., capacity planning) based on the identified potential
issues. Effective planning can be used to define the maintenance tasks required on the network, and
then to prioritize those tasks and the order in which they will be implemented.
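To make the idea of capacity planning more concrete, the sketch below fits a straight line through a series of utilization readings and estimates how many more measurement periods remain before a chosen threshold is reached. The monthly figures, the 80 percent threshold, and the function names are hypothetical, and a simple linear trend is only one of several ways such a projection might be made.

```python
def fit_trend(samples):
    """Least-squares line through (index, value) points.

    Returns (slope, intercept). Requires at least two samples.
    """
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def periods_until(samples, threshold):
    """Periods after the last sample until the trend reaches `threshold`.

    Returns None when utilization is flat or falling.
    """
    slope, intercept = fit_trend(samples)
    if slope <= 0:
        return None
    last_index = len(samples) - 1
    return max(0.0, (threshold - intercept) / slope - last_index)


# Hypothetical peak utilization of a WAN link (percent), one value per month.
monthly_peak_percent = [50, 52, 54, 56, 58, 60]
print(periods_until(monthly_peak_percent, 80))  # 10.0 months at this growth rate
```

An estimate of this kind gives the upgrade a place in the maintenance schedule well before the link becomes congested.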
Scheduling Backups and Restoring Services or the Network from Backups
Backups are a routine maintenance task that is usually given a very low priority. However, backups are essential, especially when attempting to recover from a
serious or critical failure of the network. Backups should therefore be considered a core common
network maintenance task and should be allocated a high priority. It is important to ensure that
backups of core network components and devices are scheduled, monitored, and verified at all
times. Having up-to-date backups of core devices can assist in faster recovery of the network or
individual network components following hardware or software failures, or even data (configuration) loss.
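Verifying that backups are current can be automated. The following Python sketch lists devices whose most recent backup is missing or older than an allowed age. The device names, the timestamps, and the one-day limit are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup_times, now, max_age=timedelta(days=1)):
    """Return sorted device names whose newest backup is missing or too old."""
    stale = []
    for device, taken_at in last_backup_times.items():
        if taken_at is None or now - taken_at > max_age:
            stale.append(device)
    return sorted(stale)


# Hypothetical record of when each device was last backed up.
now = datetime(2011, 6, 1, 12, 0)
backups = {
    "core-sw1": datetime(2011, 6, 1, 2, 0),   # backed up last night
    "edge-r1": datetime(2011, 5, 29, 2, 0),   # several days old
    "dist-sw2": None,                         # never backed up
}

print(stale_backups(backups, now))  # ['dist-sw2', 'edge-r1']
```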
Ensuring Compliance with Legal Regulations and Corporate Policies
A structured network maintenance methodology also ensures that the network is compliant with
both legal obligations and corporate policies. Regulatory policies, which are mandatory enforcements of compliance with industry regulations and laws, will differ from one industry to another. Regardless of the
industry and the requirements, it is important to ensure that the business is following the industry
standards as regulated by law. Unlike legal regulations, corporate policies will vary on a business-by-business basis; however, it is still important to ensure that the network adheres to these policies
and can provide the required functions.
Troubleshooting Problem Reports
Troubleshooting problem reports is a core network maintenance function. While troubleshooting
methodologies are described in detail later in this guide, troubleshooting is simplified by a structured network maintenance approach, which includes documentation, backups, and some form of
proactive monitoring system.
Maintaining and Updating Device Configurations
Configuration changes are common because of the day-to-day moves, additions, or changes
(MACs) within organizations. Device configurations may also change due to scheduled maintenance tasks and planned changes to the network. For this reason, maintaining and updating device
configurations is considered a core network management function. Each time configurations on
devices change, they not only should be documented but also should be saved, both on the device
and to an alternate backup location (e.g., an FTP or TFTP server, if one is available).
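One way to decide when a fresh copy of a configuration should be sent to the backup location is to compare a fingerprint of the current configuration with the fingerprint of the last saved copy. The Python sketch below assumes the configuration text has already been retrieved from the device by some other means; the function names and the file naming scheme are hypothetical.

```python
import hashlib
from datetime import datetime


def config_fingerprint(config_text):
    """SHA-256 of the configuration, ignoring blank lines and trailing spaces."""
    lines = [line.rstrip() for line in config_text.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def needs_backup(current_config, last_saved_fingerprint):
    """True when the configuration differs from the last saved copy."""
    return config_fingerprint(current_config) != last_saved_fingerprint


def backup_name(hostname, taken_at):
    """Build a timestamped file name (an assumed naming convention)."""
    return f"{hostname}-{taken_at:%Y%m%d-%H%M%S}.cfg"


saved = config_fingerprint("hostname edge-r1\ninterface Gi0/0\n no shutdown\n")
changed = "hostname edge-r1\ninterface Gi0/0\n shutdown\n"

if needs_backup(changed, saved):
    print(backup_name("edge-r1", datetime(2011, 6, 1, 2, 30, 0)))
```

Keeping the timestamp in the file name preserves earlier copies, which is useful when a change needs to be rolled back.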
AN OVERVIEW OF NETWORK MANAGEMENT MODELS
There are several network management methodologies or models that incorporate network maintenance activities. It is important to understand that these models are guidelines, not standards.
Standards can be defined as industry-recognized best practices, frameworks, and agreed-upon
principles or concepts and designs, which are designed to implement, achieve, and maintain the required levels of processes and procedures. Guidelines, on the other hand, are simply recommended
actions and operational guides for users. Unlike standards, which are mandatory, guidelines are
used simply as reference material. Common network management models, which are described in
the following section, include the following:
• Telecommunications Management Network
• FCAPS
• Information Technology Infrastructure Library
• Cisco Lifecycle Services
Telecommunications Management Network (TMN)
The Telecommunications Management Network (TMN) framework is a model defined by ITU-T
for managing open systems in a communications network. It is referenced in ITU-T Recommendation M.3010. It is important to understand that the TMN was developed to provide a framework
for service providers to manage their service delivery network; however, the same basic concepts
can also be applied to standard enterprise networks. This framework defines four management
architectures at different levels of abstraction as follows:
• A functional architecture
• An informational architecture
• A physical architecture
• A logical layered architecture
NOTE: While delving into the specifics on these various architectures is beyond the scope
of the TSHOOT certification exam, the following section provides a brief description of each.
The functional architecture describes various management functions. Next, the informational architecture describes concepts that have been adopted from OSI management. The physical architecture defines how these management functions may be implemented into physical equipment,
and, finally, the logical layered architecture includes a model that shows how management can be
structured according to different responsibilities.
Within the logical layered architecture, the framework also provides a common methodology and logic that is
applicable to the management of private enterprise networks by introducing an additional four
abstract layers of management functionality, which are as follows:
• The Business Management Layer
• The Service Management Layer
• The Network Management Layer
• The Element Management Layer
NOTE: Again, while delving into detail on these layers is beyond the scope of the TSHOOT
certification exam, the following section provides a brief description of each.
The Business Management Layer (BML) has a broad scope, which includes responsibility for the
management of the whole enterprise. This layer is more aligned with strategic management, rather
than day-to-day operational management.
The Service Management Layer (SML) is concerned with management of those aspects that may
be directly observed by the users of the telecommunication network. These users may be end users
(customers) or other service providers (administrations). Examples of functions that are performed
at the Service Management Layer include administration, accounting, the addition and removal
of users, and QoS management. The BML and SML provide the link between IT and the business.
The Network Management Layer (NML) deals with fault and performance data for the network,
as well as overall network management and configuration, which includes tasks such as network
monitoring and fault detection, optimization, and configuration changes, for example. It is important to understand that this layer pertains to the overall network. Individual device management is
covered at the Element Management Layer.
The Element Management Layer (EML) deals with configuration management, fault, and performance at the device level. This layer deals with vendor-specific management functions and hides
these functions from the layer above, the Network Management Layer. Examples of the functions
that are performed at this layer include alarm management, handling of information, backup, logging, maintenance of hardware and software, and measuring resource utilization (e.g., CPU, memory, and power consumption).
FCAPS
FCAPS is the International Organization for Standardization (ISO) TMN model and framework
for network management. The acronym FCAPS stands for Fault management, Configuration management, Accounting management, Performance management, and Security management. These
are the network management categories used by the ISO. The following sections describe the five
functional areas in the FCAPS model.
Fault management is a lifecycle-centered discipline that revolves around identifying problems
through continuous monitoring of the entire network, correlating the fault data, and isolating the
problem to the source. The overall fault management lifecycle includes the following tasks:
• Fault and problem detection
• Handling and acknowledging alarms sent by devices
• Fault and problem isolation using a filtration and correlation process
• Fault correction and recovery
• Tracking problems through resolution via a trouble-ticketing system
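The filtration and correlation step in the list above can be pictured with a small Python sketch that groups alarms from the same device into a single incident when they arrive close together in time. The tuple layout, the 60-second window, and the alarm text are assumptions made for illustration; real fault management systems apply far richer rules.

```python
def correlate(alarms, window_seconds=60):
    """Group alarms per device into incidents.

    `alarms` is an iterable of (timestamp_seconds, device, message) tuples.
    An alarm joins the device's latest incident when it arrives within
    `window_seconds` of that incident's most recent alarm.
    """
    incidents = []
    latest_for_device = {}  # device name -> most recent incident for it
    for timestamp, device, message in sorted(alarms):
        incident = latest_for_device.get(device)
        if incident and timestamp - incident["last_seen"] <= window_seconds:
            incident["last_seen"] = timestamp
            incident["messages"].append(message)
        else:
            incident = {
                "device": device,
                "first_seen": timestamp,
                "last_seen": timestamp,
                "messages": [message],
            }
            incidents.append(incident)
            latest_for_device[device] = incident
    return incidents


# Hypothetical alarm feed: (seconds since start, device, message).
alarms = [
    (0, "sw1", "Gi0/1 down"),
    (5, "sw1", "Gi0/2 down"),
    (10, "r1", "OSPF neighbor down"),
    (300, "sw1", "Gi0/1 down"),
]

for incident in correlate(alarms):
    print(incident["device"], incident["first_seen"], len(incident["messages"]))
```

Each resulting incident, rather than each raw alarm, would then be a candidate for a trouble ticket.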
Configuration management encompasses the management of actual device configurations, the
configuration change control process (which may include the commissioning and decommissioning of network devices), backing up and restoring configurations, and overall workflow management for the administrators performing the configuration changes. Another important aspect is
the ability to track and log changes to device configurations.
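Tracking changes to device configurations can be as simple as comparing two saved snapshots. The sketch below uses Python's standard difflib module to list the lines that were added or removed; the configuration snippets and the function name are hypothetical.

```python
import difflib


def config_changes(old_config, new_config):
    """Return lines removed ('- ' prefix) or added ('+ ' prefix) between snapshots."""
    diff = difflib.ndiff(old_config.splitlines(), new_config.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical before and after snapshots of a router configuration.
before = """hostname R1
interface GigabitEthernet0/0
 ip address 10.0.0.1 255.255.255.0"""

after = """hostname R1
interface GigabitEthernet0/0
 ip address 10.0.0.2 255.255.255.0
 no shutdown"""

for change in config_changes(before, after):
    print(change)
```

Storing this output alongside the date and the name of the administrator who made the change gives a basic change log.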
Accounting management covers methods to track usage statistics and costs associated with time
and services provided with devices and other network resources. Accounting information, such as
link utilization and device resource utilization, can also be used for Service Level Agreement (SLA)
purposes, ensuring that an agreed-upon level of service is being provided.
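As a simple worked example of using collected data for SLA purposes, the following Python sketch computes availability over a reporting period from a list of outage durations and compares it with an agreed target. The 30-day period, the outage figures, and the targets are assumed values, not figures from any particular agreement.

```python
def availability_percent(period_minutes, outage_minutes):
    """Percentage of the period during which the service was available."""
    downtime = sum(outage_minutes)
    return 100.0 * (period_minutes - downtime) / period_minutes


def meets_sla(period_minutes, outage_minutes, target_percent):
    """True when measured availability is at or above the agreed target."""
    return availability_percent(period_minutes, outage_minutes) >= target_percent


# Hypothetical 30-day month (43,200 minutes) with two outages.
period = 30 * 24 * 60
outages = [12, 30]  # minutes

print(round(availability_percent(period, outages), 2))
print(meets_sla(period, outages, 99.9))    # 42 minutes of downtime is within 99.9%
print(meets_sla(period, outages, 99.95))   # but exceeds what 99.95% allows
```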
Performance management covers the tracking of system and network statistics using a Network
Management Station (NMS). The data that is collected may include link utilization, errors, response times, and availability information. This data can then be used to improve performance for
critical traffic, such as Voice over IP (VoIP), by implementing or adjusting Quality of Service (QoS)
solutions to make the most efficient use of limited bandwidth. Additionally, performance management monitoring can also be used to establish thresholds and identify network trends, all of which
provide valuable data for capacity planning.
Security management addresses access rights that include authentication and authorization, data
privacy, and auditing security violations. From a network administration perspective, security
management is concerned primarily with controlling access to network devices using, for example,
the Authentication, Authorization, and Accounting (AAA) security architecture. However, security
management may also include integrating firewalls and other security devices, such as an Intrusion
Prevention System (IPS), into the network to protect against viruses, worms, and other malicious
types of traffic.
Information Technology Infrastructure Library (ITIL)
Information Technology Infrastructure Library (ITIL) is a set of best practices for Information
Technology Service Management (ITSM), IT development, and IT operations. The names ITIL and
IT Infrastructure Library are registered trademarks of the United Kingdom’s Office of Government
Commerce (OGC). ITIL provides businesses with a customizable framework of best practices that
can be used to ensure and achieve quality service, as well as to overcome some of the difficulties
that are associated with the growth of IT systems.
ITIL is organized into a set of texts that are defined by related functions. ITILv3, which is the current (latest) version, defines five processes that cover the entire lifecycle of an IT project, from
starting its architectural planning, spanning through the design and implementation phases, and
covering the operational phases to a continued loop with the services optimization. The five processes or sets defined in ITILv3 are as follows:
• Service Strategy
• Service Design
• Service Transition
• Service Operation
• Continual Service Improvement
Service Strategy is both the center and origin point of the ITIL Service Lifecycle. Service Strategy focuses on the strategic approach to IT Service Management. The purpose of Service Strategy is to develop the organizational ability to think and act in a strategic manner and transform IT Service Management into a strategic asset. This includes consideration of the services that should be offered, to whom
those services should be offered, how service performance will be measured, and how the customer
and stakeholders will perceive and measure the value of the services, among other considerations.
Service Design is part of the overall business change process and is concerned primarily with the
design of new or changed services for introduction into the live environment. This includes their
architectures, processes, policies, and documentation. In essence, Service Design simply translates
the strategic objectives into portfolios of service and assets. Service Design covers the following
lifecycle management aspects:
• Service Level Management (SLM)
• Service Level Agreements (SLAs)
• Operational Level Agreements (OLAs)
• Service Improvement Plan
• Service Quality Plan
• Capacity Management
• Availability Management
• IT Service Continuity Management
• Supplier Management
• Compliance Management
• IT Architecture Management
• Risk Management
Service Transition relates to the delivery of services required by a business into operational use.
Service Transition includes the following list of processes and activities:
• Service Asset and Configuration Management
• Service Validation and Testing
• Evaluation
• Release Management
• Change Management
• Knowledge Management
Service Operation is the stage in the ITIL core lifecycle whose primary purpose is to deliver and
support IT services at agreed-upon levels, and to manage the applications, technology, and infrastructure that support the delivery of the services. Service Operation entails the delivery of these
services to both users and customers and includes the following processes:
• Event Management
• Incident Management
• Problem Management
• Request Fulfillment
• Access Management
Continual Service Improvement (CSI) is an ongoing activity in IT services, not an emergency project initiated when someone in authority complains that network service or performance is sub-par. CSI entails continually reviewing, analyzing, and improving service management processes, which allows you to address changes in business requirements, refresh technology cycles, and ensure that high quality is maintained.
Cisco Lifecycle Services (PPDIOO)
The Cisco PPDIOO model encompasses all steps from network vision to optimization, which enables Cisco to provide a broader portfolio of support and end-to-end solutions to its customers.
This Cisco lifecycle model includes the stages of prepare, plan, design, implement, operate, and
optimize; hence, the acronym PPDIOO.
Within the PPDIOO model, the prepare phase deals with network discovery to understand business needs, high-level requirements, and any potential challenges. At the end of this stage, a conceptual architecture of the proposed network solution is then presented.
The plan phase of the Cisco PPDIOO model compares the existing network with the proposed
network (i.e., the proposed solution in the prepare phase) to help identify tasks, responsibilities,
milestones, and resources required to implement the design.
The design phase of the Cisco PPDIOO model articulates the detailed design requirements. In this phase, a low-level design, which will eventually be implemented, is produced. Considerations during this phase should also include how to meet requirements for the applications, support, backup, and recovery.
The PPDIOO implement phase addresses the integration of new equipment into the existing network environment based on the design requirements. Supporting implementation documentation
(e.g., an implementation and back-out plan) is also presented during this phase.
The PPDIOO operate phase begins after the new device(s) have been implemented or integrated
into the existing network. This phase of the PPDIOO model entails the day-to-day network operation, while responding to any issues that arise.
Finally, the optimize phase of the PPDIOO model continually gathers the feedback from the operate phase to make potential adjustments to the existing network, which typically results in another
project beginning with the prepare phase. The entire process is then repeated. During the
optimize phase, feedback received from the operate phase may also be used to address any ongoing
network performance and support issues.
IOS MAINTENANCE AND MONITORING TOOLS
Cisco provides a plethora of tools that can be used for maintenance and monitoring through the
Command Line Interface (CLI), as well as through a Graphical User Interface (GUI) for devices
running Cisco IOS software. In addition to these IOS-based tools, Cisco also provides standalone
tools that can be used for network maintenance, monitoring, and troubleshooting. The following
sections describe some of the Cisco maintenance and monitoring tools that you should be familiar
with for the purposes of the TSHOOT certification exam.
Enhanced Embedded Event Manager
The enhanced Embedded Event Manager (EEM) is part of the Cisco Embedded Automation Systems (EASy) toolkit, which combines the following additional embedded management technologies with EEM:
• Cisco IP Service Level Agreements (IP SLAs)
• Expression MIB
• Network-Based Application Recognition (NBAR)
• Flexible NetFlow
• Enhanced Object Tracking
• Cisco IOS Shell (IOS.sh)
NOTE: Cisco IP SLAs will be described in detail later in this chapter. The Cisco Expression
MIB is beyond the scope of the TSHOOT certification exam, as is Cisco IOS Shell (IOS.sh).
These will not be described in any further detail in this chapter or in the remainder of this
guide. NBAR and NetFlow are both core components of the TSHOOT exam and these will
be described in detail later in this chapter and throughout this guide. Finally, Enhanced Object Tracking (EOT) is described in detail in both the ROUTE and SWITCH guides. From a
troubleshooting perspective, you do not have to know any additional information on EOT.
Cisco IOS Embedded Event Manager (EEM) is part of the maintenance and monitoring toolkit.
EEM is a powerful and flexible subsystem that provides real-time network event detection and
onboard automation. EEM also increases the intelligence of network devices, allowing them to act
on and facilitate management actions for specific network events.
A series of event detector processes designed to monitor explicit operational aspects of routers or
switches are built into Cisco IOS Software. These can be primed to look for a specific event, and
when that event occurs, they can act as a trigger to start up a user-loaded script. These customizable scripts are programmed using either simple Command Line Interface (CLI) commands or
Tool Command Language (Tcl).
A common use for EEM is printing a message on the console after a certain action has been performed on the router. For example, EEM can be configured so that when someone issues the clear counters, clear ip route *, or clear ip bgp * command on a device, a message is printed on the console requesting that the person update the relevant network documentation to explain why this action was taken.
In addition, EEM could be used to print a message requesting that the network documentation be changed or that configurations be saved following changes to a device. These are only simple illustrations of the capabilities of EEM; it can also be configured to perform actions such as sending an e-mail advising of monitored events.
When using the CLI to configure EEM, you must first configure an EEM applet. The EEM applet is a simple form of policy that is defined within the CLI configuration using the event manager applet [name] global configuration command. After you enter this command, the router transitions to EEM applet configuration mode. This configuration mode supports three commands: event, action, and set.
The event command is used to specify the event criteria that will trigger the EEM applet to run.
For example, the event could be a syslog message indicating that counters have been cleared on an
interface or the issuing of certain CLI commands, such as clear ip route *.
The action command is used to specify an action to perform when the EEM applet has been triggered. Multiple sequential action commands can be configured within the applet. For example,
you can specify that the first action that will be taken after a user has exited configuration mode is
that a message will be printed on the console, and then the next action can issue a CLI command to
save the configuration on the local device or to a TFTP server.
Finally, the set command is used to set the value of an EEM applet variable. Following the configuration, the show event manager policy registered command can then be used to display a
list of registered applets.
The following configuration example shows how to configure a basic EEM applet using the CLI.
This applet will be triggered when the syslog pattern ‘%SYS-5-CONFIG_I:’ is logged by the router.
When triggered by this event, the applet will print a syslog message that reads “Please Update Network Documentation”, followed by another message that reads “Please Save The Configuration”.
This simple configuration is implemented on the router as follows:
R1(config)#event manager applet CONFIGURATION-CHANGE-APPLET
R1(config-applet)#event syslog pattern %SYS-5-CONFIG_I:
R1(config-applet)#action 1.0 syslog msg "Please Update Network Documentation"
R1(config-applet)#action 1.1 syslog msg "Please Save The Configuration"
R1(config-applet)#exit
As previously stated in this section, after one or more applets have been configured, the show
event manager policy registered command is used to display a list of registered applets. The
output of this command following R1’s EEM configuration is illustrated below:
R1#show event manager policy registered
No.  Class   Type  Event Type  Trap  Time Registered           Name
1    applet  user  syslog      Off   Sun Mar 3 02:41:18 2002   CONFIGURATION-CHANGE-APPLET
 pattern {%SYS-5-CONFIG_I:}
 action 1.0 syslog msg "Please Update Network Documentation"
 action 1.1 syslog msg "Please Save The Configuration"
As a simple test, this configuration can be validated by entering and exiting configuration mode on
the router as illustrated below:
R1#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#
R1(config)#
R1(config)#end
R1#
R1#
*Mar 3 02:46:50.493: %SYS-5-CONFIG_I: Configured from console by console
*Mar 3 02:46:50.501: %HA_EM-6-LOG: CONFIGURATION-CHANGE-APPLET: Please Update
Network Documentation
*Mar 3 02:46:50.505: %HA_EM-6-LOG: CONFIGURATION-CHANGE-APPLET: “Please Save
The Configuration”
The same EEM applet would also be executed for changes made remotely (e.g., via a Telnet session)
as illustrated in the following output:
R2#telnet 10.0.0.1
Trying 10.0.0.1 ... Open
User Access Verification
Username: netadmin
Password:
R1>enable
Password:
R1#terminal monitor
R1#config t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#
R1(config)#
R1(config)#end
R1#
R1#
R1#
*Mar 3 02:52:23.643: %SYS-5-CONFIG_I: Configured from console by netadmin on
vty0 (10.0.0.2)
*Mar 3 02:52:23.651: %HA_EM-6-LOG: CONFIGURATION-CHANGE-APPLET: “Please Update
Network Documentation”
*Mar 3 02:52:23.651: %HA_EM-6-LOG: CONFIGURATION-CHANGE-APPLET: “Please Save
The Configuration”
NOTE: The terminal monitor command must be issued if you want to see log messages
on the screen when you remotely access a device. Otherwise, you must use the show logging
command to view the messages in the router or switch logs.
As a final example, the following configuration output illustrates how to configure multiple EEM applets to log and print messages when the clear counters or clear ip bgp * commands are issued. When the clear counters command is issued, the message “Please advise Network Operations why the interfaces counters were cleared by sending an email to netops@howtonetwork.net. Thank you!” is printed. When the command clear ip bgp * is issued, the message “This operation is NOT allowed! Please contact netops@howtonetwork.net for permission to perform this operation. Thank you!” is printed and the command is rejected:
R1(config)#event manager applet CLEAR-INTERFACE-COUNTERS-APPLET
R1(config-applet)#event cli pattern "clear counters.*" sync no skip no
R1(config-applet)#$ sending an email to netops@howtonetwork.net. Thank you!"
R1(config-applet)#exit
R1(config)#event manager applet CLEAR-IP-BGP-APPLET
R1(config-applet)#event cli pattern "clear ip bgp.*" sync no skip yes
R1(config-applet)#$ for permission to perform this operation. Thank you!"
R1(config-applet)#exit
This configuration can be validated using the show event manager policy registered command, as was also illustrated in the previous example. Following is the output of this command
after the EEM configuration on R1:
R1#show event manager policy registered
No.  Class   Type  Event Type  Trap  Time Registered           Name
1    applet  user  cli         Off   Sun Mar 3 03:16:54 2002   CLEAR-INTERFACE-COUNTERS-APPLET
 pattern {clear counters.*} sync no skip no
 action A syslog msg "Please advise Network Operations why the interface counters were cleared by sending an email to netops@howtonetwork.net. Thank you!"
2    applet  user  cli         Off   Sun Mar 3 03:15:53 2002   CLEAR-IP-BGP-APPLET
 pattern {clear ip bgp.*} sync no skip yes
 action A syslog msg "This operation is NOT allowed! Please contact netops@howtonetwork.net for permission to perform this operation. Thank you!"
NOTE: The sync keyword determines whether the EEM policy runs synchronously with the CLI command (the command waits for the policy to finish) or asynchronously (the policy and the command proceed independently). The skip keyword, used with sync no, specifies whether the CLI command is skipped (skip yes) or executed (skip no). Going into these advanced options is beyond the scope of the TSHOOT certification exam; however, they have been included in this section to demonstrate further the capabilities of EEM.
The EEM configuration on R1 can be tested by issuing the clear counters command on the
router and then the clear ip bgp * command as follows:
R1#clear counters
Clear “show interface” counters on all interfaces [confirm]
R1#
*Mar 3 03:39:18.317: %HA_EM-6-LOG: CLEAR-INTERFACE-COUNTERS-APPLET: “Please
advise Network Operations why the interfaces counters were cleared by sending an
email to netops@howtonetwork.net. Thank you!”
*Mar 3 03:39:19.191: %CLEAR-5-COUNTERS: Clear counter on all interfaces by
netadmin on vty0 (10.0.0.2)
R1#
R1#
R1#
R1#
R1#clear ip bgp *
R1#
*Mar 3 03:27:12.235: %HA_EM-6-LOG: CLEAR-IP-BGP-APPLET: “This operation is NOT
allowed! Please contact netops@howtonetwork.net for permission to perform this
operation. Thank you!”
With the execution of the clear counters command, the EEM applet prints the stated message
and allows the command to be executed. When the clear ip bgp * command is issued, the EEM
applet again prints the stated message but this time does not allow the command to be executed. Such
configurations ensure that only people authorized to make changes to reset neighbor relationships,
etc., are allowed to do so, and do so only when the proper controls and notifications are in order.
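The allow-versus-reject behavior of the two applets can be modeled off-box in a few lines of Python. This is only a sketch of the logic, not Cisco code: the APPLETS table and the handle_command function are hypothetical names, and it assumes the CLI pattern is applied as an unanchored regular expression with asynchronous (sync no) execution.

```python
import re

# Hypothetical model of the two applets configured on R1 (illustration only).
# "skip" mirrors the skip keyword: True means the CLI command is not executed.
APPLETS = [
    {"name": "CLEAR-INTERFACE-COUNTERS-APPLET",
     "pattern": r"clear counters.*", "skip": False},
    {"name": "CLEAR-IP-BGP-APPLET",
     "pattern": r"clear ip bgp.*", "skip": True},
]


def handle_command(command):
    """Return (names of triggered applets, whether the command still runs)."""
    triggered = []
    execute = True
    for applet in APPLETS:
        # Assumption: the pattern may match anywhere in the entered command.
        if re.search(applet["pattern"], command):
            triggered.append(applet["name"])
            if applet["skip"]:
                execute = False
    return triggered, execute


if __name__ == "__main__":
    for cmd in ("clear counters", "clear ip bgp *", "show ip route"):
        print(cmd, "->", handle_command(cmd))
```

In this model, clear counters triggers its applet and still runs, clear ip bgp * triggers its applet and is skipped, and an unrelated command triggers nothing.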
NOTE: You are not required to implement any Cisco IOS EEM configurations in the current
TSHOOT exam; however, you should be familiar with the basic EEM configuration logic.
Cisco IOS IP Service Level Agreement
Cisco IOS IP SLA, which is described in detail in both the ROUTE and SWITCH certification
guides, allows you to monitor, analyze, and verify IP service levels for IP applications and services,
to increase productivity, to lower operational costs, and to reduce occurrences of network congestion or outages. IP Service Level Agreement (IP SLA) uses active traffic monitoring to measure
network performance, allowing IP SLAs to be used not only for maintenance and monitoring functions but also for troubleshooting as well as to baseline network performance.
IP SLA can measure and monitor network performance metrics such as jitter, latency (delay), and
packet loss. IP SLA has evolved with advanced measurement features, such as application performance, MPLS awareness, and enhanced voice measurements. IP SLA uses active traffic monitoring, which is the generation of traffic in a continuous, reliable, and predictable manner, for measuring network performance edge-to-edge over a network. Given this, IP SLA operations are based on
active probes because synthetic network traffic is generated strictly for the purpose of measuring a
network performance characteristic of the defined operation.
NOTE: A passive probe is one that captures actual network traffic flows for analysis. Examples
would be a packet capture (e.g., Ethereal or Wireshark) and NetFlow.
IP SLA is comprised of two components, which are the source (agent) and the target. The source, or
agent, is where IP SLA operations are defined. In other words, this is where the bulk of the configuration is implemented. Based on the configuration parameters, the source generates packets specific
to the defined IP SLA operations, analyzes the results, and records them so that they can be accessed
through the CLI or via Simple Network Management Protocol (SNMP). SNMP is described in detail
in the SWITCH guide. It will also be described briefly later in this chapter and throughout this guide.
A source router can be any Cisco router or switch that can support the IP SLA operation being
configured. A particular source or agent can have multiple IP SLA tests running to many remote
responders. In addition, a particular router or switch can be both an agent and a responder for different IP SLA configurations.
The IP SLA target depends upon the type of IP SLA operation defined and may be a computer or an internetwork device, such as a router or a switch. For example, for IP SLA FTP or HTTP operations, the target would be an FTP or HTTP server. For Real-time Transport Protocol (RTP) and UDP jitter (VoIP) operations, the target must be a Cisco device.
If the target is a Cisco device, the ip sla responder global configuration command must be
configured on this device because both the source and target participate in the performance measurement. The IP SLA responder has an added benefit of accuracy because it inserts in and out
time-stamps in the packet payload and therefore measures the CPU time spent.
The IP SLA responder (target) is a Cisco IOS software component that is configured to respond to
IP SLA request packets. The IP SLA source establishes a connection with the target using control
packets before the configured IP SLA operation begins.
Following the acknowledgement of the control packets, the source then sends the responder test
packets. The responder inserts a time-stamp when it receives a packet and factors out the destination processing time and adds time-stamps to the sent packets. This allows for the calculation of
unidirectional packet loss, latency, and jitter measurements with the kind of accuracy that is not
possible using simple ping tests or other dedicated (passive) probe testing.
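The time-stamp arithmetic described above can be worked through with a short Python sketch. The four time-stamp names (t1 to t4), the function names, and the sample values are all hypothetical, chosen only to illustrate the idea; the one-way figures additionally assume that the source and responder clocks are synchronized (for example, via NTP).

```python
def network_rtt_ms(t1, t2, t3, t4):
    """Round-trip time with the responder's processing time factored out.

    t1: source sends probe        (source clock)
    t2: responder receives probe  (responder clock)
    t3: responder sends reply     (responder clock)
    t4: source receives reply     (source clock)
    """
    processing = t3 - t2           # time spent inside the responder
    return (t4 - t1) - processing  # time spent on the network only


def one_way_delays_ms(t1, t2, t3, t4):
    """Source-to-destination and destination-to-source delay.

    Only meaningful when both clocks are synchronized (an assumption).
    """
    return t2 - t1, t4 - t3


def jitter_ms(delays):
    """Absolute variation between consecutive one-way delay samples."""
    return [abs(b - a) for a, b in zip(delays, delays[1:])]


if __name__ == "__main__":
    # Hypothetical time-stamps in milliseconds.
    print(network_rtt_ms(0, 12, 15, 28))     # 25
    print(one_way_delays_ms(0, 12, 15, 28))  # (12, 13)
    print(jitter_ms([12, 14, 13]))           # [2, 1]
```

A plain ping would report 28 ms for the same exchange, because it cannot separate the 3 ms of responder processing from the time spent on the network.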
Cisco IOS IP SLA operations can be broadly categorized into the following five functional areas:
1. Availability monitoring
2. Network monitoring
3. Application monitoring
4. Voice monitoring
5. Video monitoring
Availability monitoring can be used to monitor network-level availability and is performed primarily using ICMP and UDP packets. IP SLA availability monitoring operations are described in
detail in the following section. Network monitoring is used to monitor Layer 2 operations, such
as Asynchronous Transfer Mode, Frame Relay, and Multiprotocol Label Switching. Application
monitoring is used to monitor common network applications, which include HTTP, FTP, DHCP,
and DNS. Voice monitoring is used to measure voice quality scores, Post Dial Delay (PDD), Real-time Transport Protocol (RTP) statistics, and gatekeeper registration delay. Video monitoring is used to monitor video
traffic. There is no specific IP SLA test for video monitoring; however, the UDP jitter operation can
be used to simulate some video traffic.
IP SLA operations are configured in global configuration mode. The configuration of the IP SLA
feature depends on the software version running on the router. In Cisco IOS software versions
12.3(14)T, 12.4, 12.4(2)T, and 12.2(33)SXH, IP SLA is configured using the ip sla monitor [operation number] global configuration command. In Cisco IOS 12.4(4)T and later, IP SLA is configured using the ip sla [operation number] global configuration command.
The [operation number] used in all three variations of IP SLA configuration is an integer between 1 and 2147483647. This allows for the configuration of multiple IP SLA operations on the
same device. Following IP SLA configuration in global configuration mode, the router transitions
to IP SLA monitor configuration mode.
In Cisco IOS software versions 12.3(14)T, 12.4, 12.4(2)T, and 12.2(33)SXH, the IP SLA operation is
configured using the type IP SLA monitor configuration command. The type command is used
to specify the packet type to send. This may be TCP connect packets or even UDP echo or ICMP
echo packets, depending on the operation being configured.
Commonly used additional parameters that are specified when configuring Cisco IOS IP SLA operations are timeout and frequency. The timeout keyword is used to specify the amount of time for which the Cisco IOS IP SLA operation waits for a response to its request packet. For example, when configuring an IP SLA operation that sends ICMP echo packets (pings) to a remote destination, you can use the timeout keyword to specify how long the operation will wait for a response before it is considered to be unsuccessful (i.e., fails).
The timeout value is specified in milliseconds. The default timeout value varies depending on the
type of IP SLA operation you are configuring.
The frequency is specified in seconds and is used to specify the rate at which a specified Cisco IOS
IP SLA operation is sent into the network. For example, if you specify a frequency of 10 when using
the ICMP echo operation, ping packets will be sent every 10 seconds. When configuring the frequency, it is important to understand and remember that the lower the value specified, the greater
the overhead on the router or switch sending out the packets.
After configuring the IP SLA operation and specifying additional parameters, the operation can
then be enabled using the ip sla monitor schedule [operation-number] global configuration command. While this command can be used with several parameters, parameters typically
used when configuring IP SLA for use with the reliable static routing backup using object tracking
include the life keyword and the start-time keyword.
The life keyword is used to specify the length of time to execute the operation. The life can be
specified in seconds (up to 2147483647) or infinitely using the forever keyword. The start-time
keyword is used to specify when the operation should begin. The most common implementation is
to use the now keyword to begin the operation immediately. However, the operation can be configured to start at a specified time, after a specified amount of time, or on a specific date at a specific
time, for example.
NOTE: After configuring and starting the Cisco IOS IP SLA operation(s), the results are stored
on the source device in the Cisco RTTMON MIB. This same MIB can also be used to configure
IP SLA operations using SNMP set commands. No explicit IP SLA operation configuration is
required to begin storing data in the Cisco RTTMON MIB. Once the IP SLA Monitor has been
successfully created and scheduled, you can create IP SLA Performance Reports using tools
such as Denika SNMP Performance Trender. Keep in mind that SNMP must be configured on
the device. Basic SNMP configuration is described and illustrated later in this chapter.
The following configuration example illustrates how to configure IP SLA operations to measure the
response time it takes to perform a TCP Connection operation between the router and the remote
Web server with the IP address 10.0.0.2. The IP SLA operation timeout value will be set to 5 seconds, and the probe will be run every 10 seconds:
R1(config)#ip sla monitor 1
R1(config-sla-monitor)#type tcpConnect dest-ipaddr 10.0.0.2 dest-port 80
R1(config-sla-monitor-tcp)#timeout 5000
R1(config-sla-monitor-tcp)#frequency 10
R1(config-sla-monitor-tcp)#exit
R1(config)#ip sla monitor schedule 1 start-time now life forever
In Cisco IOS 12.4(4)T and later, the same configuration would be implemented on the router as follows:
R1(config)#ip sla 1
R1(config-ip-sla)#tcp-connect 10.0.0.2 80
R1(config-ip-sla-tcp)#timeout 5000
R1(config-ip-sla-tcp)#frequency 10
R1(config-ip-sla-tcp)#exit
R1(config)#ip sla schedule 1 start-time now life forever
As a final example, the following configuration illustrates how to configure a basic IP SLA jitter operation to destination IP address 10.0.0.2 with a destination port of 32768. Finally, the IOS IP SLA
operation is scheduled to run every 30 seconds:
R1(config)#ip sla monitor 1
R1(config-sla-monitor)#type jitter dest-ipaddr 10.0.0.2 dest-port 32768
R1(config-sla-monitor-jitter)#frequency 30
R1(config-sla-monitor-jitter)#exit
R1(config)#ip sla monitor schedule 1 start-time now life forever
Following the IP SLA operation configuration, the show ip sla monitor statistics [operation number] command can be used to view the operation’s statistics as follows:
R1#show ip sla monitor statistics 1
Round trip time (RTT)
Index 1
Latest RTT: 3 ms
Latest operation start time: *03:19:50.110 UTC Sun Mar 3 2002
Latest operation return code: OK
RTT Values
Number Of RTT: 10
RTT Min/Avg/Max: 3/3/4 ms
Latency one-way time milliseconds
Number of one-way Samples: 0
Source to Destination one way Min/Avg/Max: 0/0/0 ms
Destination to Source one way Min/Avg/Max: 0/0/0 ms
Jitter time milliseconds
Number of SD Jitter Samples: 9
Number of DS Jitter Samples: 9
Source to Destination Jitter Min/Avg/Max: 0/0/0 ms
Destination to Source Jitter Min/Avg/Max: 0/1/1 ms
Packet Loss Values
Loss Source to Destination: 0
Loss Destination to Source: 0
Out Of Sequence: 0
Tail Drop: 0
Packet Late Arrival: 0
Voice Score Values
Calculated Planning Impairment Factor (ICPIF): 0
Mean Opinion Score (MOS): 0
Number of successes: 6
Number of failures: 4
Operation time to live: Forever
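Output such as the above is often captured and post-processed off the device. The following Python sketch shows one way that might be done; the SAMPLE text reuses a few lines from the output above, while the parse_stats and success_ratio names are hypothetical and the regular expressions assume the field labels appear exactly as shown.

```python
import re

# A few lines copied from the statistics output above.
SAMPLE = """\
Latest RTT: 3 ms
RTT Min/Avg/Max: 3/3/4 ms
Number of successes: 6
Number of failures: 4
"""


def parse_stats(text):
    """Extract RTT min/avg/max and success/failure counters from the text."""
    stats = {}
    match = re.search(r"RTT Min/Avg/Max: (\d+)/(\d+)/(\d+) ms", text)
    if match:
        stats["rtt_min"], stats["rtt_avg"], stats["rtt_max"] = (
            int(value) for value in match.groups()
        )
    for key, label in (("successes", "Number of successes"),
                       ("failures", "Number of failures")):
        match = re.search(label + r": (\d+)", text)
        if match:
            stats[key] = int(match.group(1))
    return stats


def success_ratio(stats):
    """Fraction of operations that succeeded (0.0 if none have run)."""
    total = stats["successes"] + stats["failures"]
    return stats["successes"] / total if total else 0.0


if __name__ == "__main__":
    parsed = parse_stats(SAMPLE)
    print(parsed)
    print(success_ratio(parsed))  # 0.6
```

With 6 successes and 4 failures, the ratio is 0.6, a figure that could be tracked over time when baselining the network.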
If the target or destination device is a Cisco IOS router or switch that has been configured with the
ip sla monitor responder global configuration command, you can use the show ip sla monitor responder command on that device to view the local IP SLA operation’s statistics:
R2#show ip sla monitor responder
IP SLA Monitor Responder is: Enabled
Number of control message received: 10
Recent sources:
10.0.0.1 [11:07:41.536 UTC Sat Mar 2 2002]
10.0.0.1 [11:07:11.534 UTC Sat Mar 2 2002]
10.0.0.1 [11:06:41.533 UTC Sat Mar 2 2002]
10.0.0.1 [11:06:11.536 UTC Sat Mar 2 2002]
10.0.0.1 [11:05:41.535 UTC Sat Mar 2 2002]
Number of errors: 0
Finally, if SNMP is also configured on the local device, the data can then be collected using SNMP
and can be used for the creation of reports on network performance. SNMP configuration is described later in this chapter. The primary emphasis in this section is to understand the logic behind,
as well as the capabilities of, Cisco IOS IP SLA operations and how they are an integral part of the
maintenance and monitoring toolkit.
Logging
Logging messages and events both locally and to a syslog server is a core maintenance task. Syslog
is a protocol that allows a host to send event notification messages across IP networks to event
message collectors – also known as syslog servers or syslog daemons. In other words, a host or a
device can be configured in such a way that it generates a syslog message and forwards it to a specific syslog daemon (server).
A syslog daemon or server is an entity that listens to the syslog messages that are sent to it. You cannot configure a syslog daemon to ask a specific device to send it syslog messages. In other words,
if a specific device has no ability to generate syslog messages, then a syslog daemon cannot do
anything about it. In the real world, corporations typically use SolarWinds (or similar) software for
syslog capturing. Additionally, freeware such as the Kiwi Syslog daemon is also available for syslog
capturing.
Syslog uses User Datagram Protocol (UDP) as the underlying transport mechanism, so the data
packets are unsequenced and unacknowledged. While UDP does not have the overhead included
in TCP, this means that on a heavily used network, some packets may be dropped and therefore
logging information will be lost. However, Cisco IOS software allows administrators to configure
multiple syslog servers for redundancy. A syslog solution is comprised of two main elements: a
syslog server and a syslog client.
The syslog client sends syslog messages to the syslog server using UDP as the Transport Layer protocol, specifying a destination port of 514. These messages cannot exceed 1,024 bytes in size; however, there is no minimum length. All syslog messages contain three distinct parts: the priority, the
header, and the message.
The priority of a syslog message represents both the facility and the severity of the message. This
number is an 8-bit number. The 3 least significant bits represent the severity of the message
(with 3 bits, you can represent 8 different severities) and the other 5 bits represent the facility. You
can use these values to apply filters on the events in the syslog daemon.
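The bit layout described above can be illustrated with a short Python sketch. The function names are hypothetical, and the facility code used for local7 (23) comes from the standard syslog facility numbering, in which local0 through local7 are 16 through 23, rather than from this guide.

```python
# Severity names in level order (0 through 7).
SEVERITIES = [
    "Emergencies", "Alerts", "Critical", "Errors",
    "Warnings", "Notifications", "Informational", "Debugging",
]

LOCAL7 = 23  # standard syslog facility code for local7 (assumption noted above)


def encode_priority(facility, severity):
    """Pack a 5-bit facility and a 3-bit severity into one priority value."""
    if not 0 <= facility <= 31:
        raise ValueError("facility must fit in 5 bits")
    if not 0 <= severity <= 7:
        raise ValueError("severity must fit in 3 bits")
    return (facility << 3) | severity


def decode_priority(priority):
    """Split a priority value back into (facility, severity)."""
    return priority >> 3, priority & 0b111


if __name__ == "__main__":
    pri = encode_priority(LOCAL7, 5)   # local7, Notifications
    print(pri)                         # 189
    facility, severity = decode_priority(pri)
    print(facility, SEVERITIES[severity])  # 23 Notifications
```

A syslog daemon that filters on facility or severity is effectively performing the decode step on each message it receives.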
NOTE: Keep in mind that these values are generated by the applications on which the event is
generated, not by the syslog server itself.
The values sent by Cisco IOS devices are listed and described below in Table 1-1:
Table 1-1. Cisco IOS Software Syslog Priority Levels and Definitions
Level  Level Name     Syslog Definition  Description
0      Emergencies    LOG_EMERG          This level is used for the most severe error conditions, which render the system unusable.
1      Alerts         LOG_ALERT          This level is used to indicate conditions that need immediate attention from administrators.
2      Critical       LOG_CRIT           This level is used to indicate critical conditions, which are less than Alerts but still require administrator intervention.
3      Errors         LOG_ERR            This level is used to indicate errors within the system; however, these errors do not render the system unusable.
4      Warnings       LOG_WARNING        This level is used to indicate warning conditions about system operations that did not complete successfully.
5      Notifications  LOG_NOTICE         This level is used to indicate state changes within the system (e.g., a routing protocol adjacency transitioning to a down state).
6      Informational  LOG_INFO           This level is used to indicate informational messages about the normal operation of the system.
7      Debugging      LOG_DEBUG          This level is used to indicate real-time (debugging) information that is typically used for troubleshooting purposes.
In syslog, the facility is used to represent the source that generated the message. This source can be
a process on the local device, an application, or even an operating system. Facilities are represented
by numbers (integers). In Cisco IOS software, there are eight local use facilities that can be used
by processes and applications (as well as the device itself) for sending syslog messages. By default,
Cisco IOS devices use facility local7 to send syslog messages. However, it should be noted that
most Cisco devices provide options to change the default facility level. In Cisco IOS software, the
logging facility [facility] global configuration command can be used to specify the syslog
facility. The options available with this command are as follows:
R1(config)#logging facility ?
  auth    Authorization system
  cron    Cron/at facility
  daemon  System daemons
  kern    Kernel
  local0  Local use
  local1  Local use
  local2  Local use
  local3  Local use
  local4  Local use
  local5  Local use
  local6  Local use
  local7  Local use
  lpr     Line printer system
  mail    Mail system
  news    USENET news
  sys10   System use
  sys11   System use
  sys12   System use
  sys13   System use
  sys14   System use
  sys9    System use
  syslog  Syslog itself
  user    User process
  uucp    Unix-to-Unix copy system
To send messages via syslog, you must perform the following sequence of steps on the device:
1. Globally enable logging on the router or switch using the logging on configuration command. By default, in Cisco IOS software, logging is enabled; however, it is only enabled to
send messages to the console. The logging on command is a mandatory requirement when
sending messages to any destination other than the console.
2. Specify the severity of messages to send to the syslog server using the logging trap [severity] global configuration command. You can specify the severity numerically or using
the equivalent severity name.
3. Specify one or more syslog server destinations using the logging [address] or logging
host [address] global configuration commands.
4. Optionally, specify the source IP address used in syslog messages using the logging source-interface [name] global configuration command. This is a common practice on devices with multiple interfaces configured. If this command is not specified, then the syslog message will contain the IP address
of the router or switch interface used to reach the server. If there are multiple interfaces for
redundancy, this address may change when the primary path (interface) is down. Therefore,
it is typically set to a Loopback interface.
The following configuration example illustrates how to send all messages with a severity of informational (level 6) or more severe (levels 0 through 6)
to a syslog server with the IP address 192.168.1.254:
R2(config)#logging on
R2(config)#logging trap informational
R2(config)#logging 192.168.1.254
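The effect of the logging trap threshold can be pictured with a short Python sketch. This is a simplified model of the comparison only, assuming that a message is forwarded when its numeric severity is less than or equal to the configured level; the function and list names are hypothetical and are not part of Cisco IOS.

```python
# Severity names in level order (index 0 = Emergencies ... 7 = Debugging),
# matching Table 1-1.
SEVERITY_NAMES = [
    "emergencies", "alerts", "critical", "errors",
    "warnings", "notifications", "informational", "debugging",
]


def is_forwarded(message_severity: int, trap_level) -> bool:
    """Return True if a message of this severity passes the trap threshold.

    trap_level may be given numerically or by name, mirroring the two ways
    the severity can be specified on the command line.
    """
    if isinstance(trap_level, str):
        threshold = SEVERITY_NAMES.index(trap_level.lower())
    else:
        threshold = trap_level
    return message_severity <= threshold


print(is_forwarded(5, "informational"))  # True: level 5 is more severe than 6
print(is_forwarded(7, "informational"))  # False: debugging is filtered out
```

With the threshold set to informational, only debugging (level 7) messages are withheld from the server.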
This configuration can be validated using the show logging command as illustrated below:
R2#show logging
Syslog logging: enabled (11 messages dropped, 1 messages rate-limited,
                0 flushes, 0 overruns, xml disabled, filtering disabled)
    Console logging: disabled
    Monitor logging: level debugging, 0 messages logged, xml disabled,
                     filtering disabled
    Buffer logging: disabled, xml disabled,
                    filtering disabled
    Logging Exception size (4096 bytes)
    Count and timestamp logging messages: disabled
No active filter modules.
    Trap logging: level informational, 33 message lines logged
        Logging to 192.168.1.254(global) (udp port 514, audit disabled,
              link up), 2 message lines logged, xml disabled,
              filtering disabled
When configuring logging in general, it is important to ensure that the router or switch clocks
reflect the actual current time, which allows you to correlate the fault data. Inaccurate or incorrect
timestamps on log messages make it very difficult and time consuming to isolate faults by filtering and correlating events. In Cisco IOS devices, the system clock can be
configured manually or the device can be configured to synchronize its clock with a Network Time
Protocol (NTP) server. These two options are discussed in the following sections.
Manual clock or time configuration is fine if you have only a few internetwork devices in your network. In Cisco IOS software, the system time is configured using the clock set hh:mm:ss [day
& month | month & day] [year] privileged EXEC command. It is not configured or specified
in global configuration mode. The following configuration example illustrates how to set the system
clock to 12:15 pm on October 20, 2010 (the command uses 24-hour time):
R2#clock set 12:15:00 20 october 2010
Alternatively, the same configuration could be implemented on the router as follows:
R2#clock set 12:15:00 october 20 2010
Following this configuration, the show clock command can be used to view the system time:
R2#show clock
12:15:19.419 UTC Wed Oct 20 2010
Note that when the system time is set manually using the clock set command, it defaults to the GMT (UTC) time zone, as can be seen above. To
ensure that the system clock reflects the correct time zone, those who are not in the
GMT time zone must use the clock timezone [time zone name] [GMT offset] global
configuration command. For example, the United States has six different time zones, each with a
different GMT offset. These time zones are Eastern Time, Central Time, Mountain Time, Pacific
Time, Hawaii Time, and Alaska Time.
In addition, some of the time zones use Standard Time and Daylight Saving Time. Given this, it is
important to ensure that the system time is set correctly (Standard or Daylight) on all devices when
manually configuring the system clock. The following configuration example illustrates how to set
the system clock to 12:40 pm on October 20 for the Central Standard Time (CST) time zone, which
is six hours behind GMT:
R2#config t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#clock timezone CST -6
R2(config)#end
R2#clock set 12:40:00 october 20 2010
Following this configuration, the system clock on the local router now shows the following:
R2#show clock
12:40:17.921 CST Wed Oct 20 2010
NOTE: If you use the clock set command before the clock timezone command, then the
time that you specified using the clock set command is treated as GMT and is then shifted by the offset specified in the clock timezone command. For example, assume that the configuration commands that are used in the
example above were entered on the router as follows:
R2#clock set 12:40:00 october 20 2010
R2#config t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#clock timezone CST -6
R2(config)#end
Because the clock set command is used first, the output of the show clock command on the
router would show the system clock offset by 6 hours, as specified using the clock timezone command. This behavior is illustrated in the following output on the same router:
R2#show clock
06:40:52.181 CST Wed Oct 20 2010
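The six-hour shift seen above follows from treating the manually set time as GMT and then displaying it in the newly configured zone. The following Python sketch reproduces that arithmetic with the standard datetime module. It models only the displayed result, under the assumption that the stored clock is UTC; it does not describe how Cisco IOS is implemented, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def displayed_time(utc_clock: datetime, offset_hours: int, zone_name: str) -> datetime:
    """Convert a UTC clock value to the zone given by a name and GMT offset."""
    zone = timezone(timedelta(hours=offset_hours), zone_name)
    return utc_clock.astimezone(zone)


# clock set 12:40:00 october 20 2010 entered while the zone is still GMT (UTC)
stored = datetime(2010, 10, 20, 12, 40, 0, tzinfo=timezone.utc)

# clock timezone CST -6 applied afterwards
shown = displayed_time(stored, -6, "CST")
print(shown.strftime("%H:%M:%S %Z"))  # 06:40:00 CST
```

Configuring the time zone first avoids the surprise, because the time typed into the clock set command is then interpreted in the local zone.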
NOTE: Cisco IOS routers and switches can be configured to switch automatically to summer
time (Daylight Saving Time) using the clock summer-time zone recurring [week day
month hh:mm week day month hh:mm [offset]] global configuration command. This removes the need to adjust the system clock manually on all manually configured devices
when switching between Standard Time and Daylight Saving Time.
The second method of setting or synchronizing the system clock is to use a Network Time Protocol
(NTP) server as a reference time source. This is the preferred method in larger networks with more
than just a few internetwork devices. NTP is a protocol that is designed to time-synchronize a network of machines. NTP is documented in RFC 1305 and runs over UDP.
An NTP network usually gets its time from an authoritative time source, such as a radio clock or an
atomic clock attached to a time server. NTP then distributes this time across the network. NTP is
extremely efficient; no more than one packet per minute is necessary to synchronize two machines
to within a millisecond of one another.
NTP uses the concept of a stratum to describe how many NTP hops away a machine is from an
authoritative time source. Keep in mind that this is not routing or switching hops, but NTP hops,
which is a totally different concept. A stratum 1 time server typically has a radio or atomic clock
directly attached, while a stratum 2 time server receives its time via NTP from a stratum 1 time
server, and so on. When a device is configured with multiple NTP reference servers, it will automatically choose as its time source the machine with the lowest stratum number that it is configured to communicate with via NTP.
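The stratum rule described above can be sketched as follows. This is a deliberately simplified model that considers stratum only, whereas a real NTP implementation also weighs factors such as delay and dispersion when selecting a source. The class and function names are hypothetical, the second server address and its stratum are invented for the example, and stratum 16 is treated as unsynchronized, as is conventional in NTP.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NtpServer:
    address: str
    stratum: int
    reachable: bool = True


def choose_time_source(servers: List[NtpServer]) -> Optional[NtpServer]:
    """Pick the reachable, synchronized server with the lowest stratum."""
    candidates = [s for s in servers if s.reachable and 1 <= s.stratum <= 15]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.stratum)


def local_stratum(source: NtpServer) -> int:
    """A client sits one NTP hop further from the authoritative source."""
    return source.stratum + 1


servers = [
    NtpServer("10.0.0.1", stratum=5),
    NtpServer("10.0.0.9", stratum=3),   # hypothetical second server
]
best = choose_time_source(servers)
print(best.address, local_stratum(best))  # 10.0.0.9 4
```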
In Cisco IOS software, a device is configured with the IP addresses of one or more NTP servers
using the ntp server [address] global configuration command. As previously stated, multiple
NTP reference addresses can be specified by repeatedly using the same command. In addition,
this command can also be used to configure security and other features between the server and
the client. These features will be described in detail later in this guide. The following configuration
example illustrates how to configure a device to synchronize its time with an NTP server with the
IP address 10.0.0.1:
R2(config)#ntp server 10.0.0.1
Following this configuration, the show ntp associations command can be used to verify the
communications between the NTP devices as illustrated in the following output:
R2#show ntp associations
      address         ref clock     st  when  poll reach  delay  offset    disp
*~10.0.0.1         127.127.7.1       5    44    64  377     3.2    2.39     1.2
 * master (synced), # master (unsynced), + selected, - candidate, ~ configured
The output of this command provides some very telling information, of which only some will be
relevant to the TSHOOT certification exam. First, the address field indicates the IP address of the
NTP server as confirmed by the value 10.0.0.1 specified under this field. The ref clock field indicates the reference clock used by that NTP server. In this case, the IP address 127.127.7.1 indicates
that the device is using an internal clock (127.0.0.0/8 subnet) as its reference time source. If this field
contained another value, such as 192.168.1.254, for example, then that would be the IP address the
server was using as its time reference.
Next, the st field indicates the stratum of the reference. From the output printed above, we can
determine that the 10.0.0.1 NTP device has a stratum of 5. The stratum on the local device will be
incremented by 1 to a value of 6, as shown below, because it receives its time source from a server
with a stratum value of 5. If another device were synchronized to the local router, it would reflect
a stratum of 7 and so forth. The second command that is used to validate the NTP configuration is
the show ntp status command, the output of which is illustrated below:
R2#show ntp status
Clock is synchronized, stratum 6, reference is 10.0.0.1
nominal freq is 249.5901 Hz, actual freq is 249.5900 Hz, precision is 2**18
reference time is C02C38D2.950DA968 (05:53:22.582 UTC Sun Mar 3 2002)
clock offset is 4.6267 msec, root delay is 3.16 msec
root dispersion is 4.88 msec, peer dispersion is 0.23 msec
The output of the show ntp status command indicates that the clock is synchronized to the
configured NTP server (10.0.0.1). This server has a stratum of 5, hence the local device reflects a
stratum of 6. Note that when NTP is configured, the local time still defaults
to GMT, as can be seen in the reference time (UTC) in the output above. To ensure that the device displays the correct
time zone, you must issue the clock timezone command on the device.
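The hexadecimal reference time in the show ntp status output above is an NTP timestamp: the part before the dot counts seconds since 1 January 1900 (UTC), and the part after the dot is a binary fraction of a second. The following Python sketch decodes it as a cross-check against the human-readable time that IOS prints alongside. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# NTP timestamps count from the start of 1900, not from the Unix epoch of 1970.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def decode_ntp_timestamp(text: str) -> datetime:
    """Decode 'SSSSSSSS.FFFFFFFF' (hex seconds and hex fraction) to a UTC datetime."""
    seconds_hex, fraction_hex = text.split(".")
    seconds = int(seconds_hex, 16)
    fraction = int(fraction_hex, 16) / 2**32      # fraction of one second
    return NTP_EPOCH + timedelta(seconds=seconds,
                                 microseconds=round(fraction * 1_000_000))


when = decode_ntp_timestamp("C02C38D2.950DA968")
print(when.strftime("%H:%M:%S"), when.microsecond // 1000, when.date())
# 05:53:22 582 2002-03-03
```

The decoded value agrees with the 05:53:22.582 UTC Sun Mar 3 2002 shown in the output.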
After the system clock has been set, either manually or via NTP, it is important to ensure that
the logs sent to the server contain the correct timestamps. This is performed using the service
timestamps log [datetime | uptime] global configuration command. The datetime keyword
supports the following self-explanatory additional sub-keywords:
R2(config)#service timestamps log datetime ?
  localtime      Use local time zone for timestamps
  msec           Include milliseconds in timestamp
  show-timezone  Add time zone information to timestamp
  year           Include year in timestamp
  <cr>
The uptime keyword has no additional sub-keywords and configures the local router to include
only the system uptime as the timestamp for sent messages. The following configuration example
illustrates how to configure the local router to include the local time, millisecond information, and
the time zone for all messages:
R2#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#logging on
R2(config)#logging console informational
R2(config)#logging host 150.1.1.254
R2(config)#logging trap informational
R2(config)#service timestamps log datetime localtime msec show-timezone
Following this configuration, the local router console would print the following messages:
Oct 20 02:14:10.519 CST: %SYS-5-CONFIG_I: Configured from console by console
Oct 20 02:14:11.521 CST: %SYS-6-LOGGINGHOST_STARTSTOP: Logging to host
150.1.1.254 started - CLI initiated
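Messages such as those above follow a regular pattern: a timestamp, then a percent sign followed by the originating subsystem, the severity level, and a mnemonic, and finally the descriptive text. The Python sketch below pulls these fields apart with a regular expression. It is a minimal illustration that assumes the simple %NAME-SEVERITY-MNEMONIC layout shown in this example and may not cover every message format; the function and field names are hypothetical. Note that the name after the percent sign identifies the IOS subsystem and is not the syslog facility (such as local7) discussed earlier.

```python
import re
from typing import Dict, Optional

MESSAGE_PATTERN = re.compile(
    r"%(?P<source>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)"
)


def parse_ios_message(line: str) -> Optional[Dict[str, object]]:
    """Extract source, severity, mnemonic, and text from one log line."""
    match = MESSAGE_PATTERN.search(line)
    if match is None:
        return None
    fields: Dict[str, object] = match.groupdict()
    fields["severity"] = int(match.group("severity"))
    return fields


line = "Oct 20 02:14:10.519 CST: %SYS-5-CONFIG_I: Configured from console by console"
parsed = parse_ios_message(line)
print(parsed["source"], parsed["severity"], parsed["mnemonic"])  # SYS 5 CONFIG_I
```

The severity digit extracted here is the same level 0 through 7 listed in Table 1-1, which is what makes it useful for filtering and correlation.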
In addition, the syslog daemon on server 150.1.1.254 would reflect the same messages, as illustrated in the Kiwi Syslog Manager screenshot in Figure 1-1 below:
Fig. 1-1. Configuring Log Timestamps
Simple Network Management Protocol
The Simple Network Management Protocol (SNMP) is a widely used management protocol and
defined set of standards for communications with devices connected to an IP network. SNMP provides a means to monitor and control network devices. Like Cisco IOS IP SLA operations, SNMP
can be used to collect statistics, to monitor device performance, and to provide a baseline of the
network, and is one of the most commonly used network maintenance and monitoring tools.
SNMP is an Application Layer (Layer 7) protocol, using UDP ports 161 and 162, that facilitates
the exchange of management information between network devices. An SNMP-managed network
consists of a management system, agents, and managed devices. The management system executes
monitoring applications and controls managed devices. It also executes most of the management
processes and provides the bulk of memory resources used for network management. A network
might be managed by one or more management systems. Examples of SNMP management systems
include HP OpenView and SolarWinds.
An SNMP agent resides on each managed device and translates local management information data,
such as performance information or event and error information caught in software traps, into a
readable form for the management system. SNMP agents respond to get-requests from the network management software by returning the requested data. SNMP agents capture data from Management Information Bases
(MIBs), which are device parameter and network data repositories, or from error or change traps.
A managed element, such as a router, a switch, a computer, or a firewall, is accessed via the SNMP
agent. Managed devices collect and store management information, making it available through
SNMP to other management systems having the same protocol compatibility. Figure 1-2 below illustrates the interaction of the three primary components of an SNMP-managed network:
Fig. 1-2. SNMP Network Component Interaction
Referencing Figure 1-2, R1 is the SNMP-managed device. Logically residing on the device is the
SNMP agent. The SNMP agent translates local management information data, stored in the management database of the managed device, into a readable form for the management system, which
is also referred to as the Network Management Station (NMS).
When using SNMP, managed devices are monitored and controlled using three common SNMP
commands: read, write, and trap. The read command is used by an NMS to monitor managed devices. This is performed by the NMS examining different variables that are maintained by managed
devices. The write command is used by an NMS to control managed devices. Using this command,
the NMS can change the values of variables stored within managed devices. Finally, the SNMP trap
command is used by managed devices to report events to the NMS. Devices can be configured to
send SNMP traps or informs to an NMS. The traps and informs that are sent are dependent on the
version of Cisco IOS software running on the device, as well as the platform.
SNMP traps are simply messages that alert the SNMP manager of a condition on the network. An
example of an SNMP trap could include an interface transitioning from an up state to a down state.
The primary issue with SNMP traps is that they are unacknowledged. This means that the sending
device is incapable of determining whether the trap was received by the NMS.
SNMP informs are SNMP traps that include a confirmation of receipt from the SNMP manager.
These messages can be used to indicate failed authentication attempts, or the loss of a connection
to a neighbor router, for example. If the manager does not receive an inform request, then it does
not send a response. If the sender never receives a response, then the inform request can be sent
again. Thus, informs are more likely to reach their intended destination.
While informs are more reliable than traps, the downside is that they consume more resources
both on the router and in the network. Unlike a trap, which is discarded as soon as it is sent, an inform
request must be held in memory until a response is received or the request times out. In addition,
traps are sent only once, while an inform may be resent several times if a response is not received
from the SNMP server (NMS).
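The difference in behavior between traps and informs can be modeled with a small Python sketch. This is a toy simulation rather than an SNMP implementation: the delivery function stands in for the network and the NMS, the retry limit of three attempts is an arbitrary value chosen for the example, and all names are hypothetical.

```python
from typing import Callable, Tuple


def send_trap(deliver: Callable[[], bool]) -> int:
    """Send once and forget; the sender never learns whether it arrived."""
    deliver()          # result deliberately ignored
    return 1           # number of transmissions


def send_inform(deliver: Callable[[], bool], max_attempts: int = 3) -> Tuple[bool, int]:
    """Resend until a response arrives or the attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        if deliver():                  # True models a response from the NMS
            return True, attempt
    return False, max_attempts


def lossy_channel(drop_first: int) -> Callable[[], bool]:
    """Build a delivery function that loses the first drop_first messages."""
    state = {"sent": 0}

    def deliver() -> bool:
        state["sent"] += 1
        return state["sent"] > drop_first

    return deliver


print(send_trap(lossy_channel(1)))       # 1  (the trap is lost and never resent)
print(send_inform(lossy_channel(2)))     # (True, 3)
print(send_inform(lossy_channel(5)))     # (False, 3)
```

The extra transmissions in the inform case are the resource cost described above: the request has to be kept until it is acknowledged or abandoned.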
Figure 1-3 below illustrates the communication between the SNMP manager and the SNMP agent
for sending traps and informs:
Fig. 1-3. UDP Ports Used by the NMS and Managed Element
The three versions of SNMP are versions 1, 2, and 3. Version 1, or SNMPv1, is the initial implementation of the SNMP protocol. SNMPv1 operates over protocols such as User Datagram Protocol
(UDP), Internet Protocol (IP), and the OSI Connectionless Network Service (CLNS). SNMPv1 is
widely used and is the de facto network-management protocol used within the Internet community.
SNMPv2 revises SNMPv1 and includes improvements in the areas of performance, security, confidentiality, and manager-to-manager communications. SNMPv2 also defines two new operations:
GetBulk and Inform. The GetBulk operation is used to retrieve large blocks of data efficiently. The
Inform operation allows one NMS to send trap information to another NMS and then to receive a
response. In SNMPv2, if the agent responding to GetBulk operations cannot provide values for all
the variables in a list, then it provides partial results.
SNMPv3 provides the following three additional security services that are not available in previous
versions of SNMP: message integrity, authentication, and encryption. SNMPv3 uses message integrity to ensure that a packet has not been tampered with in-transit. SNMPv3 also utilizes authentication, which is used to determine whether the message is from a valid source. Finally, SNMPv3
provides encryption, which is used to scramble the contents of a packet to prevent it from being
seen by unauthorized sources.
NOTE: You are not required to go into detail on SNMP versions in the TSHOOT exam. Instead, emphasis should be placed simply on having a basic understanding of the protocol and
how it is used as a monitoring and maintenance tool. Additional theoretical and configuration
information on SNMP can be found in the current SWITCH guide or online at www.howtonetwork.net.
In Cisco IOS software, the snmp-server host [hostname | address] command is used to
specify the hostname or IP address of the NMS to which the local device will send traps or informs.
To allow the NMS to poll the local device, SNMPv1 and SNMPv2c require that a community string
be specified for either read-only or read-write access using the snmp-server community <name>
[ro | rw] global configuration command.
SNMPv3 does not use the same community-based form of security but instead uses user and group
security. The following configuration example illustrates how to configure the local device with two
community strings, one for read-only access and the other for read-write access. In addition, the
local device is also configured to send SNMP traps for Cisco IOS IP SLA operations and syslog to
1.1.1.1 using the community string readonlypassword:
R2#config t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#snmp-server community unsafe RO
R2(config)#snmp-server community safe RW
R2(config)#snmp-server host 1.1.1.1 traps readonlypassword rtr syslog
Figure 1-4 below illustrates a sample report for device resource utilization and availability based on
SNMP polling using the ManageEngine OpManager network monitoring software:
Fig. 1-4. Sample SNMP Report on Device Resource Utilization
Cisco IOS NetFlow
Like SNMP, Cisco IOS NetFlow is a powerful maintenance and monitoring tool that can be used to
baseline network performance and assist in troubleshooting. However, there are some significant
differences between Cisco IOS NetFlow and SNMP. The first difference is that while SNMP reports
primarily on device statistics (e.g., resource utilization, etc.), Cisco IOS NetFlow reports on traffic
statistics (e.g., packets and bytes).
The second difference between these two tools is that SNMP is a poll-based protocol, meaning
that the managed device is polled for information. Cisco IOS NetFlow, however, is a push-based
technology, meaning that the device on which NetFlow is configured sends out information that it
has collected locally to a central repository. For this reason, NetFlow and SNMP complement each
other and should be used together as part of the standard network maintenance and monitoring
toolkit. However, they are not replacements for each other; this is often a misunderstood concept
and it is important to ensure that you remember this.
Another difference is that while SNMP can provide traffic statistics, SNMP cannot differentiate between individual flows. However, Cisco IOS NetFlow can. A flow is simply a series of packets with
the same source and destination IP addresses, source and destination ports, protocol, interface, and
Class of Service parameters. For IP applications, an IP flow is based on a set of five, and up to seven,
IP packet attributes, which may include the following:
1. Destination IP address
2. Source IP address
3. Source port
4. Destination port
5. Layer 3 protocol type
6. Class of Service
7. Router or switch interface
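The seven attributes listed above act as a lookup key: packets that share all seven values are counted against the same flow. The following Python sketch illustrates the idea with a dictionary keyed on a named tuple. It is a conceptual model only and does not reflect how the NetFlow cache is implemented in Cisco IOS; the type and function names are hypothetical, and the sample packets are invented for the example.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int          # Layer 3 protocol type, e.g. 6 = TCP, 17 = UDP
    tos: int               # Class of Service marking
    input_interface: str


def build_flow_cache(packets: Iterable[Tuple[FlowKey, int]]) -> Dict[FlowKey, Dict[str, int]]:
    """Group (key, length) packet records into per-flow packet and byte counters."""
    cache: Dict[FlowKey, Dict[str, int]] = {}
    for key, length in packets:
        entry = cache.setdefault(key, {"packets": 0, "bytes": 0})
        entry["packets"] += 1
        entry["bytes"] += length
    return cache


telnet = FlowKey("10.0.0.2", "1.1.1.1", 49331, 23, 6, 0, "Se0/0")
ntp = FlowKey("10.0.0.2", "10.0.0.1", 123, 123, 17, 0, "Se0/0")

cache = build_flow_cache([(telnet, 40), (telnet, 40), (ntp, 76)])
print(len(cache))        # 2
print(cache[telnet])     # {'packets': 2, 'bytes': 80}
```

Changing any single attribute, such as the destination port, produces a different key and therefore a separate flow.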
In addition to these IP attributes, other additional information is also included with a flow. This
additional information includes timestamps, which are useful for calculating packets and bytes per
second. Timestamps also provide information on the life (duration) of a flow. The flow also includes
next-hop IP address information, including BGP routing Autonomous Systems information. Subnet mask information for the flow source and destination addresses is also included, in addition to
flags for TCP traffic, which can be used to examine the TCP handshakes.
This means that Cisco IOS NetFlow can be used for network traffic accounting, usage-based network billing, network planning, security, Denial of Service (DoS) monitoring capabilities, and network monitoring, in addition to providing information about network users and applications, peak
usage times, and traffic routing. All of this makes it a very powerful maintenance, monitoring, and
troubleshooting tool.
Cisco IOS NetFlow gathers the flow information and stores it in a database called the NetFlow
cache or simply the flow cache. Flow information is retained until the flow is terminated or stopped,
times out, or the cache is filled. Two methods can be used to access the data stored in the flow cache: using the CLI (i.e., using show commands) or exporting the data and then viewing it using some type
of reporting tool. Figure 1-5 below illustrates NetFlow operation on a Cisco IOS router and how
the flow cache is populated:
[Figure: ingress traffic enters the NetFlow-enabled router, where the IP packet attributes (IP source address, IP destination address, source port, destination port, Layer 3 protocol type, Class of Service, and router or switch interface) are inspected and used to populate the NetFlow cache (flow cache) with byte, address, port, and packet information before the traffic leaves as egress traffic.]
Fig. 1-5. Basic NetFlow Operation and Flow Cache Population
Referencing Figure 1-5, ingress traffic is received on the local router. This traffic is inspected by the
router and IP attribute information is used to create a flow. The flow information is then stored in
the flow cache. This information can be viewed using the CLI or can also be exported to an external
destination, referred to as a NetFlow Collector, where the same information can then be viewed using an application reporting tool. The following steps are used to implement NetFlow data reporting
to the NetFlow Collector:
1. Cisco IOS NetFlow is configured on the device to capture flows to the NetFlow cache.
2. NetFlow export is configured to send flows to the Collector.
3. The NetFlow cache is searched for flows that have been inactive for a certain period of time,
have been terminated, or, for active flows, have lasted longer than the active timer.
4. Those identified flows are exported to the NetFlow Collector server.
5. Approximately 30 to 50 flows are bundled together and are typically transported via UDP.
6. The NetFlow Collector software creates real-time or historical reports from the data.
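Step 3 above, in which the cache is searched for flows that are ready to be exported, can be sketched in Python as follows. This is a simplified model under stated assumptions: the default timers of 15 seconds (inactive) and 30 minutes (active) are taken from the sample show ip cache flow output in this chapter, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlowEntry:
    name: str
    first_seen: float          # time the first packet of the flow was seen (seconds)
    last_seen: float           # time the most recent packet was seen (seconds)
    terminated: bool = False   # e.g. the session was closed


def flows_to_export(cache: List[FlowEntry], now: float,
                    inactive_timeout: float = 15.0,
                    active_timeout: float = 1800.0) -> List[FlowEntry]:
    """Return the flows that are idle, finished, or older than the active timer."""
    expired = []
    for flow in cache:
        idle_for = now - flow.last_seen
        age = now - flow.first_seen
        if flow.terminated or idle_for >= inactive_timeout or age >= active_timeout:
            expired.append(flow)
    return expired


cache = [
    FlowEntry("idle", first_seen=0.0, last_seen=80.0),
    FlowEntry("finished", first_seen=90.0, last_seen=99.0, terminated=True),
    FlowEntry("long-lived", first_seen=-1750.0, last_seen=99.5),
    FlowEntry("fresh", first_seen=95.0, last_seen=99.0),
]
print([flow.name for flow in flows_to_export(cache, now=100.0)])
# ['idle', 'finished', 'long-lived']
```

The active timer matters for long-running flows: without it, a session that never goes idle would not be reported to the Collector until it ended.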
Three primary steps are required when configuring Cisco IOS NetFlow as follows:
1. Configure the interface to capture flows into the NetFlow cache using the ip flow ingress
interface configuration command on all interfaces for which you want information to be captured and stored in the flow cache. It is important to remember that NetFlow is configured
on a per-interface basis only.
The NetFlow information is then stored on the local router and can be viewed using the show ip
cache flow command on the local device.
In the event that you want to export data to the NetFlow Collector, two additional tasks will be
required as follows:
2. Configure the Cisco IOS NetFlow version or format to use via the ip flow-export version [1 | 5 | 9] global configuration command. NetFlow version 1 (v1) is the original
format supported in the initial NetFlow releases. This version should be used only when it
is the only NetFlow data export format version that is supported by the application that you
are using to analyze the exported NetFlow data. Version 5 exports more fields than version
1 does and is the most widely deployed version. Version 9 is the latest Cisco IOS NetFlow
version and is the basis of a new IETF standard. Version 9 is a flexible export format version.
3. Configure and specify the IP address of the NetFlow Collector, and then specify the UDP
port that the NetFlow Collector will use to receive the UDP export from the Cisco device,
using the ip flow-export destination [hostname | address] <port> [udp] global
configuration command. The [udp] keyword is optional and does not need to be specified
when using this command because User Datagram Protocol (UDP) is the default transport
protocol used when sending data to the NetFlow Collector.
The following example illustrates how to enable NetFlow for a specified router interface:
R1#config t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#interface Serial0/0
R1(config-if)#ip flow ingress
R1(config-if)#end
Following this configuration, the show ip cache flow command can be used to view collected
statistics in the flow cache as illustrated in the output below:
R1#show ip cache flow
IP packet size distribution (721 total packets):
   1-32   64   96  128  160  192  224  256  288  320  352  384  416  448  480
   .000 .980 .016 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000

    512  544  576 1024 1536 2048 2560 3072 3584 4096 4608
   .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000

IP Flow Switching Cache, 278544 bytes
  4 active, 4092 inactive, 56 added
  1195 ager polls, 0 flow alloc failures
  Active flows timeout in 30 minutes
  Inactive flows timeout in 15 seconds
IP Sub Flow Cache, 21640 bytes
  4 active, 1020 inactive, 56 added, 56 added to flow
  0 alloc failures, 0 force free
  1 chunk, 1 chunk added
  last clearing of statistics never
Protocol         Total    Flows   Packets Bytes  Packets Active(Sec) Idle(Sec)
--------         Flows     /Sec     /Flow  /Pkt     /Sec     /Flow     /Flow
TCP-Telnet           2      0.0        34    40      0.0      10.5      15.7
TCP-WWW              2      0.0         9    93      0.0       0.1       1.5
UDP-NTP              1      0.0         1    76      0.0       0.0      15.4
UDP-other           42      0.0         5    59      0.0       0.0      15.7
ICMP                 5      0.0        10    64      0.0       0.0      15.1
Total:              52      0.0         7    58      0.0       0.4      15.1

SrcIf         SrcIPaddress    DstIf         DstIPaddress    Pr SrcP DstP  Pkts
Se0/0         150.1.1.254     Local         10.0.0.1        01 0000 0800   339
Se0/0         10.0.0.2        Local         1.1.1.1         06 C0B3 0017     7
Se0/0         10.0.0.2        Local         10.0.0.1        11 07AF D0F1     1
Se0/0         10.0.0.2        Local         10.0.0.1        11 8000 D0F1    10
Se0/0         150.1.1.254     Local         10.0.0.1        01 0000 0800   271
Se0/0         10.0.0.2        Local         1.1.1.1         06 C0B3 0017    59
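The Pr, SrcP, and DstP columns in the per-flow portion of the output above are printed in hexadecimal, which makes them awkward to read at a glance. The Python sketch below converts them to decimal. The protocol numbers used (1 for ICMP, 6 for TCP, 17 for UDP) are the standard IP protocol numbers. The handling of ICMP, where the destination port field is read as the ICMP type in the upper byte and the code in the lower byte, is an assumption based on commonly documented NetFlow behavior and should be verified against the documentation for your platform. The function name is hypothetical.

```python
from typing import Dict, Union

PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def decode_flow_fields(pr_hex: str, srcp_hex: str, dstp_hex: str) -> Dict[str, Union[str, int]]:
    """Convert the hexadecimal Pr, SrcP, and DstP columns to readable values."""
    protocol = int(pr_hex, 16)
    src = int(srcp_hex, 16)
    dst = int(dstp_hex, 16)
    decoded: Dict[str, Union[str, int]] = {
        "protocol": PROTOCOL_NAMES.get(protocol, str(protocol))
    }
    if protocol == 1:
        # Assumption: for ICMP the DstP field carries type (high byte) and code (low byte).
        decoded["icmp_type"] = dst >> 8
        decoded["icmp_code"] = dst & 0xFF
    else:
        decoded["src_port"] = src
        decoded["dst_port"] = dst
    return decoded


print(decode_flow_fields("06", "C0B3", "0017"))
# {'protocol': 'TCP', 'src_port': 49331, 'dst_port': 23}
print(decode_flow_fields("01", "0000", "0800"))
# {'protocol': 'ICMP', 'icmp_type': 8, 'icmp_code': 0}
```

Read this way, the second flow in the output is a Telnet session (TCP destination port 23) from 10.0.0.2 to 1.1.1.1, and the first is consistent with ICMP echo requests (type 8) from 150.1.1.254.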
The following example illustrates how to configure and enable NetFlow data collection for the
specified router interfaces and then export the data to a NetFlow Collector with the IP address
150.1.1.254 over UDP port 5000 using the NetFlow version 5 data format:
R1(config)#interface Serial0/0
R1(config-if)#ip flow ingress
R1(config-if)#exit
R1(config)#interface FastEthernet0/0
R1(config-if)#ip flow ingress
R1(config-if)#exit
R1(config)#interface Serial0/1
R1(config-if)#ip flow ingress
R1(config-if)#exit
R1(config)#ip flow-export version 5
R1(config)#ip flow-export destination 150.1.1.254 5000
R1(config)#exit
Following this configuration, the collected information can then be viewed using an application
reporting tool on the NetFlow Collector. Despite the export of the data, the show ip cache flow
command can still be used to view statistics on the local device, which can be a useful tool when
troubleshooting network issues or problem reports.
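To give a feel for what the NetFlow Collector receives, the following Python sketch parses the fixed 24-byte header that begins each NetFlow version 5 export datagram. It is a minimal illustration rather than a working collector: the function names parse_v5_header and expected_length are our own, and a real collector would read the datagrams from a UDP socket bound to the configured port (5000 in the example above) and go on to decode the flow records as well.

```python
import struct

# NetFlow v5 export header: 24 bytes in network byte order.
# Fields: version, count, sys_uptime, unix_secs, unix_nsecs,
#         flow_sequence, engine_type, engine_id, sampling_interval
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD_SIZE = 48  # size of each flow record that follows the header


def parse_v5_header(datagram):
    """Return the NetFlow v5 header fields of one export datagram as a dict."""
    if len(datagram) < V5_HEADER.size:
        raise ValueError("datagram is shorter than a NetFlow v5 header")
    fields = V5_HEADER.unpack_from(datagram)
    if fields[0] != 5:
        raise ValueError("not a NetFlow version 5 datagram")
    names = ("version", "count", "sys_uptime", "unix_secs", "unix_nsecs",
             "flow_sequence", "engine_type", "engine_id", "sampling_interval")
    return dict(zip(names, fields))


def expected_length(header):
    """Length in bytes of a datagram carrying header['count'] flow records."""
    return V5_HEADER.size + header["count"] * V5_RECORD_SIZE
```

Checking the received length against expected_length is a simple sanity test that a datagram was not truncated in transit.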
Network-Based Application Recognition
Network-Based Application Recognition (NBAR) is another Cisco IOS software tool that can be
used for monitoring and baselining network performance. NBAR is an intelligent classification
engine in Cisco IOS software that can recognize a wide variety of applications. Once the applications are recognized, the network can invoke required services for that particular application by
implementing QoS policies to support the application requirements. NBAR provides two primary
functions: identifying applications and protocols, and allowing for the dynamic discovery of protocols on the network.
Network-Based Application Recognition can classify applications that use statically assigned TCP
and UDP port numbers, as well as those that use dynamically assigned or negotiated TCP and
UDP port numbers. NBAR can also recognize and classify applications based on non-UDP and
non-TCP IP protocols. In addition, NBAR can perform sub-port classification, including the
classification of HTTP traffic by URL, MIME type, or hostname. NBAR also supports Citrix
traffic classification, as well as classification of Real-Time Transport Protocol (RTP), which is used by IP voice
and video.
The NBAR Protocol Discovery (PD) feature can also be used to collect application and protocol statistics, such as packet counts, byte counts, and bit rates, on a per-interface basis. This information
can then be retrieved by polling SNMP statistics from the NBAR PD Management Information
Base (MIB). NBAR uses Packet Description Language Modules (PDLMs) for protocol and application recognition. In the event that a specific protocol or application is not recognized, an external
PDLM can be loaded at any time into the router Flash memory to extend the NBAR list of recognized protocols. PDLMs can also be used to enhance an existing protocol recognition capability.
The use of PDLMs allows NBAR to recognize additional protocols and applications without having
to upgrade or replace the current version of software on the router, providing additional flexibility
for network administrators.
Like NetFlow, NBAR Protocol Discovery is enabled on a per-interface basis using the ip nbar
protocol-discovery interface configuration command. Prior to configuring NBAR, you must
enable Cisco Express Forwarding (CEF) on the router using the ip cef global configuration command. CEF is described in detail in both the ROUTE and SWITCH guides that are available online.
CEF troubleshooting will be described later in this guide. The following configuration example illustrates how to enable NBAR on a router interface. Note that CEF is enabled prior to the NBAR
PD configuration:
R2(config)#ip cef
R2(config)#interface FastEthernet0/0
R2(config-if)#ip nbar protocol-discovery
R2(config-if)#exit
Following this configuration, NBAR PD statistics can be viewed by issuing the show ip nbar protocol-discovery command. Keep in mind that the recognized applications or protocols printed
in the output of this command will vary, depending on your current IOS version and the PDLMs
that have been integrated into that version of code or have been loaded into router Flash memory:
R2#show ip nbar protocol-discovery

 FastEthernet0/0
                            Input                    Output
                            -----                    ------
   Protocol                 Packet Count             Packet Count
                            Byte Count               Byte Count
                            5min Bit Rate (bps)      5min Bit Rate (bps)
                            5min Max Bit Rate (bps)  5min Max Bit Rate (bps)
   ------------------------ ------------------------ ------------------------
   netbios                  832                      0
                            76544                    0
                            3000                     0
                            3000                     0
   snmp                     24                       12
                            3172                     1655
                            0                        0
                            1000                     0
   icmp                     222                      221
                            16428                    16342
                            0                        0
                            0                        0
   eigrp                    0                        10
                            0                        740
                            0                        0
                            0                        0
   ospf                     0                        3
                            0                        270
                            0                        0
                            0                        0
   syslog                   0                        2
                            0                        254
                            0                        0
                            0                        0
...
[Truncated Output]
When using NBAR, it is important to remember that NBAR PD is based on the standard port numbers for the different applications. For example, NBAR will recognize an application as HTTP if the
application is using TCP port 80. Likewise, NBAR will recognize SMTP based on the standard TCP
port number of 25. This presents a potential problem in the event that protocols or applications are
not using well-known port numbers. For example, it is common practice that some Web applications use TCP port 8080 in addition to the standard port 80.
In such a case, you can use the ip nbar port-map [application or protocol] [tcp | udp]
[port number 1] [port number 2]…[port number 16] global configuration command to
specify up to 16 additional port numbers used by the protocol. The following example
configures NBAR to look for three protocols on ports other than their well-known ports:
• For E-mail (SMTP) traffic, NBAR should search for TCP ports 25 and 2525
• For Telnet traffic, NBAR should search for TCP ports 23 and 3023
• For Web (HTTP) traffic, NBAR should search for TCP ports 80 and 8080
This configuration would be implemented as follows on the local router:
R2#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#ip cef
R2(config)#ip nbar port-map smtp tcp 25 2525
R2(config)#ip nbar port-map telnet tcp 23 3023
R2(config)#ip nbar port-map http tcp 80 8080
R2(config)#interface FastEthernet0/0
R2(config-if)#description ‘Connected To Corporate LAN’
R2(config-if)#ip nbar protocol-discovery
R2(config-if)#exit
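When many port maps have to be prepared, it can help to generate the command lines from a script and check them against the 16-port limit before pasting them into a router. The Python sketch below does only that; it does not connect to any device. The function name nbar_port_map is hypothetical, and the limit of 16 ports is taken from the command syntax given above.

```python
MAX_PORTS = 16  # limit taken from the command syntax in the text


def nbar_port_map(protocol, transport, ports):
    """Build one 'ip nbar port-map' command line after basic validation."""
    if transport not in ("tcp", "udp"):
        raise ValueError("transport must be 'tcp' or 'udp'")
    if not 1 <= len(ports) <= MAX_PORTS:
        raise ValueError("between 1 and %d ports are allowed" % MAX_PORTS)
    for port in ports:
        if not 1 <= port <= 65535:
            raise ValueError("invalid port number: %r" % port)
    port_list = " ".join(str(port) for port in ports)
    return "ip nbar port-map %s %s %s" % (protocol, transport, port_list)


if __name__ == "__main__":
    # Reproduce the three commands used in the configuration above.
    for proto, port_numbers in (("smtp", [25, 2525]),
                                ("telnet", [23, 3023]),
                                ("http", [80, 8080])):
        print(nbar_port_map(proto, "tcp", port_numbers))
```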
Following this configuration, you can then use the show ip nbar port-map [protocol] command to see the ports that NBAR is recognizing for either specific applications or protocols or for
all applications and protocols. For example, to see the ports that NBAR recognizes for the Telnet
application, you would issue the following command on the router:
R2#show ip nbar port-map telnet
port-map telnet
tcp 23 3023
If the [protocol] is not specified at the end of the command, NBAR shows default as well as customized port information for all supported protocols and applications as follows:
R2#show ip nbar port-map
port-map bgp        udp 179
port-map bgp        tcp 179
port-map citrix     udp 1604
port-map citrix     tcp 1494
port-map cuseeme    udp 7648 7649 24032
port-map cuseeme    tcp 7648 7649
port-map dhcp       udp 67 68
port-map dns        udp 53
port-map dns        tcp 53
...
[Truncated Output]
As previously stated, NBAR uses PDLMs for protocol and application recognition. If the PDLMs
on the router do not recognize a specific protocol or application, you can download a PDLM file
for the application (if one exists) from the Cisco Web site. Once the download is
complete, copy the file to the router Flash memory and then use the ip nbar pdlm [flash://filename.pdlm]
global configuration command to reference the file. For example, to load a PDLM
for BitTorrent application recognition, you would download the PDLM from the Cisco Web site
and then implement the following configuration on the router:
R2#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#ip cef
R2(config)#ip nbar pdlm flash://bittorrent.pdlm
R2(config)#interface FastEthernet0/0
R2(config-if)#description ‘Connected To Corporate LAN’
R2(config-if)#ip nbar protocol-discovery
R2(config-if)#exit
As previously stated, PDLMs are available for most well-known applications and protocols, but this
does not include proprietary applications. If, for example, you have a custom application that uses
TCP port numbers 1111, 2222, and 3333, then you can use the ip nbar custom [custom name]
[tcp | udp] [port number 1]…[port number x] global configuration command to configure
NBAR to classify and monitor the additional static port application as illustrated in the following
configuration example:
R2(config)#ip cef
R2(config)#ip nbar custom h2n_custom_app tcp 1111 2222 3333
R2(config)#interface FastEthernet0/0
R2(config-if)#description ‘Connected To Corporate LAN’
R2(config-if)#ip nbar protocol-discovery
R2(config-if)#exit
Following your configuration, you can then use the show ip nbar port-map [protocol] command to see the ports that NBAR is recognizing for the specific application as follows:
R2#show ip nbar port-map h2n_custom_app
port-map h2n_custom_app
tcp 1111 2222 3333
Additionally, you can use the show ip nbar protocol-discovery command to view statistics
for the custom application as shown in the following output, which has been filtered to include only
the previously configured custom application statistics:
R2#show ip nbar protocol-discovery | section h2n_custom_app
   h2n_custom_app           21                       21
                            1358                     1134
                            0                        0
                            0                        0
Given the options available with and the flexibility afforded by NBAR, it is easy to see why this is
a popular network monitoring tool. However, one significant drawback to NBAR is that it is very
resource intensive and consumes a great deal of CPU and memory resources. It is important to
ensure that you do not implement NBAR on a router whose resources are already taxed. After
NBAR has been enabled, you can use the show ip nbar resources command to view how much
memory it is consuming as illustrated in the following output:
R2#show ip nbar resources
NBAR memory usage for tracking Stateful sessions
 Max-age              : 120 secs
 Initial memory       : 1383 KBytes
 Max initial memory   : 4611 KBytes
 Memory expansion     : 68 KBytes
 Max memory expansion : 68 KBytes
 Memory in use        : 1383 KBytes
 Max memory allowed   : 9223 KBytes
 Active links         : 0
 Total links          : 20346
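If this output is captured regularly, for example as part of a baseline, a short script can turn it into numbers that are easy to compare over time. The Python sketch below parses the label and value pairs and reports memory in use as a percentage of the maximum allowed. The function names are our own, and the sketch assumes the "label : value" layout shown above, which may differ between IOS versions.

```python
import re

SAMPLE = """\
NBAR memory usage for tracking Stateful sessions
 Max-age              : 120 secs
 Initial memory       : 1383 KBytes
 Max initial memory   : 4611 KBytes
 Memory expansion     : 68 KBytes
 Max memory expansion : 68 KBytes
 Memory in use        : 1383 KBytes
 Max memory allowed   : 9223 KBytes
 Active links         : 0
 Total links          : 20346
"""


def parse_nbar_resources(text):
    """Map each 'label : number' line to an integer, ignoring other lines."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*:\s*(\d+)", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def memory_in_use_percent(stats):
    """Memory in use as a percentage of the maximum NBAR may allocate."""
    return 100.0 * stats["Memory in use"] / stats["Max memory allowed"]


if __name__ == "__main__":
    parsed = parse_nbar_resources(SAMPLE)
    print("NBAR memory in use: %.1f%%" % memory_in_use_percent(parsed))
```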
You can monitor the CPU utilization on the router using the show processes cpu command. This
is a core troubleshooting command that will be described in detail later in this guide.
Configuration Management
In this next section of Cisco IOS network maintenance and monitoring tools, we will explore some
of the tools available in Cisco IOS software for configuration management. As we learned earlier in
this chapter, configuration management, including backing up configurations, scheduling backups,
and restoring from backups, is a core maintenance task. This section describes the toolkit available
in Cisco IOS software to assist with these tasks.
One of the most commonly used configuration management commands in Cisco IOS software is
the copy running-config command. This command can be used to save the device configuration
locally or to a remote destination, such as a Trivial File Transfer Protocol (TFTP) or a File Transfer
Protocol (FTP) server. Supported options are illustrated below:
R2#copy running-config ?
  archive:        Copy to archive: file system
  flash:          Copy to flash: file system
  ftp:            Copy to ftp: file system
  http:           Copy to http: file system
  https:          Copy to https: file system
  ips-sdf         Update (merge with) IPS signature configuration
  null:           Copy to null: file system
  nvram:          Copy to nvram: file system
  pram:           Copy to pram: file system
  rcp:            Copy to rcp: file system
  running-config  Update (merge with) current system configuration
  scp:            Copy to scp: file system
  startup-config  Copy to startup configuration
  syslog:         Copy to syslog: file system
  system:         Copy to system: file system
  tftp:           Copy to tftp: file system
NOTE: The options displayed above will vary depending on the IOS version on the device.
The following configuration example illustrates how to copy the router running configuration to a
TFTP server with the IP address 150.1.1.254:
R2#copy running-config tftp:
Address or name of remote host []? 150.1.1.254
Destination filename [r2-confg]?
!!
2732 bytes copied in 2.296 secs (1190 bytes/sec)
The same action can also be performed on a single line as follows:
R2#copy running-config tftp://150.1.1.254
Address or name of remote host [150.1.1.254]?
Destination filename [r2-confg]?
!!
2732 bytes copied in 2.288 secs (1194 bytes/sec)
Unlike TFTP, file transfers to FTP servers typically require a username and a password, which in
turn allows for greater security than that which is provided by TFTP. When copying configuration
files to an FTP server requiring a username and password pair for login, you have two options for
specifying the username and password pair that the local device will use. The first option is to configure globally the FTP username and password on the device using the ip ftp username <name>
and ip ftp password <secret> global configuration commands.
Following this, you can then use the copy running-config ftp: command. The example below
illustrates how to configure a global FTP username and password pair and how to copy the configuration of the local router to an FTP server with the IP address 150.1.1.254. This example assumes
that the FTP server has been appropriately configured:
R2(config)#ip ftp username netadmin
R2(config)#ip ftp password tshoot
R2(config)#end
R2#copy running-config ftp:
Address or name of remote host []? 150.1.1.254
Destination filename [r2-confg]?
Writing r2-confg !
2780 bytes copied in 4.932 secs (564 bytes/sec)
NOTE: Referencing the FTP configuration above, it is important to keep in mind that the FTP
password will be stored in plain text on the device until the service password-encryption
global configuration command is issued. Following that, the password will be displayed in an
encrypted (Type 7) format. Type 7 encryption is a weak, reversible encoding and not a one-way hash.
If the FTP username and password pair is not configured globally on the router, then you can still
specify these parameters when using the copy command as follows:
R2#copy running-config ftp://netadmin:tshoot@150.1.1.254
Address or name of remote host [150.1.1.254]?
Destination filename [r2-confg]?
Writing r2-confg !
2738 bytes copied in 7.180 secs (381 bytes/sec)
In addition to basic copy commands, Cisco IOS software also supports configuration archive, configuration replace, and configuration rollback tools for configuration management functionality.
The configuration archive provides a mechanism to store, organize, and manage an archive of configuration files. This functionality is intended to enhance the configuration rollback capability that
is also supported in Cisco IOS software.
The configuration archive feature allows you to save configurations in the configuration archive
using a standard location and filename prefix that is automatically appended with an incremental
version number, and optional timestamp, as each consecutive file is saved. This functionality provides a means for consistent identification of saved configuration files. You can specify how many
versions of the running configuration are kept in the archive. After the maximum number of files
specified has been reached in the archive, the oldest file will then be deleted automatically when the
next, most recent file is saved. The Cisco IOS configuration archive, in which the configuration files
are stored, can be located on the following file systems:
• Platforms that have disk0: disk0:, disk1:, ftp:, pram:, rcp:, slavedisk0:, slavedisk1:, and tftp:
• Platforms that do not have disk0: ftp:, http:, pram:, rcp:, and tftp:
Implementing the configuration archive feature is a four-step process performed as follows:
1. After entering global configuration mode, issue the archive command to enter archive configuration mode.
2. In archive configuration mode, specify the location and filename prefix for the
files in the Cisco IOS configuration archive using the path <url> archive configuration
mode command. The <url> argument is one of the valid locations specified in the previous section (e.g., tftp:, ftp:, disk0:, etc.). The available options will depend on the platform on
which this command is implemented.
3. Optionally, specify the maximum number of files to save using the maximum <number> archive configuration command. By default, 10 files will be saved; however, up to 14 files can
be saved in the archive. When the specified maximum value has been reached, the oldest file
will be overwritten and replaced by the most recent file. An important point to remember is
that this command is not supported when backing up the configuration to
a network location such as a TFTP or FTP server.
4. Optionally, specify the time increment for automatically saving an archive file of the
current running configuration in the configuration archive using the time-period <minutes> archive configuration command. This command has no default.
When configuring the archive feature, the write-memory archive configuration command is typically included to allow the router to save the configuration automatically to
the specified location each time the running configuration is saved to NVRAM (i.e., to the startup
configuration), which typically indicates that some type of configuration change has been made.
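The naming and rotation behavior of the archive described above can be modeled in a few lines of Python. This is a toy model written for this guide, not Cisco code: the class name ConfigArchive and the disk0: path used in the usage example are assumptions, and the default of 10 and the limit of 14 files are the values stated in the text.

```python
from collections import deque


class ConfigArchive:
    """Toy model of the archive: numbered file names and a bounded history."""

    def __init__(self, path_prefix, maximum=10):
        if not 1 <= maximum <= 14:
            raise ValueError("maximum must be between 1 and 14")
        self.path_prefix = path_prefix
        self.maximum = maximum
        self.next_version = 1
        self.files = deque()

    def save(self):
        """Archive the configuration under the next incremental name."""
        name = "%s-%d" % (self.path_prefix, self.next_version)
        self.next_version += 1
        self.files.append(name)
        if len(self.files) > self.maximum:
            self.files.popleft()  # the oldest archived file is dropped
        return name


if __name__ == "__main__":
    archive = ConfigArchive("disk0:R2-Archive-Config", maximum=3)
    for _ in range(5):
        archive.save()
    print(list(archive.files))
```

With a maximum of 3 and five saves, only versions 3, 4, and 5 remain, while the version number itself keeps counting upward.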
The following configuration example illustrates how to configure the local router to back up the
configuration to an FTP server, using the specified FTP username and password pair, every week,
which is 168 hours or 10,080 minutes. The running configuration file will be saved to the server using the name ‘R2-Archive-Config.’ In addition to the weekly scheduled backup, the router is
also configured to archive the configuration every time the running configuration file is saved to
NVRAM (i.e., the startup configuration) as illustrated in the following output:
R2(config)#ip ftp username netadmin
R2(config)#ip ftp password tshoot
R2(config)#archive
R2(config-archive)#path ftp://150.1.1.254/R2-Archive-Config
R2(config-archive)#write-memory
R2(config-archive)#time-period 10080
R2(config-archive)#exit
Following this configuration, you can use the show archive command to view the archived configuration files. Following is a sample output printed by this command:
R2#show archive
The next archive file will be named ftp://150.1.1.254/R2-Archive-Config-6
 Archive #  Name
   0
   1       ftp://150.1.1.254/R2-Archive-Config-1
   2       ftp://150.1.1.254/R2-Archive-Config-2
   3       ftp://150.1.1.254/R2-Archive-Config-3
   4       ftp://150.1.1.254/R2-Archive-Config-4
   5       ftp://150.1.1.254/R2-Archive-Config-5 <- Most Recent
   6
   7
   8
   9
   10
   11
   12
   13
   14
Because the write-memory archive configuration command has been included in the archive configuration, the local router will save the configuration to the FTP server if either the write memory
or the copy running-config startup-config commands are issued:
R2#copy running-config startup-config
Destination filename [startup-config]?
Building configuration...
[OK]
Writing R2-Archive-Config-1 !
R2#write memory
Building configuration...
[OK]
Writing R2-Archive-Config-2 !
R2#
The configuration replace and configuration rollback operations allow you to restore previously archived configurations using the configure replace <target-url> [nolock] [list] [force]
[ignorecase] [revert trigger [error | timer <minutes>] | time <minutes>] privileged EXEC
command. The <target-url> is used to specify the location of the saved configuration file that is to replace the current running configuration.
The optional [nolock] keyword is used to disable the locking of the running configuration file.
This lock is what prevents other users from changing the running configuration during a configuration replace operation. The optional [list] keyword is used to display a list of the command lines
applied by the Cisco IOS software parser during each pass of the configuration replace operation.
When this keyword is used, the total number of passes performed is also displayed. The optional [force]
keyword is used to replace the current running configuration file with the specified saved Cisco IOS configuration file without prompting for confirmation.
The optional [ignorecase] keyword instructs the configuration
to ignore the case of the configuration confirmation. The [revert trigger [error | timer
<minutes>]] keywords set the triggers for reverting to the original configuration. If the error
keyword is included, then the router will revert to the original configuration if an error is
detected. If the timer <minutes> keywords are included, then the router will revert to the
original configuration file if the specified time period elapses.
Finally, the optional time <minutes> keyword can be used to specify the time in which the configure confirm command must be issued to confirm the replacement of the current running
configuration file. If the configure confirm command is not issued within the specified time
limit, then the configuration replace operation is automatically reversed by the router.
The following example illustrates how to replace the existing running configuration with the archived configuration file named ‘R2-Archive-Config-5’ stored on FTP server 150.1.1.254:
R2#configure replace ftp://150.1.1.254/R2-Archive-Config-5
This will apply all necessary additions and deletions
to replace the current running configuration with the
contents of the specified configuration file, which is
assumed to be a complete configuration, not a partial
configuration. Enter Y if you are sure you want to proceed. ? [no]: y
Loading R2-Archive-Config-5 !
[OK - 2959/4096 bytes]
Total number of passes: 0
Rollback Done
The following example illustrates how to replace the existing running configuration with the archived configuration file R2-Archive-Config-5 stored on FTP server 150.1.1.254 and specify that
the change should be confirmed in 10 minutes, and, if not, the router should reverse this operation
automatically:
R2#configure replace ftp://150.1.1.254/R2-Archive-Config-5 time 10
Writing R2-Archive-Config-6 !
Timed Rollback: Backing up to ftp://150.1.1.254/R2-Archive-Config-6
This will apply all necessary additions and deletions
to replace the current running configuration with the
contents of the specified configuration file, which is
assumed to be a complete configuration, not a partial
configuration. Enter Y if you are sure you want to proceed. ? [no]: y
Loading R2-Archive-Config-5 !
[OK - 2959/4096 bytes]
Total number of passes: 0
Rollback Done
R2#configure confirm
NOTE: Referencing the output above, if the configure confirm command is not issued,
then the changes will be reversed in 10 minutes. This option is applicable only when a time for
the change confirmation has been specified when using the configure replace command.
Because a time limit was not imposed in the first example, this command need not be issued.
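The interaction between the time keyword and the configure confirm command can be summarized as a small decision function. The Python sketch below is only a model of the behavior described in this section; the function name and its parameters are our own and do not correspond to anything in Cisco IOS.

```python
def config_after_replace(original, replacement, time_limit=None, confirmed_at=None):
    """Return the configuration in effect once any confirmation timer has run out.

    time_limit   -- minutes given with the 'time' keyword, or None if omitted
    confirmed_at -- minutes after the replace at which 'configure confirm'
                    was entered, or None if it was never entered
    """
    if time_limit is None:
        return replacement  # no timer was set, so nothing needs confirming
    if confirmed_at is not None and confirmed_at < time_limit:
        return replacement  # confirmed before the timer expired
    return original  # timer expired first, so the replace is reversed


if __name__ == "__main__":
    # Second example above: 'time 10', confirmed after 3 minutes.
    print(config_after_replace("current", "R2-Archive-Config-5",
                               time_limit=10, confirmed_at=3))
```

The timer is a safety net: if the replacement configuration cuts off your own access to the router, you cannot confirm, and the router returns to the previous configuration on its own.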
Cisco IOS Command Scheduler
The final Cisco IOS maintenance tool that we will discuss in this section is the Cisco IOS Command Scheduler (KRON). The Command Scheduler allows you to run exec commands on a regular
basis on a router. For simplicity, consider it as an automation tool for running exec commands on
a router at specified or configured intervals. KRON has two processes, which are policy lists and
the scheduler.
Policy lists contain the exec commands that you want executed on the router. When configuring policy lists, it is important to remember that KRON does not support interactive commands.
Therefore, if you want to create a policy list that saves the device configuration, then you should use
the write memory command instead of the copy running-config startup-config command,
which requires confirmation of this action. This is one of the main limitations of the KRON feature
and one of the reasons it is not implemented as much as the other features.
Cisco IOS Command Scheduler policy lists are configured using the kron policy-list <name>
global configuration command. Following this, in policy list configuration mode, the cli <exec
command> KRON policy list configuration command is used to specify the exec command that the
configured policy list will run. This command can be used to specify multiple commands that will
run at the same time or during the same interval.
Following the configuration of the policy list, the next task is to configure the KRON occurrence using the kron occurrence <occurrence-name> [user <name>] [in [[days:]hours:]min | at hours:min [[month] day-of-month] [day-of-week] [oneshot | recurring]]
global configuration command. Next, within KRON occurrence configuration mode, specify the
policy list that this schedule applies to using the policy-list <name> Command Scheduler configuration mode command.
NOTE: You are not expected to implement any Command Scheduler (KRON) configuration
in the current TSHOOT certification exam. However, ensure that you are familiar with basic
KRON configuration and functionality.
The following configuration example illustrates how to configure a KRON policy that will be used
to save the router configuration automatically every day (1440 minutes):
R2#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#kron policy-list SaveRouterConfiguration
R2(config-kron-policy)#cli write memory
R2(config-kron-policy)#exit
R2(config)#kron occurrence SaveRouterConfigurationSchedule in 1440 recurring
R2(config-kron-occurrence)#policy-list SaveRouterConfiguration
R2(config-kron-occurrence)#exit
Following the Command Scheduler configuration on the router, you can then use the show
kron schedule command to display information about the status and schedule of all configured
KRON occurrences as illustrated in the following router output:
R2#show kron schedule
Kron Occurrence Schedule
SaveRouterConfigurationSchedule inactive, will run again in 0 days 23:58:23
While the example used in the previous output is a simple one, KRON can be used for other tasks,
such as saving device configurations to remote servers (e.g., TFTP servers), making it yet another
useful maintenance tool that is available and at your disposal in Cisco IOS software.
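The interval that follows the in keyword uses the [[days:]hours:]min layout given in the command syntax, so the same daily schedule could be expressed in more than one way. The Python sketch below converts such an interval into minutes, which is handy for checking a schedule before configuring it. The function name kron_interval_minutes is our own, and the sketch assumes that each field is a non-negative whole number.

```python
def kron_interval_minutes(spec):
    """Convert a '[[days:]hours:]min' interval string into total minutes."""
    parts = [int(field) for field in spec.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected [[days:]hours:]min")
    if any(value < 0 for value in parts):
        raise ValueError("interval fields cannot be negative")
    # Pad missing leading fields (days, hours) with zero.
    days, hours, minutes = [0] * (3 - len(parts)) + parts
    return (days * 24 + hours) * 60 + minutes


if __name__ == "__main__":
    # One day expressed as plain minutes and in days:hours:min form.
    print(kron_interval_minutes("1440"), kron_interval_minutes("1:0:0"))
```

The same arithmetic gives the weekly value used with the archive time-period command earlier in this chapter: 7 days is 7 x 24 x 60 = 10,080 minutes.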
ADDITIONAL MAINTENANCE AND MONITORING TOOLS
In the final section of this chapter, we will discuss some additional maintenance and monitoring
tools that you should be familiar with as a network engineer, which include the following:
• Cisco Router and Security Device Manager
• Cisco Configuration Professional
• Cisco Configuration Assistant
• Cisco Network Assistant
• CiscoWorks LAN Management Solution (LMS)
Cisco Router and Security Device Manager
Cisco Router and Security Device Manager (SDM) is a Web-based (GUI) device-management tool
for Cisco routers that can be used for monitoring and troubleshooting tasks. SDM is supported on
a plethora of Cisco IOS routers, ranging from the 800 series routers to the 7300 series routers. SDM
can be installed either on the local computer, the router, or on both the computer and the router.
Because SDM is a core CCNA topic, we will not be going into any additional detail on this tool in
this guide. For additional information on SDM, please refer to the CCNA certification guide that is
available online.
Once you access the router via SDM, you can then use the Monitor tab to view device statistics,
which include, but are not limited to, interface status, logging information, and traffic statistics.
Figure 1-6 below illustrates the Monitor tab that is used for monitoring and troubleshooting:
Fig. 1-6. Cisco Router and Security Device Manager Monitor Tab
Cisco Configuration Professional
Cisco Configuration Professional (CCP), like Cisco Router and Security Device Manager (SDM),
is also a GUI-based device management tool for Cisco access routers. In fact, CCP is the successor
of Cisco SDM and is intended to replace the SDM tool. CCP includes the same features available
in SDM, and will also include additional capabilities such as Voice over IP (VoIP) support in later
versions.
CCP, like SDM, can be used for network monitoring and maintenance tasks. While delving into the
details pertaining to CCP is beyond the scope of the TSHOOT certification exam, Figure 1-7 below
illustrates the CCP Monitor tab that can be used for device monitoring and troubleshooting. As
the figure shows, this tab very closely resembles the Monitor tab shown for SDM:
Fig. 1-7. Cisco Configuration Professional Monitor Tab
Cisco Configuration Assistant
The Cisco Configuration Assistant (CCA) tool is yet another maintenance, monitoring, and troubleshooting tool that is available from Cisco. CCA is used for the Cisco Smart Business Communications System. CCA is a GUI-based tool that includes multiple wizards, similar to SDM and CCP,
which can be used for IP Telephony configuration, router and switch configuration, and security
configuration.
CCA allows network administrators to view the status of devices and monitor the network from
three perspectives: the System Dashboard, Topology View, or Front Panel View. From a troubleshooting perspective, CCA can be used to consolidate system logs into a single archive, which can
then be sent to the Technical Assistance Center (TAC) for troubleshooting support. Additionally,
CCA also provides tools to perform basic network connectivity troubleshooting functions, such as
the Ping and Traceroute utilities.
In addition to data network configuration and troubleshooting utilities, CCA also provides tools
that can be used for IP Telephony (voice) service diagnostics and troubleshooting. In summation,
the capabilities and advantages provided by Cisco Configuration Assistant include the following:
• Configuration and deployment support
• Network management support
• Setup wizards
• Multiple network views
• Simplified network reporting
• Drag-and-drop software updates
• Troubleshooting
NOTE: The configuration of CCA is beyond the scope of the current TSHOOT certification
exam and will not be illustrated in this guide.
Cisco Network Assistant
Cisco Network Assistant (CNA) is a network management and automation tool. Like the other
tools described in this section, CNA is a GUI-based tool that can be used to apply common services
across Cisco switches, routers, and access points. CNA is available as a free download from the
Cisco Web site. At a high level, this tool provides the following capabilities:
• Configuration management
• Inventory reports
• Event notification
• Task-based menu
• File management
• Drag-and-drop Cisco IOS Software upgrades
In addition, CNA can be used for troubleshooting support, as well as for securing Catalyst Express 500 series switches.
CiscoWorks LAN Management Solution
The CiscoWorks LAN Management Solution (LMS) consists of several individual software
applications that can be used for the configuration and administration of campus networks. LMS
also provides monitoring and troubleshooting capabilities. The following applications are included
in the CiscoWorks LAN Management Solution suite:
• Resource Manager Essentials (RME)
• CiscoWorks Health and Utilization Monitor
• Device Fault Manager (DFM)
• Internetwork Performance Monitor (IPM)
In addition to other capabilities, Resource Manager Essentials (RME) can also be used for network
monitoring and fault information for tracking devices that are critical to network uptime. The CiscoWorks Health and Utilization Monitor provides the ability to monitor the device for performance
parameters and to report violations based on the threshold values configured. This application also
has extensive reporting capabilities.
The Device Fault Manager (DFM) can be used to monitor device faults in real time and to determine the root cause by correlating device-level fault conditions, monitoring fault history, and
configuring e-mail, SNMP trap, and syslog notifications. Finally, the Internetwork Performance
Monitor (IPM) can be used to proactively troubleshoot network response time, jitter, and availability. As is the case with NAM, LMS is configured using a GUI. Of the tools that have been
described in this section, CiscoWorks is the only package that is not freely available for download,
as it is used to manage large enterprise networks.
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter.
Network Maintenance Fundamentals Overview
•
Network maintenance is an integral component of a network management methodology
•
Network maintenance activities are either structured or interrupt-driven (ad-hoc)
•
A structured or scheduled network maintenance approach is based on a predefined plan
•
Ad-hoc maintenance activities are those that are performed when any issues arise
•
A structured maintenance approach leverages proactive monitoring
•
An ad-hoc approach increases the number of resources required to support the network
Network Maintenance Tasks
•
Network maintenance tasks are simply tasks that are performed on a day-to-day basis
•
The following is a list of common network maintenance tasks:
1. Installing, replacing or upgrading both hardware and software
2. Monitoring, tuning and optimizing the network
3. Documenting the network and maintaining network documentation
4. Securing the network from both internal and external threats
5. Planning for network upgrades, expansions, or enhancements
6. Scheduling backups and restoring services or the network from backups
7. Ensuring compliance with legal regulations and corporate policies
8. Troubleshooting problem reports
9. Maintaining and updating device configurations
An Overview of Network Management Models
•
Network management models are general guidelines for running and maintaining a network
•
There are several network management models that are available
•
You should select the network management model best aligned with your business goals
•
Commonly referenced network management models include the following:
1. Telecommunications Management Network
2. FCAPS
3. Information Technology Infrastructure Library
4. Cisco Lifecycle Services
•
The TMN is a model defined by ITU-T for managing systems in a communications network
•
The TMN is referenced in ITU-T Recommendation M.3010
•
The TMN was originally developed to provide a framework for service providers
•
The four management architectures defined by the TMN, at different levels of abstraction, include the
following:
1. A functional architecture
2. An information architecture
3. A physical architecture
4. A logical layered architecture
•
The TMN logical layered architecture includes an additional four layers of abstraction as follows:
1. The Business Management Layer
2. The Service Management Layer
3. The Network Management Layer
4. The Element Management Layer
•
FCAPS is the ISO TMN model and framework for network management
•
The FCAPS fault management lifecycle includes the following tasks:
1. Fault and problem detection
2. Handling and acknowledging alarms sent by devices
3. Fault and problem isolation using a filtration and correlation process
4. Fault correction and recovery
5. Tracking problems through resolution via a trouble ticketing system
•
Configuration management encompasses the management of actual device configurations
•
Configuration management encompasses the configuration change control process
•
Configuration management includes tracking and logging changes to device configurations
•
Accounting management covers methods to track usage statistics and costs
•
Performance management covers the tracking of system and network statistics
•
Performance management includes baselining and improving performance, e.g., using QoS
•
Performance management can provide valuable data for capacity planning
•
Security management addresses access rights that include authentication and authorization
•
Security management is concerned with securing access to network devices
•
Security management may also include additional tasks such as integrating firewalls
•
ITIL is a set of best practices for ITSM, IT development and IT operations
•
ITIL is organized into a set of texts which are defined by related functions
•
The five processes or sets defined in ITILv3 are as follows:
1. Service Strategy
2. Service Design
3. Service Transition
4. Service Operation
5. Continual Service Improvement
•
The Cisco PPDIOO model encompasses all steps from network vision to optimization
•
PPDIOO stands for prepare, plan, design, implement, operate, and optimize
IOS Maintenance and Monitoring Tools
•
Cisco provides a plethora of tools that can be used for both maintenance and monitoring
•
The EEM is part of the Cisco Embedded Automation Systems (EASy) toolkit
•
The EASy toolkit combines the following embedded management technologies with EEM:
1. Cisco IP Service Level Agreements (IP SLAs)
2. Expression MIB
3. Network-Based Application Recognition (NBAR)
4. Flexible NetFlow
5. Enhanced Object Tracking
6. Cisco IOS Shell (IOS.sh)
•
EEM is a powerful and flexible subsystem that provides real-time network event detection
•
EEM provides onboard automation and increases the intelligence of network devices
•
EEM supports the use of scripts, which can be configured using the CLI or using Tcl
•
IOS IP SLA allows you to monitor, analyze and verify IP service levels for IP applications
•
IOS IP SLA uses active traffic monitoring to measure network performance
•
IOS IP SLA measures and monitors performance metrics like jitter, latency, and packet loss
•
IOS IP SLA is comprised of two components, which are the source (agent) and the target
•
IOS IP SLA operations can be broadly categorized into the following five functional areas:
1. Availability monitoring
2. Network monitoring
3. Application monitoring
4. Voice monitoring
5. Video monitoring
•
Logging messages and events both locally and to a syslog server is a core maintenance task
•
Syslog allows a host to send event notification messages across IP networks
•
Syslog messages are sent to event collectors called syslog servers or syslog daemons
•
A syslog daemon or server is an entity that listens to the syslog messages that are sent to it
•
Syslog uses User Datagram Protocol (UDP) as the underlying transport mechanism
•
The syslog client sends messages to the syslog server, specifying a destination port of 514
•
Syslog messages cannot exceed 1,024 bytes in size
•
Syslog messages contain three distinct parts, which are the priority, header, and message
•
When configuring logging, synchronize the device clock manually or using NTP
•
The Simple Network Management Protocol, SNMP, is a widely used management protocol
•
SNMP can be used to collect statistics, monitor device performance and for baselining
•
SNMP is an Application Layer (Layer 7) protocol
•
SNMP uses UDP as the Transport layer protocol, using UDP ports 161 and 162
•
An SNMP network consists of a management system, agents, and managed devices
•
The management system executes monitoring applications and controls managed devices
•
An SNMP agent resides on each managed device
•
SNMP agents capture data from Management Information Bases (MIBs)
•
A managed element, such as a router, switch, or firewall, is accessed via the SNMP agent
•
Managed devices are monitored and controlled using read, write and trap commands
•
The read command is used by an NMS to monitor managed devices
•
The write command is used by an NMS to control managed devices
•
The SNMP trap command is used by managed devices to report events to the NMS
•
Devices can be configured to send SNMP traps or informs to an NMS
•
SNMP traps are messages that alert the SNMP manager of a condition on the network
•
SNMP informs are SNMP traps that include confirmation of receipt from the manager
•
There are three versions of SNMP, which are versions 1, 2, and 3
•
SNMPv1 is widely used and is the de facto network-management protocol
•
SNMPv2 revises SNMPv1 and includes improvements to the original SNMPv1 standard
•
SNMPv3 provides additional security services not available in previous versions
•
Cisco IOS NetFlow is a powerful maintenance and monitoring tool
•
Cisco IOS NetFlow reports on traffic statistics, e.g. packets and bytes
•
The device on which NetFlow is configured sends out information that it has collected
•
Cisco IOS NetFlow has the ability to differentiate between traffic flows
•
An IP flow is based on a set of 5, and up to 7, IP packet attributes, which may include the following:
1. Destination IP address
2. Source IP address
3. Source port
4. Destination port
5. Layer 3 protocol type
6. Class of Service
7. Router or switch interface
•
Cisco IOS NetFlow stores flow information in the NetFlow cache or simply the flow cache
•
Collected NetFlow data can be accessed via the CLI or using a NetFlow Collector
•
NBAR is an intelligent classification engine in Cisco IOS software
•
Network-Based Application Recognition (NBAR) can recognize a wide variety of applications
•
The NBAR Protocol Discovery (PD) feature can collect application and protocol statistics
•
NBAR uses PDLMs for protocol and application recognition
•
The use of PDLMs allows NBAR to recognize additional protocols and applications
•
The configuration archive feature allows configs to be saved in the configuration archive
•
The configuration replace and configuration rollback features allow for configuration rollbacks
•
The Command Scheduler allows you to run exec commands on a regular basis on a router
•
The IOS Command Scheduler has 2 processes: policy lists and the scheduler
•
Policy lists contain the exec commands that you want to be executed on the router
•
The scheduler is used to configure when these commands will be run
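As a rough illustration of the syslog points summarized above, the sketch below assembles a message from its three parts and checks it against the 1,024-byte limit. The priority formula (facility × 8 + severity) comes from the BSD syslog convention; the header text, hostname, and function names are hypothetical and are used only for illustration.

```python
# Sketch: build a BSD-style syslog line and check the limits listed in the summary.
FACILITY_LOCAL7 = 23   # local7, the facility Cisco IOS devices use by default
SEVERITY_NOTICE = 5    # severities run from 0 (emergencies) to 7 (debugging)
SYSLOG_PORT = 514      # UDP destination port used by the syslog client
MAX_BYTES = 1024       # maximum message size named in the summary


def priority(facility: int, severity: int) -> int:
    """Return the priority (PRI) value: facility * 8 + severity."""
    if not 0 <= facility <= 23 or not 0 <= severity <= 7:
        raise ValueError("facility must be 0-23 and severity must be 0-7")
    return facility * 8 + severity


def build_message(facility: int, severity: int, header: str, text: str) -> bytes:
    """Assemble '<PRI>HEADER MSG' and enforce the 1,024-byte limit."""
    packet = f"<{priority(facility, severity)}>{header} {text}".encode("ascii")
    if len(packet) > MAX_BYTES:
        raise ValueError("syslog message exceeds 1,024 bytes")
    return packet


# Hypothetical header (timestamp and hostname) and message text.
packet = build_message(FACILITY_LOCAL7, SEVERITY_NOTICE,
                       "Jan  1 00:00:00 R1", "Line protocol changed state to up")
```

The resulting bytes could then be sent to a syslog server over UDP port 514. Because the header carries a timestamp, the device clock should be synchronized, as the summary notes.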
Additional Maintenance and Monitoring Tools
•
In addition to IOS tools, Cisco provides the following maintenance and monitoring tools:
1. Cisco Router and Security Device Manager
2. Cisco Configuration Professional
3. Cisco Configuration Assistant
4. Cisco Network Assistant
5. CiscoWorks LAN Management Solution (LMS)
•
SDM is a Web-based (GUI) device-management tool for Cisco access routers
•
SDM can be used for monitoring and troubleshooting tasks
•
CCP is also a GUI based device management tool for Cisco access routers
•
CCP can be used for network monitoring and maintenance tasks
•
CCA is used for the Cisco Smart Business Communications System
•
CCA is a Web-based (GUI) tool
•
CCA includes the System Dashboard, Topology View, or Front Panel View
•
CNA is a GUI-based tool
•
CNA can be used to apply common services across switches, routers, and access points
•
The CiscoWorks LAN Management Solution is comprised of several software applications
•
LMS provides monitoring and troubleshooting capabilities
•
CiscoWorks LMS can be used for the configuration and administration of campus networks
•
CiscoWorks LMS includes the following software applications:
1. Resource Manager Essentials (RME)
2. CiscoWorks Health and Utilization Monitor
3. Device Fault Manager (DFM)
4. Internetwork Performance Monitor (IPM)
CHAPTER 2
Troubleshooting
Methodologies and Tools
In the previous chapter, we discussed network maintenance and monitoring methodologies and
tasks, as well as the tools that are available in both Cisco IOS software and standalone products
that can be used to facilitate network maintenance and monitoring. In this chapter, we will discuss
troubleshooting methodologies and processes, including the pros and cons of using these different
approaches in any given situation. The TSHOOT certification exam objectives that are covered in
this chapter include the following:
•
Isolate sub-optimal internetwork operation at the correctly defined OSI Model layer
•
Troubleshoot Multi Protocol system networks
Troubleshooting is an integral component of overall network management. Without systematic
and structured approaches, troubleshooting can very quickly become a frustrating and time-consuming process. As a network engineer, it is important to understand not only the tools that are
available for troubleshooting networks but also the methodologies and best practices that should
be applied when troubleshooting different problem reports, with the understanding that there is
no one method or tool that can be applied in all situations. For this reason, it is important to have a
solid understanding of the different methodologies and tools and how they may be applied to your
given network problem. This chapter is divided into the following sections:
•
Troubleshooting and the Troubleshooting Flow
•
Communication and Troubleshooting
•
Integrating Maintenance and Troubleshooting
•
Troubleshooting Methodologies
•
The Cisco IOS Generic Troubleshooting Toolkit
•
Additional Troubleshooting Tools
TROUBLESHOOTING AND THE TROUBLESHOOTING FLOW
While there is no single specific definition for the term, in general, troubleshooting can be thought
of as the process of identifying or diagnosing a problem. Following diagnosis, a resolution is typically implemented to rectify or correct the problem. While the general idea behind troubleshooting
is that it begins after a problem or an issue has been reported (i.e., it is a reactive process), it is important to understand that with effective proactive monitoring, it is possible to identify
potential problems and resolve them before they impact users.
As was stated in the introduction, there is no single troubleshooting method that can be applied
to all situations; hence, the reason for the different troubleshooting methodologies that will be
described in the following section. Despite the different approaches used in these methodologies,
they all fit into the same basic high-level, three-step troubleshooting flow, which is comprised of
the following phases:
•
The problem report
•
Problem diagnosis
•
Problem resolution
These three phases in the basic troubleshooting flow process are illustrated in Figure 2-1 below:
[Figure 2-1: Problem Report -> Problem Diagnosis -> Problem Resolution]
Fig. 2-1. High-Level Steps in the Troubleshooting Flow
The problem report is what typically initiates the troubleshooting process. The objective of this
phase is to define the problem. While most users report a problem when it occurs or is experienced,
this is not always the case. In some cases, a problem could be reported hours or even days after the
fact; it is not uncommon for users to report a problem that has been ongoing for an extended period of time. As an example, a user could call in to the help desk and report that an application has
been experiencing intermittent connectivity issues for some time, and while the user previously tolerated it, the problem has now degraded to the point at which the user would like to get it resolved.
Following the problem report, the very first step of the troubleshooting process itself is problem diagnosis. The problem diagnosis phase is the most time-consuming phase. This phase is comprised
of five general, but integral, steps, all of which are critical to the overall success of the problem resolution effort. The five steps included in the problem diagnosis phase are as follows:
1. Collecting information about the problem
2. Analyzing or examining the collected information
3. Eliminating possible causes
4. Hypothesizing or theorizing potential causes
5. Verifying the hypothesis or theory
The first step included in the problem diagnosis phase entails collecting or gathering as much information about the problem as possible. This not only includes collecting information from the person reporting the problem but also includes collecting data from the network itself. The efficiency
with which the information is collected, as well as the overall quality of that information, greatly
increases the chances of successfully identifying and resolving the problem.
Cisco provides a plethora of tools that can be used to gather or collect information from the network or from internetwork devices. As a network engineer, you should be intimately familiar with
these tools (which are described later in this chapter) and their capabilities. While the number of
available tools is large, data collection need not be overwhelming, as in most cases, a small subset
of the available tools typically can be used to solve most of the problems.
Once the pertinent information has been collected, it should be analyzed. Collected information
can be analyzed in numerous ways. This may entail using automated tools, referencing the collector’s experience or knowledge about the system or platform in question in order to distinguish
between normal and abnormal system behavior, or comparing the information against previously
collected baseline information gathered from monitoring tools.
When the information has been analyzed, some potential causes of the problem can be eliminated
easily. For example, if a user reported intermittent network connectivity issues, and following your
information gathering and analysis you observe port flapping on the user machine, then you can potentially rule out an overall network issue should this issue be happening only for that specific user.
After eliminating some potential causes, the next step is to hypothesize or theorize about what the
potential problem could be. Continuing with the previous intermittent connectivity example, having eliminated the network as a potential cause, the next step would be to determine whether, for example, the
switch port or the patch cable between the user machine and the switch port is faulty (among other
possible issues).
The last step of the problem diagnosis phase is to verify your hypothesis or theory. This step may include running appropriate tests or diagnostics using the relevant tools to validate whether what you
think to be the problem is in fact the problem. Alternatively, you could also apply what you think to
be the solution and then verify whether that resolves the problem. For example, continuing with the
user intermittent connectivity issue, you could replace the patch cable between the user machine
and the switch port, or move the user network connection to another switch port, and determine
whether that resolves the issue. For simplicity, the steps involved in the problem diagnosis phase
are illustrated in Figure 2-2 below:
[Figure 2-2: Collect Information -> Analyze Information -> Eliminate Causes -> Hypothesize Causes -> Verify Hypothesis]
Fig. 2-2. Steps within the Problem Diagnosis Phase
While these steps present a structured approach to network troubleshooting, it is important to
understand that the process flow may require going back and forth between different steps. Assume, for example, that after testing your proposed solution to the problem, the problem still exists. In such cases, you may need to go back and gather more information and proceed through
the entire cycle again. It is important to keep this in mind because following a structured troubleshooting approach does not always guarantee that you will be able to identify the root cause of the
reported problem. While there are no guarantees of success, following a structured troubleshooting approach does increase the probability or likelihood of overall success in identifying and then
ultimately resolving the issue or problem.
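The iterative nature of the diagnosis phase can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name, its parameters, and the sample causes are hypothetical, and real information gathering and verification would rely on the tools described later in this chapter.

```python
from typing import Callable, Iterable, Optional


def diagnose(
    collect: Callable[[], dict],
    candidate_causes: Iterable[str],
    is_ruled_out: Callable[[str, dict], bool],
    verify: Callable[[str], bool],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first verified cause, or None if no round confirms one."""
    causes = list(candidate_causes)
    for _ in range(max_rounds):
        facts = collect()                      # Step 1: collect information
        # Steps 2 and 3: analyze the facts and eliminate causes they rule out
        remaining = [c for c in causes if not is_ruled_out(c, facts)]
        for hypothesis in remaining:           # Step 4: hypothesize a cause
            if verify(hypothesis):             # Step 5: verify the hypothesis
                return hypothesis
        # Nothing was verified, so go back and gather more information.
    return None


# Hypothetical data modeled on the intermittent connectivity example.
facts = {"port_flapping": True, "other_users_affected": False}
causes = ["network-wide issue", "bad patch cable", "bad switch port"]
result = diagnose(
    collect=lambda: facts,
    candidate_causes=causes,
    is_ruled_out=lambda c, f: c == "network-wide issue" and not f["other_users_affected"],
    verify=lambda c: c == "bad patch cable",
)
```

Returning `None` after a fixed number of rounds mirrors the point made above: a structured approach improves the odds of finding the root cause but does not guarantee it.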
The primary objective of the problem diagnosis phase is to reach a hypothesis for the root cause
of the problem. Following the completion of the problem diagnosis phase, the troubleshooting
process moves to the problem resolution phase. In this final phase, the troubleshooter should notify and then confirm with the user who reported the problem that the problem has indeed been
rectified. In addition to notifying the user who initiated the problem report, the troubleshooter
also should inform any other invested parties. For example, if the problem had been escalated to
management, they should be notified after the problem has been resolved and confirmation of the
same has been received.
Finally, an additional but important task is to include all relevant documentation and notes that
may be used as a reference in the future prior to closing out the trouble report. If changes were
made to the network to resolve the problem, then relevant documentation should be updated to
include these changes.
COMMUNICATION AND TROUBLESHOOTING
Effective communication is an integral component of the troubleshooting process. This entails communication with the end user, the team, and management. Effective communication is essential in
all steps of the troubleshooting process that was described in the previous section. The following
sections describe how effective communication can be used to facilitate the troubleshooting process.
During the problem report phase of the troubleshooting process, it is important to communicate
effectively with the end user (i.e., the person who is reporting the problem). Effective communication allows you to gather as much information about the problem as possible, which ultimately
facilitates the troubleshooting process and can reduce the time to repair. Some examples of information to collect include how long the problem has been occurring, whether any changes were made
to the end user’s system, and so forth. Should the end user be irate, it is important to empathize with
him or her to gain his or her trust and confidence.
Within the problem diagnosis phase, it is important to communicate effectively during all of the
steps included in this phase. In addition to collecting information from the end user, it is also important to communicate effectively with other groups, internal or external, from which you will
also need to acquire information. Within an organization, this may include the voice and security
teams, for example. Outside of the organization, this may include the service provider help desk or
network operations team. Effective communication ensures that the correct information needed to
troubleshoot the problem is collected.
When examining the collected information, it is often necessary to verify the validity of this information by collaborating with team members or even with other groups within the IT department.
This is applicable to the steps of eliminating possible causes, as well as verifying the hypothesis. As
a network engineer, you should be able to communicate effectively with team members and other
groups within the organization.
Communication is also critical when testing and verifying the hypothesis. Depending on the nature
of the problem, this step may result in temporary network outages or the interruption of other activities the user is performing. For this reason, it is important to ensure that not only the end user
but also all other parties that may be potentially affected by this step are advised. The notification should clearly state what is to be expected and how long the interruption will last.
Once completed, all relevant parties should be notified again.
Finally, during the problem resolution phase, it is important to communicate with the end user and
verify that the problem has indeed been resolved and all symptoms have disappeared. Additionally,
depending on the nature of the problem, the same should be communicated to any other groups
or individuals. If a trouble ticket was opened, detailed notes on the problem and the resolution
for the problem should be included. This information can be useful when troubleshooting similar
problems.
INTEGRATING MAINTENANCE AND TROUBLESHOOTING
As we learned in the previous chapter, a well-documented and well-maintained network is much
easier to support and troubleshoot than one that has little or no documentation and is not regularly maintained. In other words, a structured maintenance approach facilitates or simplifies the
troubleshooting and support functions of overall network management. In essence, effective troubleshooting is dependent on the tasks and tools that are part of the structured maintenance process. This section describes the ways maintenance facilitates the troubleshooting process.
Establishing a Baseline
Baselining is a process for studying the network, including devices that constitute the network, at
regular intervals. A baseline is more than just a single, randomly run report detailing the health of
the network at any given point in time; it is a continual process. Baselining facilitates the troubleshooting process by allowing troubleshooters to differentiate between normal and abnormal behavior when troubleshooting problem reports. By following the baseline process, you can perform
the following:
•
Get information on the health of the hardware and the software
•
Predict future problems
•
Determine the current utilization of network resources
•
Identify current network problems
•
Make accurate decisions about network alarm thresholds
The baselining process can be used to determine the network break point, which is the point at
which the network will break. The network break point can be determined through the knowledge
of how the hardware and the software in the network perform. This information can be used to
identify and properly plan for critical resource limitation issues in the network (i.e., capacity planning). Several tools can be used to establish a baseline, including, but not limited to, Cisco IP SLA
operations, SNMP, Cisco IOS NetFlow, and CiscoWorks LMS.
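To make the idea of distinguishing normal from abnormal behavior more concrete, the following sketch derives an alarm threshold from baseline samples. The rule used here (mean plus three standard deviations) is an assumption chosen for illustration, not a method prescribed by this guide, and the utilization figures are hypothetical.

```python
from statistics import mean, stdev


def alarm_threshold(baseline_samples, k=3.0):
    """Suggest an alarm threshold of mean + k sample standard deviations."""
    return mean(baseline_samples) + k * stdev(baseline_samples)


def is_abnormal(current, baseline_samples, k=3.0):
    """Flag a reading that exceeds the threshold derived from the baseline."""
    return current > alarm_threshold(baseline_samples, k)


# Hypothetical link-utilization samples (percent) collected at regular intervals.
baseline = [40, 42, 38, 41, 39, 40, 43, 37]

threshold = alarm_threshold(baseline)    # mean 40, standard deviation 2
normal_reading = is_abnormal(45, baseline)
suspect_reading = is_abnormal(70, baseline)
```

In practice, the samples would come from tools such as SNMP polling or IP SLA operations, and the multiplier `k` would be tuned to how sensitive the alarms need to be.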
Documentation
Documentation, a core maintenance function, is integral to the troubleshooting process. As was
stated in Chapter 1, it is important to ensure that the network is well documented and that the
documentation is accurate and well maintained. Attempting to troubleshoot a network using incorrect documentation is oftentimes worse than attempting to troubleshoot a network without
any documentation at all. As was also stated in Chapter 1, good documentation should include the
following information about the network:
•
Information about the interconnects between devices for LAN and WAN connections
•
IP addressing and VLAN information
•
A physical topology diagram of the network
•
A logical topology diagram of the network
•
An inventory of all internetwork devices, components, and modules
•
A revision control section detailing changes to the topology
•
Configuration information
•
Any original or additional design documentation and notes
•
Data or traffic flow patterns
The maintenance of network documentation and configuration can be facilitated by using some of
the tools described in Chapter 1. Examples of tools that can be used to facilitate the documentation
maintenance function include the Embedded Event Manager (EEM), the Configuration Archive
and Rollback feature, and the Command Scheduler (KRON).
Change Management
Change management, or change control, is an integral component of the network maintenance
process. The objective of the change management process is to minimize network and service
downtime by ensuring that requests for changes are recorded and then evaluated, authorized, prioritized, planned, tested, implemented, documented, and reviewed in a controlled and consistent
manner. A change can involve any configuration item or element of infrastructure. Some examples
of changes include the following:
•
Environmental changes
•
Network changes
•
Application changes
•
Hardware changes
•
Documentation changes
•
Software changes
Ensuring that changes to network devices are performed in a controlled, well-documented manner reduces the likelihood of unplanned network outages. Troubleshooting is also simplified when
changes are performed in a controlled environment. The change control process also ensures that
network documentation is updated following the change.
TROUBLESHOOTING METHODOLOGIES
In the previous sections, we discussed the basic troubleshooting flow, detailing the steps in the
problem diagnosis phase, which is the most time-consuming phase of the troubleshooting process.
In addition, we also discussed the importance of effective communication and the ways in which
it facilitates the troubleshooting process. Finally, we examined how a structured maintenance approach also facilitates the troubleshooting process.
In this section, we will learn about some common troubleshooting approaches that can be applied when troubleshooting problem reports. Before we delve into these structured approaches,
however, it is important to understand the implications of not using a structured troubleshooting
approach when troubleshooting a network problem.
One of the most common mistakes made by engineers is not following some kind of structured
troubleshooting approach when attempting to identify the root cause of a problem. While it is true
that ad-hoc troubleshooting may eventually produce the desired objective, such an approach is
unpredictable and is often very inefficient. One of the most commonly employed troubleshooting
techniques, especially by experienced troubleshooters, is the shoot-from-the-hip troubleshooting
approach. With this method, after the troubleshooter has collected information, he or she leverages his or her intimate knowledge of the network or calls on experience (past or present), and then
immediately implements a change in the hope that the change he or she implemented will resolve
the issue.
The primary problem with this approach is that while it may work for seasoned engineers who can
call on their experience or knowledge of the network, for example, it does not work for inexperienced engineers. A structured, systematic approach, on the other hand, will reduce the amount
of time the troubleshooter spends on the problem. In addition, a structured approach increases
the efficiency of the overall troubleshooting process itself. The shoot-from-the-hip troubleshooting
method is illustrated in Figure 2-3 below:
[Figure 2-3: Problem Report -> Collect Information -> Hypothesize Causes -> Verify Hypothesis -> Resolve Problem]
Fig. 2-3. The Shoot-from-the-Hip Troubleshooting Method
As was previously stated, there is no one single way to troubleshoot. Different problems call for different approaches. However, regardless of the approach used, it is important to adhere to a structured
troubleshooting approach. Common structured troubleshooting methods include the following:
•
The top-down troubleshooting method
•
The bottom-up troubleshooting method
•
The follow-the-traffic-path troubleshooting method
•
The compare configurations troubleshooting method
•
The divide and conquer troubleshooting method
•
The component-swapping troubleshooting method
These troubleshooting methods will be described in the following sections.
The Top-Down Troubleshooting Method
When using the top-down troubleshooting approach, the troubleshooter begins troubleshooting
at the Application layer of the OSI Model and works his or her way down to the Physical layer.
This approach works best when you believe that the problem resides within an application and not
within the network or internetwork devices. For example, if a user reports that he or she cannot access a particular server but is able to ping the server IP address, then it can be assumed that Layers
3 through 1 are working fine because there is IP connectivity between the user’s machine and the
server. The troubleshooting process would therefore begin at the Application layer.
The Bottom-Up Troubleshooting Method
When using the bottom-up troubleshooting approach, the troubleshooter starts troubleshooting at
the Physical layer of the OSI Model and works his or her way up to the Application layer. This approach is based on the assumption that the problem resides at the lower half of the OSI Model. The
bottom-up troubleshooting approach is efficient and is one of the most commonly used troubleshooting methods. However, while it works well in smaller networks, it is typically inefficient in larger networks, as it becomes more difficult to discover which network device is actually causing the problem.
The Follow-the-Traffic-Path Troubleshooting Method
The follow-the-traffic-path troubleshooting method requires intimate knowledge of the network,
as well as the traffic flows, which, if following best practices, should be included in network documentation. This troubleshooting approach is based on the path that the traffic or packets will take
through the network. A common practice when collecting information is to request a traceroute
from the user reporting the problem. The troubleshooter can then use this troubleshooting method
to eliminate internetwork devices based on the path the traffic takes.
The Compare Configurations Troubleshooting Method
The compare configurations, or spot-the-difference, method entails comparing the configuration
on the current device with an older or archived version of the configuration that had been confirmed to be working. Another approach that is also commonly used is to compare device configurations with that of another similarly configured device that is working.
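The spot-the-difference comparison can be automated once both configurations are available as text. The following is a minimal sketch in Python, assuming the archived and current configurations have already been saved as strings; the function name and the sample configurations are illustrative only and do not come from any device:

```python
import difflib


def config_diff(known_good: str, current: str) -> list[str]:
    """Return unified-diff lines between a known-good and a current configuration."""
    return list(
        difflib.unified_diff(
            known_good.splitlines(),
            current.splitlines(),
            fromfile="archived-config",   # label only, not a real file
            tofile="running-config",      # label only, not a real file
            lineterm="",
        )
    )


# Hypothetical sample configurations used purely for illustration.
archived = """interface FastEthernet0/0
 ip address 150.1.1.2 255.255.255.0
 no shutdown"""
running = """interface FastEthernet0/0
 ip address 150.1.1.3 255.255.255.0
 no shutdown"""

for line in config_diff(archived, running):
    print(line)
```

Lines prefixed with a minus sign exist only in the archived configuration, and lines prefixed with a plus sign exist only in the current one, which narrows the search to the statements that actually changed.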
The Divide and Conquer Troubleshooting Method
The divide and conquer troubleshooting method begins at the Network layer of the OSI Model and
then goes either up or down the stack, depending on the results of the test. For example, assume
that a user reports that he or she is unable to access a particular server. Using this approach, if a
ping to the server IP address was successful, the troubleshooter would begin the troubleshooting
process at the top of the OSI stack. On the other hand, if the ping failed, then the troubleshooter
would begin the troubleshooting process at the bottom of the OSI stack.
The divide and conquer troubleshooting method also works well when several troubleshooters are
working on the same problem. Once all possible causes of the problem have been hypothesized,
individual troubleshooters can be asked to test and verify individual hypotheses. The advantage of
using this approach when multiple troubleshooters are all working on the same problem is that it
increases efficiency and reduces the likelihood that two or more people are doing the same thing
(i.e., a duplication of effort) while other aspects are being neglected.
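The ping-based decision described at the start of this section can be expressed as a small function. This is only a sketch of the reasoning, written in Python; the function name is an assumption made for illustration, and a real troubleshooting process involves far more judgment than a single test:

```python
OSI_LAYERS = [
    "Physical",
    "Data Link",
    "Network",
    "Transport",
    "Session",
    "Presentation",
    "Application",
]


def layers_to_investigate(ping_succeeded: bool) -> list[str]:
    """Pick the half of the OSI stack to examine after a Network layer test."""
    network_index = OSI_LAYERS.index("Network")
    if ping_succeeded:
        # IP connectivity works, so Layers 1 through 3 are assumed healthy.
        return OSI_LAYERS[network_index + 1:]
    # IP connectivity is broken, so examine the Network layer and below.
    return OSI_LAYERS[: network_index + 1]


print(layers_to_investigate(True))
print(layers_to_investigate(False))
```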
The Component-Swapping Troubleshooting Method
The component-swapping troubleshooting method entails replacing components and
observing whether the problem moves with the components. For example, returning to the
example of the user with intermittent network connectivity from the beginning of this chapter, if after replacing
the network cable the user is still experiencing issues, the next step would be to move the user to
another switch port. If that does not resolve the issue, the workstation NIC could be replaced
next, and so forth. If the problem disappears after a component is replaced, for example the network cable, then it can be concluded that the component is faulty.
THE CISCO IOS GENERIC TROUBLESHOOTING TOOLKIT
As we learned in the previous section, collecting information is a core component of the overall
troubleshooting process. The information that is collected comes from both users and internetwork devices. As a network engineer, it is important to have an intimate understanding of the Cisco
IOS commands that can be used to collect information that will assist in the troubleshooting process. This section describes generic commands and utilities used in the troubleshooting process.
It does not delve into technology or protocol-specific troubleshooting commands, as those will be
described in detail later in this guide. The following topics are included in this section:
• Troubleshooting network connectivity
• Troubleshooting hardware
• Troubleshooting Cisco IOS Service Diagnostics
• Filtering command output
• Redirecting output
• Monitoring and capturing packets
• Health monitoring
Troubleshooting Network Connectivity
Two of the most commonly used utilities when troubleshooting connectivity issues are the ping
and traceroute utilities. Cisco IOS supports both standard and extended ping and traceroute options. The primary difference between the standard and extended versions is additional capabilities
that are included in the extended options, which may include specifying Quality of Service parameters when executing the command, for example.
In Cisco IOS software, a standard ping is initiated using the ping [hostname | address] privileged EXEC command. Additional keywords can also be used in conjunction with this command.
However, the available options will differ depending on the version of Cisco IOS software that is
running on the platform. The following output displays additional options available with the standard ping command in Cisco IOS software version 12.4 Mainline:
R2#ping 10.0.0.1 ?
  data      specify data pattern
  df-bit    enable do not fragment bit in IP header
  repeat    specify repeat count
  size      specify datagram size
  source    specify source address or name
  timeout   specify timeout interval
  validate  validate reply data
  <cr>
NOTE: These same options are also available and can be used when issuing an extended ping.
The data keyword is used to specify the data pattern to use when executing the ping. Different data
patterns are used to troubleshoot framing errors and clocking problems on serial lines. The default
is 0xABCD (Hexadecimal). The current TSHOOT certification exam does not require you to go
into specifics on the different available supported data patterns. The df-bit keyword is used to enable the Don’t Fragment (DF) bit in outgoing ping packets.
If the DF bit is set, then packets will not be fragmented when such packets must traverse segments
with smaller maximum transmission unit (MTU) values. In such cases, the command will print an
error message that is received from the device that wanted to fragment the packet. This error message is denoted by the value ‘M’ in Cisco IOS software. Setting the DF bit in outgoing ping packets is
used to determine the smallest MTU in the path to a destination. This can be used to troubleshoot
connectivity issues between Transmission Control Protocol (TCP)-based applications across Virtual Private Networks (VPNs), for example. By default, the DF bit is not set when using the Cisco
IOS ping command.
The repeat keyword is used to specify the number of ping packets that the software will send to
the destination address; the default is 5. The size keyword is used to specify the packet size, in
bytes, of the ping packets. By default, the Cisco IOS ping command will send 100-byte packets.
The source keyword allows you to specify a source address or interface for the ping packets. By default, the
software will use the outgoing interface toward the specified destination.
The timeout keyword is used to specify the ping packet timeout value. A ping packet that is not
received within the interval specified using this keyword is considered unsuccessful. By default,
Cisco IOS software specifies a timeout value of two seconds. Finally, the validate keyword is used
to specify whether to validate the reply data. The default is no, meaning that the router that originated the Internet Control Message Protocol (ICMP) echo packet will not check the ICMP-Cyclic
Redundancy Check (CRC) in the echo reply or response packet.
The following example illustrates the use of the df-bit and repeat keywords when issuing a standard ping. The DF bit is set to prevent the packet from being fragmented. The repeat keyword is
used to specify that 50 ping packets should be sent to the destination of 192.168.1.1:
R2#ping 192.168.1.1 df-bit repeat 50
Type escape sequence to abort.
Sending 50, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (50/50), round-trip min/avg/max = 4/4/5 ms
The following example illustrates the use of the source and timeout keywords when sending a
standard ping. The source keyword is used to specify a source IP address, although an interface
can also be specified, and the timeout keyword is used to specify a timeout value of one second.
A reply must be received within this interval for the ping to be considered successful:
R2#ping 1.1.1.1 source 2.2.2.2 timeout 1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 1 seconds:
Packet sent with a source address of 2.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/13/56 ms
While the exclamation mark (!) and the period (.) are the most commonly seen output characters
from the ping utility, as a troubleshooter, you should be familiar with some additional characters
that the ping utility may print. Table 2-1 below lists and describes the characters that may be printed by the ping utility:
Table 2-1. Cisco IOS Ping Utility Characters
Character   Description
!           Each exclamation point indicates receipt of a reply
.           Each period indicates the network server timed out while waiting for a reply
U           A destination unreachable error PDU was received
Q           Source quench (destination too busy)
M           Could not fragment
?           Unknown packet type
&           Packet lifetime exceeded
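When ping output is collected by a script, the characters in Table 2-1 can be tallied automatically. The following Python sketch assumes the result string (for example, "U.U.U.U.U.") has already been extracted from the command output; the dictionary wording and the function names are illustrative and not part of Cisco IOS software:

```python
from collections import Counter

# Meanings taken from Table 2-1; the short wording is paraphrased.
PING_CHARACTERS = {
    "!": "reply received",
    ".": "timed out waiting for a reply",
    "U": "destination unreachable",
    "Q": "source quench",
    "M": "could not fragment",
    "?": "unknown packet type",
    "&": "packet lifetime exceeded",
}


def summarize_ping(result: str) -> dict[str, int]:
    """Count how often each result character appears in a ping result string."""
    counts = Counter(result.strip())
    return {
        PING_CHARACTERS.get(ch, f"unrecognized character {ch!r}"): n
        for ch, n in counts.items()
    }


def success_rate(result: str) -> float:
    """Return the percentage of probes that received a reply."""
    result = result.strip()
    return 100.0 * result.count("!") / len(result) if result else 0.0


print(summarize_ping("U.U.U.U.U."))
print(success_rate("!!!!!"))
```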
The following example illustrates the output of the ping utility indicating that the destination address specified is unreachable:
R2#ping 1.1.1.1 repeat 10 size 500
Type escape sequence to abort.
Sending 10, 500-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
U.U.U.U.U.
Success rate is 0 percent (0/10)
You typically would see this error message when ICMP packets are being blocked by an ACL or
similar filter. The following example displays the output of the ping utility indicating that the ping
was unsuccessful because fragmentation is required; however, the ping packets could not be fragmented because the Don’t Fragment (DF) bit was set:
R1#ping 192.168.2.1 size 1500 df-bit
Sending 5, 1500-byte ICMP Echos to 192.168.2.1, timeout is 2 seconds:
Packet sent with the DF bit set
M.M.M
Success rate is 0 percent (0/5)
This error message is commonly seen when troubleshooting MTU issues across VPN tunnels,
which have lower MTU values (due to the encapsulation overhead) than physical interfaces have.
Solutions to this error message include the following:
• Manually changing the MTU values on internetwork devices
• Changing the TCP Maximum Segment Size (MSS)
• Using or enabling the Path MTU Discovery (PMTUD) feature
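The relationship between encapsulation overhead, MTU, and TCP MSS is simple arithmetic. The following Python sketch assumes a 1500-byte physical MTU and a plain GRE tunnel over IPv4 with no optional GRE fields (24 bytes of overhead); IPsec and other encapsulations add different amounts, so the overhead value should be treated as an input and not as a constant:

```python
IPV4_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20    # bytes, TCP header without options
GRE_OVERHEAD = 24  # bytes: 20-byte outer IPv4 header + 4-byte basic GRE header


def tunnel_mtu(physical_mtu: int, overhead: int) -> int:
    """Largest IP packet that fits inside the tunnel without fragmentation."""
    return physical_mtu - overhead


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


mtu = tunnel_mtu(1500, GRE_OVERHEAD)
print(mtu, tcp_mss(mtu))
```

Under these assumptions, a full-size 1500-byte packet sent with the DF bit set cannot cross the tunnel, which is the condition the 'M' character reports.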
These concepts will be described in additional detail when we discuss specific troubleshooting scenarios pertaining to VPNs later in this guide. The Cisco IOS extended ping utility includes the same
options available with the standard ping utility, plus additional options. These additional options
include the ability to perform a ping sweep, allowing you to vary the size of the echo packets that are
sent. A ping sweep can be used to determine the minimum MTU size between source and destination. This concept will be illustrated in detail later in this guide.
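The idea behind using varying packet sizes to find the path MTU can be sketched in Python. The sketch below uses a binary search, a technique chosen here for brevity and not taken from the Cisco IOS implementation. The probe function is a stand-in for "send a ping of this size with the DF bit set and report whether a reply was received"; both it and the 1476-byte limit in the example are assumptions made for illustration:

```python
from typing import Callable, Optional


def largest_passing_size(
    probe: Callable[[int], bool], low: int, high: int
) -> Optional[int]:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    Assumes that if a given size passes, every smaller size passes as well.
    Returns None when even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def fake_probe(size: int) -> bool:
    """Hypothetical path that drops DF-set packets larger than 1476 bytes."""
    return size <= 1476


print(largest_passing_size(fake_probe, 100, 1500))
```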
An additional capability that is provided by extended pings is the ability to include IP header options in the ping packets. These options include the Record, Loose, Strict, and Timestamp options.
The Record option prints the addresses of up to nine hops traversed by the packet. The Loose option allows you to influence the path by specifying the address(es) of the hop(s) you want the packet
to go through.
The Strict option is used to specify the hop(s) that you want the packet to go through, but no other
hop(s) are allowed to be visited. Finally, the Timestamp option is used to measure round-trip time
to particular hosts. The following example illustrates the use of the extended ping utility, highlighting some of the options available with this command:
R2#ping ip
Target IP address: 200.1.1.1
Repeat count [5]:
Datagram size [100]:
Timeout in seconds [2]: 1
Extended commands [n]: y
Source address or interface: FastEthernet0/0
Type of service [0]:
Set DF bit in IP header? [no]: y
Validate reply data? [no]: y
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: r
Number of hops [ 9 ]: 4
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 200.1.1.1, timeout is 1 seconds:
Packet sent with a source address of 150.1.1.2
Packet sent with the DF bit set
Reply data will be validated
Packet has IP options: Total option bytes= 19, padded length=20
Record route: <*>
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
(0.0.0.0)
Reply to request 0 (1 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(200.1.1.2)
(200.1.1.1)
(10.0.0.1)
(150.1.1.2)
<*>
End of list
Reply to request 1 (1 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(200.1.1.2)
(200.1.1.1)
(10.0.0.1)
(150.1.1.2)
<*>
End of list
Reply to request 2 (4 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(200.1.1.2)
(200.1.1.1)
(10.0.0.1)
(150.1.1.2)
<*>
End of list
Reply to request 3 (4 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(200.1.1.2)
(200.1.1.1)
(10.0.0.1)
(150.1.1.2)
<*>
End of list
Reply to request 4 (4 ms). Received packet has options
Total option bytes= 20, padded length=20
Record route:
(200.1.1.2)
(200.1.1.1)
(10.0.0.1)
(150.1.1.2)
<*>
End of list
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms
NOTE: By default, when any IP options are specified, the verbose keyword is automatically
included by Cisco IOS software and does not need to be specified manually.
While the ping utility is a useful tool for verifying connectivity between devices, the traceroute utility is a powerful tool for discovering the path packets take to a remote destination. While it is true
that the ping command can also print the path packets take to a remote destination, up to
a maximum of nine hops, the traceroute utility can provide information on where the routing (i.e.,
the path between the source and destination network) is broken.
Traceroute records the source of each ICMP Time Exceeded message, which allows this tool to
provide a trace of the path packets take to reach the specified destination. When using traceroute,
the device on which this command is initiated will send out a sequence of User Datagram Protocol
(UDP) packets, each with incrementing time to live (TTL) values, to UDP port 33434 at the remote
host. This default port number can be changed in some applications.
When the traceroute command is initiated, three packets with an IP TTL value of 1 are sent. This
IP TTL value causes the packets to time out as soon as they are received by the first gateway device,
which responds with an ICMP Time Exceeded message. Next, three more packets are sent out. This
time, however, the packets have an IP TTL value of 2, which causes the second gateway in the path
to return an ICMP Time Exceeded message.
This process is repeated until the packets reach the destination address and the initiating device
has received an ICMP Time Exceeded message from all gateways in the path to the specified destination. Because the packets are sent to UDP port 33434, which is unlikely to be in use on the destination host, the
destination host responds with an ICMP Port Unreachable message when it receives these packets.
Receipt of the Port Unreachable message tells the traceroute command to end.
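The TTL-based mechanism described above can be modeled without sending any packets. The following Python sketch simulates which device answers each probe and with which ICMP message; the function name, the sample path, and the maximum TTL of 30 are assumptions made for illustration:

```python
def simulate_traceroute(
    path: list[str], max_ttl: int = 30
) -> list[tuple[int, str, str]]:
    """Simulate traceroute replies.

    path lists the gateways in order and ends with the destination host.
    Each tuple holds the probe TTL, the responder, and the ICMP reply type.
    """
    results = []
    for ttl, responder in enumerate(path[:max_ttl], start=1):
        if ttl < len(path):
            # The TTL expires at an intermediate gateway.
            results.append((ttl, responder, "ICMP Time Exceeded"))
        else:
            # The probe reaches the destination, where the UDP port is closed.
            results.append((ttl, responder, "ICMP Port Unreachable"))
    return results


for hop in simulate_traceroute(["10.0.2.2", "10.0.0.1", "192.168.1.1"]):
    print(hop)
```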
NOTE: If the no ip unreachables command has been configured on the interfaces of any
internetwork devices between the source and the destination, then the traceroute command
will not work.
As is the case with the ping utility, Cisco IOS supports both standard and extended traceroute
commands. Following are the options available with the standard traceroute command:
R2#traceroute 10.1.1.1 ?
  numeric  display numeric address
  port     specify port number
  probe    specify number of probes per hop
  source   specify source address or name
  timeout  specify timeout interval
  ttl      specify minimum and maximum ttl
  <cr>
The numeric keyword is used to suppress the symbolic display (i.e., hostnames) in the output of the
traceroute command. By default, Cisco IOS software uses both a symbolic and a numeric display.
Including a symbolic display often causes the traceroute command to run very slowly because
the software attempts to resolve the IP addresses to hostnames.
The port keyword is used to specify the destination port that will be used by the traceroute UDP
probe messages. As was stated in the previous section, this defaults to UDP port 33434. The probe
keyword is used to specify the number of probes to be sent at each TTL level. By default, the software will send three probes.
The source keyword is used to specify a source interface or IP address. By default, the router will
use the outgoing interface toward the destination network. The timeout keyword is
used to specify the number of seconds to wait for a response to a probe packet; the default is three
seconds. Finally, the ttl keyword is used to specify the minimum and maximum IP TTL values of the probes. The following
example illustrates a typical traceroute command output:
R2#traceroute 192.168.1.1 source FastEthernet0/0
Type escape sequence to abort.
Tracing the route to 192.168.1.1
1 R2-Se0-Interface (10.0.2.2) 4 msec 4 msec 4 msec
2 R1-Fa0-Interface (10.0.0.1) 4 msec 4 msec 4 msec
3 R5-Se0-Interface (10.0.9.5) 0 msec * 4 msec
The following output shows the same trace with the symbolic display suppressed:
R2#traceroute 192.168.1.1 numeric
Type escape sequence to abort.
Tracing the route to 192.168.1.1
1 10.0.2.2 4 msec 4 msec 4 msec
2 10.0.0.1 4 msec 4 msec 4 msec
3 10.0.9.5 0 msec * 4 msec
The primary difference between standard and extended traceroute execution is that extended traceroute allows users to specify IP options (i.e., the Loose, Strict, and Record options) when performing the traceroute. All other options that can be used with the standard traceroute command
can also be used when performing an extended traceroute. Similar to the ping utility, the
traceroute utility also prints different characters, each indicating the result of the operation. Table
2-2 below lists and describes the result characters that may be printed by the traceroute utility:
Table 2-2. Cisco IOS Traceroute Utility Characters
Character   Description
nn msec     The round-trip time in milliseconds for the specified number of probes
*           The probe timed out
A           Administratively prohibited (e.g., access list)
Q           Source quench (destination too busy)
I           User interrupted test
U           Port unreachable
H           Host unreachable
N           Network unreachable
P           Protocol unreachable
T           Timeout
?           Unknown packet type
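Traceroute output lines follow a regular shape: a hop number, an optional hostname with the address in parentheses, and one result per probe. The following Python sketch splits such a line into its parts so that the characters in Table 2-2 can be checked by a script. The regular expressions are assumptions based only on the sample outputs in this section, and other Cisco IOS versions may format lines differently:

```python
import re
from typing import Optional

# Hop number, then either "name (address)" or a bare address, then the probes.
HOP_LINE = re.compile(
    r"^\s*(?P<hop>\d+)\s+"
    r"(?:(?P<name>\S+)\s+\((?P<addr1>[\d.]+)\)|(?P<addr2>[\d.]+))\s+"
    r"(?P<probes>.*)$"
)
# A probe result is a round-trip time, a timeout (*), or a code such as !A.
PROBE = re.compile(r"\d+ msec|\*|!?[A-Z?]")


def parse_hop(line: str) -> Optional[dict]:
    """Parse one traceroute hop line; return None if it does not look like one."""
    match = HOP_LINE.match(line)
    if match is None:
        return None
    return {
        "hop": int(match["hop"]),
        "name": match["name"],
        "address": match["addr1"] or match["addr2"],
        "probes": PROBE.findall(match["probes"]),
    }


print(parse_hop("  1 R2-Se0-Interface (10.0.2.2) 4 msec 4 msec 4 msec"))
print(parse_hop("  3 10.0.9.5 0 msec * 4 msec"))
```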
The following example illustrates the output of a traceroute command that is administratively
prohibited (i.e., by an ACL or similar filter that has been applied to the gateway interface):
R2#traceroute 192.168.1.1 source FastEthernet0/0
Type escape sequence to abort.
Tracing the route to 192.168.1.1
1 R1-Tu0-Interface (200.1.1.1) !A  *  0 msec
Troubleshooting Hardware
The Cisco IOS diagnostic toolkit includes Cisco Generic Online Diagnostics (GOLD) and Cisco
IOS Service Diagnostics. Running diagnostics is an inherent part of maintenance, as well as troubleshooting. For example, diagnostics may be run on newly purchased hardware to ensure and validate functionality before it is integrated into the existing network. Additionally, diagnostics could
also be run on hardware while troubleshooting a problem report, in an attempt to isolate issues.
Cisco GOLD defines a common framework for diagnostics operations across different platforms
running Cisco IOS software. The GOLD framework specifies the platform-independent fault-detection architecture for centralized and distributed systems. This includes the common diagnostics
CLI and the platform-independent fault-detection procedures for boot-up and runtime diagnostics. When boot-up diagnostics detects a failure on a Cisco Catalyst 6500 Series switch, for example, the failing modules are shut down.
Should diagnostics fail, you can open up a trouble report or ticket with the Technical Assistance
Center (TAC) to perform additional troubleshooting for the specified module(s) or initiate the
process of replacing the faulty hardware. By default, Catalyst 6500 Series switches run minimal
diagnostics at boot-up; however, this default diagnostics behavior can be changed using the diagnostic bootup level [minimal | complete] global configuration command to allow the
switch to run complete (full) diagnostics at boot-up instead.
Once the switch has completed running boot-up diagnostics, you can use the show module command to verify the status of the diagnostic testing as illustrated in the following output:
Catalyst-6500-Core-Switch#show module
Mod Ports Card Type                        Model              Serial No.
--- ----- -------------------------------- ------------------ -----------
  5     2 Supervisor Engine 720 (Active)   WS-SUP720-BASE     SAD18140BKZ

Mod MAC addresses                       Hw     Fw       Sw           Status
--- ----------------------------------- ------ -------- ------------ -------
  5 000d.bda3.eaa8 to 000d.bda3.eaab    3.1    8.5(1)   12.2(18)SXF1 Ok

Mod Sub-Module                  Model          Serial      Hw      Status
--- --------------------------- -------------- ----------- ------- -------
  5 Policy Feature Card 3       WS-F6K-PFC3A   SAD08150810 2.2     Ok
  5 MSFC3 Daughterboard         WS-SUP720      SAD0815048A 2.2     Ok

Mod Online Diag Status
--- -------------------
  5 Pass
The show diagnostic result command can be used to view detailed testing information about
the boot-up diagnostics for all modules as illustrated in the following output:
Catalyst-6500-Core-Switch#show diagnostic result
Current bootup diagnostic level: complete
Module 5: Supervisor Engine 720 (Active)
SerialNo : SAD18140BKZ
Overall Diagnostic Result for Module 5 : PASS
Diagnostic level at card bootup: complete
Test results: (. = Pass, F = Fail, U = Untested)
   1) TestScratchRegister -------------> .
   2) TestSPRPInbandPing --------------> .
   3) TestTransceiverIntegrity:
      Port  1  2
      ----------
            U  U
   4) TestActiveToStandbyLoopback:
      Port  1  2
      ----------
            U  U
   5) TestLoopback:
      Port  1  2
      ----------
            .  .
   6) TestNewIndexLearn ---------------> .
   7) TestDontConditionalLearn --------> .
   8) TestBadBpduTrap -----------------> .
   9) TestMatchCapture ----------------> .
  10) TestProtocolMatchChannel --------> .
  11) TestFibDevices ------------------> .
  12) TestIPv4FibShortcut -------------> .
  13) TestL3Capture2 ------------------> .
  14) TestIPv6FibShortcut -------------> .
  15) TestMPLSFibShortcut -------------> .
  16) TestNATFibShortcut --------------> .
  17) TestAclPermit -------------------> .
  18) TestAclDeny ---------------------> .
  19) TestQoSTcam ---------------------> .
  20) TestL3VlanMet -------------------> .
  21) TestIngressSpan -----------------> .
  22) TestEgressSpan ------------------> .
  23) TestNetflowInlineRewrite:
      Port  1  2
      ----------
            .  .
  24) TestFabricSnakeForward ----------> .
  25) TestFabricSnakeBackward ---------> .
  26) TestTrafficStress ---------------> U
  27) TestFibTcamSSRAM ----------------> U
  28) TestAsicMemory ------------------> U
  29) TestNetflowTcam -----------------> U
  30) ScheduleSwitchover --------------> U
  31) TestFirmwareDiagStatus ----------> .
  32) TestFabricFlowControlStatus -----> U
NOTE: You are not required to configure GOLD for the current TSHOOT certification exam.
In addition to boot-up diagnostics, you can also run on-demand diagnostics or even schedule diagnostic testing to be run at a specific time. These configurations, however, are beyond the scope of
the current TSHOOT certification exam and are not described any further in this guide.
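Although configuring GOLD is out of scope, reading its results is a routine troubleshooting task, and long result listings are easy to scan with a script. The following Python sketch extracts the single-result test lines and reports any that failed. It assumes the "number) TestName ----> result" layout shown in the output above and skips per-port tests; the failing test in the sample is hypothetical and included only to exercise the code:

```python
import re

# Matches lines such as "  26) TestTrafficStress ---------------> U".
RESULT_LINE = re.compile(r"^\s*(\d+)\)\s+(\S+)\s+-+>\s+([.FU])\s*$")
STATUS = {".": "Pass", "F": "Fail", "U": "Untested"}


def parse_results(output: str) -> dict[str, str]:
    """Map each single-result test name to Pass, Fail, or Untested."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group(2)] = STATUS[match.group(3)]
    return results


def failed_tests(results: dict[str, str]) -> list[str]:
    """Return the names of tests whose status is Fail."""
    return [name for name, status in results.items() if status == "Fail"]


sample = """
   1) TestScratchRegister -------------> .
   5) TestLoopback:
  26) TestTrafficStress ---------------> U
  99) TestHypotheticalFailure ---------> F
"""
print(failed_tests(parse_results(sample)))
```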
Troubleshooting Cisco IOS Service Diagnostics
In addition to GOLD, Cisco IOS Service Diagnostics provides diagnostic tests and tools for
common problems in Border Gateway Protocol (BGP), Open Shortest Path First (OSPF) Protocol,
and Quality of Service (QoS), as well as tools to monitor and detect abnormal system resource utilization, using scenario-specific troubleshooting scripts.
Unlike GOLD, Cisco IOS Service Diagnostics is a programmable diagnostics service that also leverages other Cisco IOS software capabilities, specifically the Embedded Event Manager (EEM),
Embedded Syslog Manager (ESM), and Tool Command Language (Tcl). As is the case with GOLD,
the configuration of Cisco IOS Service Diagnostics is beyond the scope of the current TSHOOT
certification exam and will not be described in further detail in this guide.
Filtering Command Output
While most engineers are familiar with the show command suite that is available in Cisco IOS
software, not all of them are aware of the filtering capabilities integrated into the software. Filtering
or restricting command output to only the information you will need to troubleshoot a particular
problem not only demonstrates your level of knowledge about the capabilities of Cisco IOS software but also saves time, which can result in a speedier resolution of the problem.
In Cisco IOS software, output filtering is performed by appending the pipe (|) symbol to a command,
followed by a keyword that determines how the output is filtered: begin, exclude, include, or section. Following these keywords, the desired output can be
matched against regular expressions. Table 2-3 below lists and describes some of the commonly
used regular expressions when filtering command output:
Table 2-3. Cisco IOS Regular Expressions
Character           Description
^ (caret)           Indicates the beginning of a string
$ (dollar sign)     Indicates the end of a string
. (period)          Indicates any character
| (pipe)            Specifies an either-or operation
_ (underscore)      Matches a comma, braces, parentheses, the beginning of the string, the end of the string, or a space
+ (plus)            Matches 1 or more sequences of the pattern
* (asterisk)        Matches 0 or more sequences of the pattern
? (question mark)   Matches 0 or 1 occurrences of the pattern
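Most of the characters in Table 2-3 behave the same way in Python's re module, which makes it convenient for experimenting with patterns offline. The underscore is the exception: it has no special meaning in Python, so the sketch below rewrites it into an equivalent group based on the description in the table. The function names are assumptions, and the translation is deliberately naive (it does not handle escaped underscores or underscores inside brackets):

```python
import re


def ios_to_python(pattern: str) -> str:
    """Rewrite the IOS underscore into an equivalent Python group."""
    # Comma, braces, parentheses, start of string, end of string, or space.
    return pattern.replace("_", r"(?:^|$|[,{}() ])")


def matches(pattern: str, line: str) -> bool:
    """Return True if the IOS-style pattern matches anywhere in the line."""
    return re.search(ios_to_python(pattern), line) is not None


print(matches("0$", "line con 0"))
print(matches("0/0|0/1", "interface Serial0/1"))
print(matches("_100_", "65000 100 200"))
print(matches("_100_", "65000 1100 200"))
```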
The begin keyword filters the output beginning at the specified phrase. The following example illustrates how to filter the output of the current (running) configuration so that the output begins at
the Border Gateway Protocol (BGP) configuration that is implemented on the router:
R2#show running-config | begin router bgp
router bgp 1
 no synchronization
 bgp router-id 1.1.1.1
 bgp log-neighbor-changes
 network 150.1.1.0 mask 255.255.255.0
 network 150.2.2.0 mask 255.255.255.0
 network 150.3.3.0 mask 255.255.255.0
 neighbor 10.0.0.1 remote-as 2
 default-information originate
 no auto-summary
!
ip forward-protocol nd
ip route 1.1.1.1 255.255.255.255 Serial0/0
ip route 192.168.1.0 255.255.255.0 Tunnel0
!
!
ip http server
no ip http secure-server
!
logging 192.168.1.254
logging 150.1.1.254
snmp-server community public RW
snmp-server community readonlypassword RO
...
[Truncated Output]
The exclude keyword is used to exclude the specified phrases or regular expressions from the
command output. The following example illustrates how the exclude keyword can be used to filter
the output of the show ip interface brief command so that only interfaces that are not administratively shut down are displayed in the output of the command:
R2#show ip interface brief | exclude administratively down
Interface          IP-Address    OK? Method Status    Protocol
FastEthernet0/0    150.1.1.2     YES NVRAM  up        up
Serial0/0          10.0.0.2      YES NVRAM  up        up
Loopback0          2.2.2.2       YES NVRAM  up        up
Tunnel0            200.1.1.2     YES manual up        up
The include keyword includes the specified phrase or regular expression in the output of the
command. If the specified phrase is part of a line, then the entire line is displayed. The following
example illustrates how to show information on errors for a WAN interface. Notice in the output
that because the specified keyword is part of a line, the entire line is printed:
R2#show interfaces Serial0/0 | include error
1 input errors, 0 CRC, 1 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 4 interface resets
The section keyword prints only information that matches the specified section and no more. For
example, if the show running-config | begin router bgp command was specified,
then the router would parse the configuration and begin at the statement router bgp. However,
the router would also display all other configuration following this. Therefore, not only is the BGP
configuration displayed but also all other configuration after the BGP configuration is included. Using the section keyword prevents this behavior as illustrated in the following example:
R2#show running-config | section router bgp
router bgp 1
 no synchronization
 bgp router-id 1.1.1.1
 bgp log-neighbor-changes
 network 150.1.1.0 mask 255.255.255.0
 network 150.2.2.0 mask 255.255.255.0
 network 150.3.3.0 mask 255.255.255.0
 neighbor 10.0.0.1 remote-as 2
 default-information originate
 no auto-summary
R2#
As seen in the output above, only the relevant matched section of the configuration is printed. As
stated earlier, regular expressions can also be used when filtering command output. The following
example illustrates how to filter the running configuration on a router so that only lines that end
with the number zero are displayed:
R2#show running-config | include 0$
ip sla monitor responder type tcpConnect ipaddress 10.0.0.2 port 80
ip sla monitor responder type tcpConnect ipaddress 2.2.2.2 port 80
frequency 30
time-period 10080
ip ftp password 7 1311041A040310
interface Loopback0
interface Tunnel0
ip mtu 1500
tunnel source Serial0/0
interface FastEthernet0/0
ip address 150.1.1.2 255.255.255.0
interface Serial0/0
clock rate 2000000
network 150.1.1.0 mask 255.255.255.0
network 150.2.2.0 mask 255.255.255.0
network 150.3.3.0 mask 255.255.255.0
ip route 1.1.1.1 255.255.255.255 Serial0/0
ip route 192.168.1.0 255.255.255.0 Tunnel0
voice-port 1/0/0
voice-port 1/1/0
port 1/0/0
line con 0
line aux 0
As a final example, the following illustrates how to filter the router output so that only lines that
include 0/0 or 0/1 are printed:
R2#show running-config | include 0/0|0/1
tunnel source Serial0/0
interface FastEthernet0/0
interface Serial0/0
interface Serial0/1
ip route 1.1.1.1 255.255.255.255 Serial0/0
voice-port 1/0/0
voice-port 1/0/1
port 1/0/0
port 1/0/1
NOTE: Do not include spaces around the second pipe. A space is treated as part of the regular
expression, so a pattern such as 0/0 | 0/1 will not match the lines shown above.
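The behavior of the four filter keywords described in this section can be approximated in a few lines of Python, which is useful when working with a saved configuration file away from the device. This is a simplified sketch and not a reimplementation of Cisco IOS software: in particular, the section function assumes that subcommands are indented more deeply than the line that introduces them, and the sample configuration is abbreviated for illustration:

```python
import re


def begin(lines: list[str], pattern: str) -> list[str]:
    """Return everything from the first matching line onward."""
    for index, line in enumerate(lines):
        if re.search(pattern, line):
            return lines[index:]
    return []


def include(lines: list[str], pattern: str) -> list[str]:
    """Return only the lines that match the pattern."""
    return [line for line in lines if re.search(pattern, line)]


def exclude(lines: list[str], pattern: str) -> list[str]:
    """Return only the lines that do not match the pattern."""
    return [line for line in lines if not re.search(pattern, line)]


def indent(line: str) -> int:
    """Count the leading spaces of a line."""
    return len(line) - len(line.lstrip(" "))


def section(lines: list[str], pattern: str) -> list[str]:
    """Return matching lines plus the more deeply indented lines below them."""
    output = []
    parent_indent = None
    for line in lines:
        if parent_indent is not None and indent(line) > parent_indent:
            output.append(line)           # still inside the matched section
        elif re.search(pattern, line):
            output.append(line)           # a new section starts here
            parent_indent = indent(line)
        else:
            parent_indent = None          # the section has ended
    return output


CONFIG = [
    "router bgp 1",
    " no synchronization",
    " neighbor 10.0.0.1 remote-as 2",
    "!",
    "ip route 1.1.1.1 255.255.255.255 Serial0/0",
    "interface Serial0/1",
]

print(section(CONFIG, "router bgp"))
print(include(CONFIG, "0/0|0/1"))
print(include(CONFIG, "0/0 | 0/1"))
```

The last call returns an empty list, which mirrors the effect of adding spaces around the pipe inside the regular expression.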
Redirecting Output
In addition to filtering command output, Cisco IOS software allows output to be redirected to an
external location or Flash memory. This is useful when executing commands that print a great deal
of information, such as the show tech-support command, for example. Output redirection can
be performed using the append, redirect, or tee keywords after the pipe.
The append keyword allows you to redirect and add the output of any show command to an existing file. You can use the show <command> | append ? command to view valid locations for the
platform on which you are working. The following example illustrates the use of this command to
view valid locations for a Cisco 2600XM Series router:
R2#show ip interface FastEthernet0/0 | append ?
  ftp:    Uniform Resource Locator
  nvram:  Uniform Resource Locator
In the output above, valid options for the router include appending to a file stored in NVRAM or on
an FTP server. The following example illustrates how to append the output of the running configuration file to an existing file named ‘r2-config’ that is located on FTP server 150.1.1.254:
R2#show running-config | append ftp://150.1.1.254/r2-config
Writing r2-config
R2#
Keep in mind that the example above assumes that, if authentication is required, both the router
and the FTP server have already been configured with the correct credentials. Alternatively, include
the credentials in the command as illustrated in the example below, which specifies an FTP username of
‘netadmin’ with an FTP password of ‘tshoot’ when connecting to the FTP server:
R2#show running-config | append ftp://netadmin:tshoot@150.1.1.254/r2-config
Writing r2-config
R2#
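The destination in the command above is an ordinary URL of the form scheme://username:password@host/filename. As an illustration only, the following Python sketch uses the standard urllib.parse module to split the same URL into its parts; the variable names are hypothetical:

```python
from urllib.parse import urlsplit

# The destination used in the append example above.
url = urlsplit("ftp://netadmin:tshoot@150.1.1.254/r2-config")

parts = {
    "scheme": url.scheme,      # transfer protocol
    "username": url.username,  # FTP username
    "password": url.password,  # FTP password, visible in clear text in the command
    "host": url.hostname,      # FTP server address
    "path": url.path,          # file that the output is appended to
}

for name, value in parts.items():
    print(f"{name:9} {value}")
```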
As previously stated, the redirect keyword is used to redirect the output of a command to a local or remote file location. Again, the options available will depend on the platform on which the
command is executed. The following example illustrates the options that are available on a Cisco
2600XM Series router running Cisco IOS version 12.4 Mainline:
R2#show tech-support ospf detail | redirect ?
  flash:  Uniform Resource Locator
  ftp:    Uniform Resource Locator
  http:   Uniform Resource Locator
  https:  Uniform Resource Locator
  nvram:  Uniform Resource Locator
  pram:   Uniform Resource Locator
  rcp:    Uniform Resource Locator
  scp:    Uniform Resource Locator
  tftp:   Uniform Resource Locator
The following example illustrates how to redirect the output of the show tech-support ospf
detail command to a TFTP server with the IP address 150.1.1.254. The file will be saved on the
TFTP server with the name ‘r2-ospf-tshoot’:
R2#show tech-support ospf detail | redirect tftp://150.1.1.254/r2-ospf-tshoot
!
R2#
The following example illustrates how to save the same file to the local router Flash memory:
R2#show tech-support ospf detail | redirect flash:r2-ospf-tshoot
R2#show flash:
System flash directory:
File  Length    Name/status
  1   29965496  c2600-adventerprisek9-mz.124-25c.bin
  2   1691      r1-confg
  3   3142      r2-confg
  4   595       interface-statistics
  5   38015     r2-ospf-tshoot
[30009264 bytes used, 3020876 available, 33030140 total]
32768K bytes of processor board System flash (Read/Write)
To view the contents of the file (if saved to Flash) you can use the more command as follows:
R2#more flash:r2-ospf-tshoot
------------------ show version -----------------
Cisco IOS Software, C2600 Software (C2600-ADVENTERPRISEK9-M), Version 12.4(25c),
RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Thu 11-Feb-10 23:02 by prod_rel_team
ROM: System Bootstrap, Version 12.2(7r) [cmong 7r], RELEASE SOFTWARE (fc1)
R2 uptime is 22 hours, 47 minutes
System returned to ROM by power-on
System image file is “flash:c2600-adventerprisek9-mz.124-25c.bin”
...
[Truncated Output]
Finally, the tee keyword copies the output of any show command to a file and displays the same
content on the terminal at the same time. The following output displays the available locations for
copying the file while using the tee keyword on a Cisco 2600XM Series router:
R2#show ip ospf neighbor detail | tee ?
  /append  Copy and append output to URL (URLs supporting append operation
           only)
  flash:   Uniform Resource Locator
  ftp:     Uniform Resource Locator
  http:    Uniform Resource Locator
  https:   Uniform Resource Locator
  nvram:   Uniform Resource Locator
  pram:    Uniform Resource Locator
  rcp:     Uniform Resource Locator
  scp:     Uniform Resource Locator
  tftp:    Uniform Resource Locator
The following example illustrates how to view the contents of the show ip ospf neighbor detail command, while simultaneously copying the same output to a TFTP server with the IP address 150.1.1.254. The file will be saved on the TFTP server as ‘r2-ospf-output’:
R2#show ip ospf neighbor detail | tee tftp://150.1.1.254/r2-ospf-output
!
Neighbor 1.1.1.1, interface address 10.0.0.1
In the area 0 via interface Serial0/0
Neighbor priority is 0, State is FULL, 6 state changes
DR is 0.0.0.0 BDR is 0.0.0.0
Options is 0x12 in Hello (E-bit L-bit )
Options is 0x52 in DBD (E-bit L-bit O-bit)
LLS Options is 0x1 (LR)
Dead timer due in 00:00:31
Neighbor is up for 00:10:07
Index 1/1, retransmission queue length 0, number of retransmission 0
First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
Last retransmission scan length is 0, maximum is 0
Last retransmission scan time is 0 msec, maximum is 0 msec
The following example illustrates how to save the same output to Flash while viewing it:
R2#show ip ospf neighbor detail | tee flash:r2-ospf-output
Neighbor 1.1.1.1, interface address 10.0.0.1
In the area 0 via interface Serial0/0
Neighbor priority is 0, State is FULL, 6 state changes
DR is 0.0.0.0 BDR is 0.0.0.0
Options is 0x12 in Hello (E-bit L-bit )
Options is 0x52 in DBD (E-bit L-bit O-bit)
LLS Options is 0x1 (LR)
Dead timer due in 00:00:38
Neighbor is up for 00:19:50
Index 1/1, retransmission queue length 0, number of retransmission 0
First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
Last retransmission scan length is 0, maximum is 0
Last retransmission scan time is 0 msec, maximum is 0 msec
The more command can then be used to view the file that has been stored in Flash as follows:
R2#more flash:r2-ospf-output
Neighbor 1.1.1.1, interface address 10.0.0.1
In the area 0 via interface Serial0/0
Neighbor priority is 0, State is FULL, 6 state changes
DR is 0.0.0.0 BDR is 0.0.0.0
Options is 0x12 in Hello (E-bit L-bit )
Options is 0x52 in DBD (E-bit L-bit O-bit)
LLS Options is 0x1 (LR)
Dead timer due in 00:00:38
Neighbor is up for 00:19:50
Index 1/1, retransmission queue length 0, number of retransmission 0
First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
Last retransmission scan length is 0, maximum is 0
Last retransmission scan time is 0 msec, maximum is 0 msec
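The difference between the append, redirect, and tee keywords described in this section can be summarized with a small Python model. This is a sketch rather than a description of how IOS is implemented: a dictionary stands in for the file system, the function names are hypothetical, and it is assumed that redirect and tee replace any existing file of the same name:

```python
def redirect(files, name, output):
    """Store the output in a file; nothing is shown on the terminal."""
    files[name] = output  # assumption: an existing file is replaced
    return ""

def append(files, name, output):
    """Add the output to the end of a file; nothing is shown on the terminal."""
    files[name] = files.get(name, "") + output
    return ""

def tee(files, name, output):
    """Store the output in a file and show the same content on the terminal."""
    files[name] = output  # assumption: an existing file is replaced
    return output

files = {}
redirect(files, "r2-ospf-tshoot", "first capture\n")
append(files, "r2-ospf-tshoot", "second capture\n")
shown = tee(files, "r2-ospf-output", "Neighbor 1.1.1.1\n")

print(files["r2-ospf-tshoot"], end="")
print(shown, end="")
```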
Monitoring and Capturing Packets
There may be times when it is necessary to analyze packets as they traverse the wire when you
are troubleshooting complex or obscure problems. While there are many products available that
can be used to view captured packets on the wire, the TSHOOT certification exam places emphasis only on understanding how to redirect this captured information from Cisco IOS routers and
switches to the appropriate application.
Cisco IOS software supports different packet capture mechanisms, depending on whether the device is a router or a switch. On Cisco IOS software-switching routers, such as the Cisco 1800, 2800,
and 3800 Series routers running IOS 12.4T or 15.x, the Router IP Traffic Export (RITE) tool allows
network administrators to configure the router to export IP packets received on multiple, simultaneous WAN or LAN interfaces to a single LAN or VLAN interface, to which a protocol analyzer
or monitoring application is connected. RITE can also be configured to capture IP packets in a
buffer within the router and then dump the packets to a specified memory device.
When using RITE, you can configure the router to filter copied packets using either an ACL or
sampling. Sampling exports only one packet out of every n packets, where n is a value that you specify.
Use this option if you do not need to capture all incoming traffic, or when a monitored ingress
interface can deliver traffic faster than the egress interface can transmit it. An example would be
capturing incoming traffic on a GigabitEthernet interface and exporting it out of a FastEthernet
interface.
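The arithmetic behind choosing a sampling rate can be sketched in Python. This is a rough worst-case estimate, assuming the ingress interface runs at line rate and all packets are of similar size; both function names are hypothetical, and keeping the first packet of each group is an assumption about which packet is selected:

```python
def minimum_sampling_interval(ingress_bps, egress_bps):
    """Smallest n for 'one-in-every n' that keeps the exported stream within the egress rate."""
    # Integer ceiling division avoids floating-point rounding.
    return max(1, (ingress_bps + egress_bps - 1) // egress_bps)

def sample_one_in_every(packets, n):
    """Keep one packet out of every n (assumption: the first of each group)."""
    return [packet for index, packet in enumerate(packets) if index % n == 0]

# GigabitEthernet ingress exported out of a FastEthernet interface.
n = minimum_sampling_interval(1_000_000_000, 100_000_000)
exported = sample_one_in_every(list(range(25)), n)

print(n)
print(exported)
```

For a 1 Gbps ingress interface and a 100 Mbps egress interface, the sketch arrives at one-in-every 10, which is the sampling rate used in the configuration example later in this section.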
When RITE is configured, by default only incoming (inbound) traffic is exported or captured. However, RITE can be configured to capture bidirectional (inbound and outbound) traffic. Router IP
Traffic Export is configured using IP traffic export profiles. Multiple profiles can be configured
on the same router. The following section lists and describes the sequence of configuration steps
required to configure the RITE feature in IOS software-based routers:
1. Configure a traffic export profile using the ip traffic-export profile <name> global
configuration command.
2. In RITE configuration mode, specify the interface out of which the captured packets will be sent
using the interface <name> configuration command.
3. Next, specify the MAC address of the destination host that will be receiving the packet capture using the mac-address <address> RITE configuration command. Remember that the
router interface may be connected to a switch and reside in a VLAN with multiple hosts. If
the MAC address is not specified, then the profile will not recognize a destination host to
which to send the exported packets.
4. Finally, apply the IP traffic export profile to an ingress interface using the ip traffic-export apply <name> interface configuration command.
5. Begin the IP traffic capture using the traffic-export interface <name> start privileged EXEC command. You can also stop the traffic capture using the traffic-export interface <name> stop privileged EXEC command. Additional options that can be specified
include the traffic-export interface <name> clear privileged EXEC command,
which clears the buffer, and the traffic-export interface <name> copy <destination> command, which can be used to copy the traffic capture to a TFTP server, Flash
memory, or an onboard USB device.
In addition to these required commands, additional optional commands can be specified when
configuring RITE as follows:
1. Optionally, you can configure the router to capture bidirectional packets using the bidirectional RITE configuration command.
2. Optionally, you can configure filtering for incoming traffic using the incoming [access-list <standard | extended | named> | sample one-in-every <packet-number>]
RITE configuration command. Inbound filtering is enabled by default after you create the
RITE profile.
3. Finally, you can also optionally filter outgoing traffic using the outgoing [access-list
<standard | extended | named> | sample one-in-every <packet-number>] RITE
configuration command.
The following configuration example illustrates how to configure RITE on a router. The router will
be configured to send captured traffic to a device with MAC address 1234.abcd.5678 residing off
the GigabitEthernet0/1 interface. In addition, the router will also be configured to sample 1 in every
10 packets. The capture will be for inbound and outbound traffic. Finally, the profile will be applied
to the GigabitEthernet0/0 interface:
R4(config)#ip traffic-export profile TSHOOT
R4(conf-rite)#interface GigabitEthernet0/1
R4(conf-rite)#mac-address 1234.abcd.5678
R4(conf-rite)#incoming sample one-in-every 10
R4(conf-rite)#outgoing sample one-in-every 10
R4(conf-rite)#bidirectional
R4(conf-rite)#exit
R4(config)#interface GigabitEthernet 0/0
R4(config-if)#ip traffic-export apply TSHOOT
R4(config-if)#
*Oct 24 21:08:48.734: %RITE-5-ACTIVATE: Activated IP traffic export on interface
GigabitEthernet0/0
After enabling RITE on a particular interface, the router automatically generates the message that
can be seen above. When the profile is removed, the following is displayed:
R4(config)#int Gi0/0
R4(config-if)#no ip traffic-export apply TSHOOT
R4(config-if)#end
R4#
R4#
R4#
*Oct 24 21:09:38.542: %RITE-5-DEACTIVATE: Deactivated IP traffic export on
interface GigabitEthernet0/0
Following the RITE configuration, the show ip traffic-export [interface <name>] command can be used to view or validate the RITE configuration parameters:
R4#show ip traffic-export interface GigabitEthernet0/0
Router IP Traffic Export Parameters
Monitored Interface                     GigabitEthernet0/0
        Export Interface                GigabitEthernet0/1
        Destination MAC address         1234.abcd.5678
        bi-directional traffic export is on
Output IP Traffic Export Information    Packets/Bytes Exported    7/684
        Packets Dropped                 72
        Sampling Rate                   one-in-every 10 packets
        No Access List configured
Input IP Traffic Export Information     Packets/Bytes Exported    10/964
        Packets Dropped                 98
        Sampling Rate                   one-in-every 10 packets
        No Access List configured
        Profile TSHOOT is Active
Finally, begin the traffic capture using the traffic-export interface <name> start command as follows:
R4#traffic-export interface GigabitEthernet0/0 start
R4#
*Oct 24 21:40:29.662: %RITE-5-CAPTURE_START: Started IP traffic capture for
interface GigabitEthernet0/0
After you have completed the traffic capture, stop the capture using the traffic-export interface <name> stop command as follows:
R4#traffic-export interface GigabitEthernet0/0 stop
R4#
R4#
*Oct 24 21:45:52.878: %RITE-5-CAPTURE_STOP: Stopped IP traffic capture for
interface GigabitEthernet0/0
As was previously stated, the IP traffic export feature also provides the capability to capture IP
packets in local router memory, and then dump this data to a file on an external device, such as
Flash memory. The configuration of this capability follows the same basic steps that were used
when exporting the captured traffic to a device off the router interface, with some subtle differences. The router configuration steps for local capture are listed and described below:
1. Configure a traffic export profile using the ip traffic-export profile <name> mode
capture global configuration command.
2. Specify the length of the packet in capture mode using the length <size> RITE configuration command. Valid options are 128, 256, and 512 bytes.
3. Apply the traffic export profile to an interface using the ip traffic-export apply <name>
size <size> interface configuration command. The size <size> option specifies the size
of the buffer, in bytes.
4. Begin the IP traffic capture using the traffic-export interface <name> start privileged EXEC command. You can also stop the traffic capture using the traffic-export interface <name> stop privileged EXEC command. Additional options that can be specified
include the traffic-export interface <name> clear privileged EXEC command,
which clears the buffer, and the traffic-export interface <name> copy <destination> command, which can be used to copy the traffic capture to a TFTP server, Flash
memory, or an onboard USB device.
As is the case with configuring a traffic capture that will be sent to a specified device, there are also
additional options when configuring capture mode (local buffer) traffic captures as follows:
1. Optionally, you can configure the router to capture bidirectional packets using the bidirectional RITE configuration command.
2. Optionally, you can configure filtering for incoming traffic using the incoming [access-list <standard | extended | named> | sample one-in-every <packet-number>]
RITE configuration command. Inbound filtering is enabled by default after you create the
RITE profile.
3. Finally, you can also optionally filter outgoing traffic using the outgoing [access-list
<standard | extended | named> | sample one-in-every <packet-number>] RITE
configuration command.
The following example illustrates how to configure local IP packet capture on the router. The capture is configured to filter packets referencing an extended ACL and will be applied inbound on the
GigabitEthernet0/0 interface. The configuration specifies a packet length of 512 bytes and the local
buffer is configured with a size of 1024 bytes:
R4(config)#ip traffic-export profile TSHOOT mode capture
R4(conf-rite)#length 512
R4(conf-rite)#incoming access-list RITE-ACL
R4(conf-rite)#exit
R4(config)#interface GigabitEthernet0/0
R4(config-if)#ip traffic-export apply TSHOOT size 1024
R4(config-if)#exit
R4(config)#ip access-list extended RITE-ACL
R4(config-ext-nacl)#permit icmp any any
R4(config-ext-nacl)#end
Following the configuration, the show ip traffic-export [interface <name>] command can be used
to verify the local traffic capture configuration parameters as follows:
R4#show ip traffic-export interface GigabitEthernet0/0
Router IP Traffic Export Parameters
Monitored Interface: GigabitEthernet0/0
        Limit capture length of packet to 512 bytes.
        bi-directional traffic capture is off
Input IP Traffic Capture Information    Packets/Bytes Captured    0/0
        Packets Dropped                 502
        Sampling Rate                   one-in-every 1 packets
        Access List                     RITE-ACL [named extended IP]
IP Traffic Capture Buffer Information
        Defined Buffer Size             1024 bytes
        Capture Buffer Size             1024 bytes
        Capture Buffer Used             24 bytes
        Capture Buffer Free             1000 bytes
        Profile TSHOOT capture state: Active
After the traffic capture has been configured, the traffic-export interface <name> start
command should be used to begin the traffic capture as follows:
R4#traffic-export interface GigabitEthernet0/0 start
R4#
*Oct 24 21:40:29.662: %RITE-5-CAPTURE_START: Started IP traffic capture for
interface GigabitEthernet0/0
After you have completed the traffic capture, stop the capture using the traffic-export interface <name> stop command as follows:
R4#traffic-export interface GigabitEthernet0/0 stop
R4#
R4#
*Oct 24 21:45:52.878: %RITE-5-CAPTURE_STOP: Stopped IP traffic capture for
interface GigabitEthernet0/0
Finally, the traffic-export interface <name> copy command can be used to export the traffic capture information to an external location, such as a TFTP server, Flash memory, or even an
onboard USB device. The following example illustrates how to export the traffic capture to a TFTP
server with the IP address 150.1.1.254. The file will be saved on the TFTP server with the name ‘r4-traffic-capture’:
R4#traffic-export interface GigabitEthernet0/0 copy tftp:
Address or name of remote host []? 150.1.1.254
Capture buffer filename []? r4-traffic-capture
Copying capture buffer to tftp://150.1.1.254/r4-traffic-capture !!
Before implementing RITE in a production environment, keep in mind that a delay is incurred
on the outbound interface when packets are captured and transmitted across the interface. Performance delays increase with the increased number of interfaces that are monitored and the increased number of destination hosts. Finally, keep the following restrictions in mind when configuring or enabling the Router IP Traffic Export feature:
• The MAC address of the device that is receiving the exported traffic must be on the same
VLAN or directly connected to one of the router interfaces. You can use the show arp or
show ip arp command to determine the MAC address of any device that is directly connected
to an interface.
• The outgoing interface for exported traffic must be Ethernet (10/100/1000). However, the
incoming or monitored traffic can traverse any interface.
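The export-mode steps and the Ethernet restriction above can be tied together in a short Python sketch that renders the commands for a profile. This is an illustration only: the function names are hypothetical, the Ethernet check simply inspects the interface name, and the rendered commands are limited to those that appear in this section:

```python
def is_ethernet(interface_name):
    """Crude check (by name only) that an interface is an Ethernet interface."""
    return "ethernet" in interface_name.lower()

def rite_export_commands(profile, export_interface, mac_address,
                         monitored_interface, sample=None, bidirectional=False):
    """Render the RITE export-mode configuration as a list of command lines."""
    if not is_ethernet(export_interface):
        raise ValueError("the outgoing interface for exported traffic must be Ethernet")
    commands = [
        f"ip traffic-export profile {profile}",
        f" interface {export_interface}",
        f" mac-address {mac_address}",
    ]
    if sample is not None:
        commands.append(f" incoming sample one-in-every {sample}")
        if bidirectional:
            commands.append(f" outgoing sample one-in-every {sample}")
    if bidirectional:
        commands.append(" bidirectional")
    commands += [
        f"interface {monitored_interface}",
        f" ip traffic-export apply {profile}",
    ]
    return commands

rendered = rite_export_commands("TSHOOT", "GigabitEthernet0/1", "1234.abcd.5678",
                                "GigabitEthernet0/0", sample=10, bidirectional=True)
print("\n".join(rendered))
```

Passing a serial interface such as Serial0/0 as the export interface raises an error, while the monitored interface is not checked because monitored traffic can traverse any interface.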
On Cisco IOS-distributed router platforms, such as the Cisco 7600 Series routers, as well as on
Cisco IOS Catalyst switches, the Switched Port Analyzer (SPAN) feature is used to capture packets
instead. There are three variations of SPAN, which include the local SPAN feature, Remote SPAN
(RSPAN), and Encapsulated RSPAN (ERSPAN).
The local SPAN feature, commonly referred to as SPAN, copies traffic from one or more CPUs, one
or more ports, one or more EtherChannels, or one or more VLANs, and sends the copied traffic to
one or more destinations for analysis by a network analyzer, such as a Switch Probe device or other
Remote Monitoring (RMON) probe. Traffic can also be sent to the processor for packet capture by
the Mini Protocol Analyzer.
While SPAN does not affect the switching of traffic on sources, it is important to remember that
the SPAN-generated copies of traffic compete with user traffic for switch resources. Local SPAN
sessions are comprised of an association of source ports and source VLANs with one or more destinations. Each local SPAN session can have either ports or VLANs as sources, but not both. Local
SPAN sessions are configured on a single switch. When configuring SPAN, the following restrictions apply when specifying ports as the source:
• The port can be any port type, such as EtherChannel, FastEthernet, or GigabitEthernet
• The same local port can be monitored in multiple SPAN sessions
• The local SPAN source port cannot be configured as a destination port
• Each source port can be configured with a direction (ingress, egress, or both) to monitor
• Source ports can be in the same or different VLANs
When configuring a VLAN as the SPAN source, the following restrictions apply:
• On a given port, only traffic on the monitored VLAN is sent to the destination port
• All active ports in the source VLAN are included as source ports
• Destination ports that belong to the source VLAN are excluded from the source list
• Removed or added ports in a VLAN are removed or added to the session
• You can monitor only Ethernet VLANs
• You cannot use filter VLANs in the same local SPAN session with VLAN sources
Finally, the following restrictions apply to the SPAN destination ports:
• The destination port must reside on the same physical single switch as the source port
• The destination port can be any Ethernet physical port
• The destination port can participate in only one SPAN session at a time
• The destination port cannot be a source port
• The destination port cannot be an EtherChannel group
• If the destination port resides in an EtherChannel group, then it is removed from the group
• The destination port will not transmit traffic unless learning is enabled
• The destination port line protocol will show a state of up/down by design
• If ingress forwarding is enabled, then the destination port forwards traffic at Layer 2
• A destination port does not participate in Spanning Tree while the SPAN session is active
• When it is a destination port, it does not participate in any of the Layer 2 protocols
• If the port belongs to a source VLAN, then it is excluded from the source and is not monitored
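Several of the restrictions listed above are easy to check before a configuration is applied. The following Python sketch models a local SPAN session and tests three of the rules: sources are either ports or VLANs but not both, a destination port cannot be a source port, and a destination port can belong to only one session. The class and function names are hypothetical, and the remaining restrictions are not modeled:

```python
from dataclasses import dataclass, field

@dataclass
class SpanSession:
    number: int
    source_ports: set = field(default_factory=set)
    source_vlans: set = field(default_factory=set)
    destination_ports: set = field(default_factory=set)

def validate(sessions):
    """Return a list of rule violations found across the given local SPAN sessions."""
    problems = []
    destination_owner = {}  # destination port -> session number that uses it
    all_sources = set().union(*(s.source_ports for s in sessions))
    for s in sessions:
        if s.source_ports and s.source_vlans:
            problems.append(f"session {s.number}: sources must be ports or VLANs, not both")
        for port in sorted(s.destination_ports):
            if port in all_sources:
                problems.append(f"session {s.number}: {port} is a source port")
            if port in destination_owner:
                problems.append(
                    f"session {s.number}: {port} is already a destination "
                    f"in session {destination_owner[port]}")
            else:
                destination_owner[port] = s.number
    return problems

good = SpanSession(1, source_ports={"Fa0/1"}, destination_ports={"Fa0/2"})
bad = SpanSession(2, source_ports={"Fa0/3"}, source_vlans={10},
                  destination_ports={"Fa0/1"})

for problem in validate([good, bad]):
    print(problem)
```

Session 1 mirrors the Fa0/1 to Fa0/2 example used in this section and passes; session 2 is deliberately invalid and triggers two of the checks.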
In Cisco IOS Catalyst switches, local SPAN source is configured using the monitor session
<session_number> source [[single_interface | interface_list | interface_range |
mixed_interface_list | single_vlan | vlan_list | vlan_range | mixed_vlan_list]
[rx | tx | both]] global configuration command. Keep in mind that the options available will
vary depending on the switch platform.
The local SPAN destination is configured using the monitor session <session_number> destination [single_interface | interface_list | interface_range | mixed_interface_list] global configuration command.
The following configuration example illustrates how to configure local SPAN on the switch to copy
inbound and outbound traffic on FastEthernet0/1 and send this traffic to interface FastEthernet0/2.
It is assumed a monitoring device is connected to the FastEthernet0/2 interface:
Sw1#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Sw1(config)#monitor session 1 source interface Fa0/1 both
Sw1(config)#monitor session 1 destination interface Fa0/2
Sw1(config)#end
Following this implementation, use the show monitor session [<session> | all] [detail]
command to verify the local SPAN configuration:
Sw1#show monitor session 1
Session 1
---------
Type              : Local Session
Source Ports      :
    Both          : Fa0/1
Destination Ports : Fa0/2
    Encapsulation : Native
          Ingress : Disabled
The detail keyword can be appended to view detailed information as follows:
Sw1#show monitor session 1 detail
Session 1
---------
Type              : Local Session
Source Ports      :
    RX Only       : None
    TX Only       : None
    Both          : Fa0/1
Source VLANs      :
    RX Only       : None
    TX Only       : None
    Both          : None
Source RSPAN VLAN : None
Destination Ports : Fa0/2
    Encapsulation : Native
          Ingress : Disabled
Reflector Port    : None
Filter VLANs      : None
Dest RSPAN VLAN   : None
Figure 2-4 below illustrates a sample packet capture, using Wireshark, based on the configuration
that was applied to the switch in the previous configuration example:
Fig. 2-4. Sample Packet Capture from a Local SPAN Session
Unlike local SPAN, Remote SPAN (RSPAN) supports source ports and VLANs, as well as destinations on different switches, allowing you to perform remote monitoring of multiple switches across
your network. RSPAN does this by using a Layer 2 VLAN to carry SPAN traffic between switches.
RSPAN configuration is therefore comprised of an RSPAN source session, an RSPAN destination
session, and an RSPAN VLAN. RSPAN source and destination sessions can also be configured on
different switches.
An RSPAN source session can have either ports or VLANs as sources, but not both. The RSPAN
source session copies traffic from the source ports or source VLANs and switches the traffic over
the RSPAN VLAN to the RSPAN destination. The RSPAN destination session switches the traffic
to the destinations. In addition to source ports and VLANs, as well as destination ports, RSPAN
also introduces a new port type referred to as a reflector port. The reflector port is simply a port
that copies packets onto an RSPAN VLAN. All reflector ports have the following characteristics
and restrictions:
• A reflector port is a port set to Loopback
• A reflector port cannot be an EtherChannel group
• A reflector port does not trunk
• A reflector port cannot do protocol filtering
• If a port assigned to an EtherChannel is specified, then it is removed from the EtherChannel
• A reflector port cannot be a SPAN source or destination port
• A reflector port cannot be a reflector port for more than one RSPAN session
• A reflector port is invisible to all VLANs
• The native VLAN for looped-back traffic on a reflector port is the RSPAN VLAN
• The reflector port loops back untagged traffic to the switch
• Spanning tree is automatically disabled on a reflector port
• A reflector port receives copies of sent and received traffic for all monitored source ports
The configuration of RSPAN is performed in two steps. The first step entails the configuration of
the RSPAN VLAN using the remote-span VLAN configuration command. The following configuration example illustrates how to configure a VLAN as an RSPAN VLAN:
Sw1(config)#vlan 123
Sw1(config-vlan)#name RSPAN-VLAN
Sw1(config-vlan)#remote-span
Sw1(config-vlan)#exit
Sw1(config)#exit
Sw1#
This configuration can then be validated using the show vlan id command as follows:
Sw1#show vlan id 123
VLAN Name                             Status    Ports
---- -------------------------------- --------- ----------------------------
123  RSPAN-VLAN                       active
VLAN Type  SAID       MTU   Parent RingNo BridgeNo Stp  BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------
123  enet  100123     1500                                       0      0
Remote SPAN VLAN
----------------
Enabled
Primary Secondary Type              Ports
------- --------- ----------------- ----------------------------------------
The second step following the RSPAN VLAN configuration is to configure the RSPAN sessions.
This can be performed in SPAN configuration mode or in global configuration mode. The following
section describes how to configure a source RSPAN session in SPAN configuration mode:
1. Configure the RSPAN source session using the monitor session <session> type rspan-source global configuration command.
2. Associate the RSPAN source session number with the CPU, source ports, or VLANs, and
select the traffic direction to be monitored using the source [[cpu [rp | sp]] | <interface> | <interface list> | <interface range> | <vlan> | <vlan list> |
<vlan range>] [rx | tx | both] RSPAN source session configuration command.
3. Associate the RSPAN source session number with the RSPAN VLAN using the destination remote vlan <RSPAN-VLAN> RSPAN source session configuration command.
The following section describes how to configure an RSPAN destination session in SPAN configuration mode:
1. Configure the RSPAN destination session using the monitor session <session> type
rspan-destination global configuration command.
2. Associate the RSPAN destination session number with the RSPAN VLAN using the source
remote vlan <RSPAN-VLAN> RSPAN destination session configuration command.
3. Associate the RSPAN destination session number with the destinations using the destination [<interface> | <interface list> | <interface range>] [ingress [learning]]
RSPAN destination session configuration command.
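The source and destination steps above must reference the same RSPAN VLAN on both switches. One way to keep them consistent is to derive both sets of commands from a single VLAN number, as in the following Python sketch. The function name is hypothetical, and only the commands described in the steps above are rendered:

```python
def rspan_commands(session, rspan_vlan, source_interface, destination_interface):
    """Render matching RSPAN commands for the source and destination switches."""
    # The RSPAN VLAN must exist, with remote-span enabled, on both switches.
    vlan = [f"vlan {rspan_vlan}", " remote-span"]
    source_switch = vlan + [
        f"monitor session {session} type rspan-source",
        f" source interface {source_interface}",
        f" destination remote vlan {rspan_vlan}",
    ]
    destination_switch = vlan + [
        f"monitor session {session} type rspan-destination",
        f" source remote vlan {rspan_vlan}",
        f" destination interface {destination_interface}",
    ]
    return source_switch, destination_switch

source_switch, destination_switch = rspan_commands(
    1, 123, "GigabitEthernet1/1", "GigabitEthernet1/1")

print("\n".join(source_switch))
print("\n".join(destination_switch))
```

Because the VLAN number is supplied once, the remote-span VLAN, the source session, and the destination session cannot drift apart.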
The following configuration example illustrates how to configure an RSPAN session between two
switches. It should be assumed that these switches have a trunk connection configured between
them. The RSPAN configuration will copy traffic from port GigabitEthernet1/1 on switch 1 to port
GigabitEthernet1/1 on switch 2. VLAN 123 will be used for RSPAN. The configuration on switch 1
(Sw1), which is the RSPAN source, is implemented as follows:
Sw1(config)#vtp domain H2N-TSHOOT
Sw1(config)#vtp mode transparent
Sw1(config)#vlan 123
Sw1(config-vlan)#name RSPAN-VLAN
Sw1(config-vlan)#remote-span
Sw1(config-vlan)#exit
Sw1(config)#monitor session 1 type rspan-source
Sw1(config-mon-rspan-src)#source interface GigabitEthernet1/1
Sw1(config-mon-rspan-src)#destination remote vlan 123
Sw1(config-mon-rspan-src)#end
The configuration on switch 2 (Sw2), the RSPAN destination, is implemented as follows:
Sw2(config)#vtp domain H2N-TSHOOT
Sw2(config)#vtp mode transparent
Sw2(config)#vlan 123
Sw2(config-vlan)#name RSPAN-VLAN
Sw2(config-vlan)#remote-span
Sw2(config-vlan)#exit
Sw2(config)#monitor session 1 type rspan-destination
Sw2(config-mon-rspan-dst)#source remote vlan 123
Sw2(config-mon-rspan-dst)#destination interface GigabitEthernet1/1
Sw2(config-mon-rspan-dst)#end
Following the RSPAN configuration, you can use the show monitor session <number> command to validate the configuration. Following is the output of this command on Sw1:
Sw1#show monitor session 1
Session 1
---------
Type              : Remote Source Session
Source Ports      :
    Both          : Gi1/1
Dest RSPAN VLAN   : 123
The output of the same command on Sw2 displays the following:
Sw2#show monitor session 1
Session 1
---------
Type              : Remote Destination Session
Source RSPAN VLAN : 123
Destination Ports : Gi1/1
    Encapsulation : Native
          Ingress : Disabled
As was previously stated, the second option is to configure RSPAN in global configuration mode
as follows:
1. Configure the RSPAN source session number with the source ports or VLANs, and select the
traffic direction to be monitored using the monitor session <session> source [<interface> | <interface list> |<interface range> | <vlan> | <vlan list>|
<vlan range>] [rx | tx | both] global configuration command.
2. Associate the RSPAN source session number with the RSPAN VLAN using the monitor
session <session> destination remote vlan <RSPAN-VLAN> global configuration
command.
The following section describes how to configure RSPAN destination sessions in global configuration mode:
1. Configure the RSPAN destination session number with the RSPAN VLAN using the monitor session <session> source remote vlan <RSPAN-VLAN> global configuration
command.
2. Associate the RSPAN destination session number with the destination port(s) using the monitor session <session> destination [<interface> | <interface list> | <interface range> | [ingress [learning]]] global configuration command.
The following configuration illustrates how to configure RSPAN in global configuration mode using VLAN 123 as the RSPAN VLAN. This configuration assumes that a trunk has been configured
between the two switches. RSPAN is configured to copy traffic received from the GigabitEthernet1/1 port on switch 1, while switch 2 forwards this copied traffic to its local GigabitEthernet1/1
port. The configuration on switch 1 (Sw1), which is the RSPAN source, is implemented as follows:
Sw1(config)#vtp domain H2N-TSHOOT
Sw1(config)#vtp mode transparent
Sw1(config)#vlan 123
Sw1(config-vlan)#name RSPAN-VLAN
Sw1(config-vlan)#remote-span
Sw1(config-vlan)#exit
Sw1(config)#monitor session 1 source interface GigabitEthernet1/1
Sw1(config)#monitor session 1 destination remote vlan 123
Sw1(config)#end
The configuration on switch 2 (Sw2), the RSPAN destination, is implemented as follows:
Sw2(config)#vtp domain H2N-TSHOOT
Sw2(config)#vtp mode transparent
Sw2(config)#vlan 123
Sw2(config-vlan)#name RSPAN-VLAN
Sw2(config-vlan)#remote-span
Sw2(config-vlan)#exit
Sw2(config)#monitor session 1 source remote vlan 123
Sw2(config)#monitor session 1 destination interface GigabitEthernet1/1
Sw2(config)#end
As stated in the previous configuration example, the show monitor session <number> command can be used to validate this configuration. Following is the output of this command on switch 1 (Sw1):
Sw1#show monitor session 1
Session 1
---------
Type                   : Remote Source Session
Source Ports           :
    Both               : Gi1/1
Dest RSPAN VLAN        : 123
NOTE: When using the show monitor session <number> command, append the detail
keyword to this command to print detailed information as illustrated below:
Sw1#show monitor session 1 detail
Session 1
---------
Type                   : Remote Source Session
Source Ports           :
    RX Only            : Gi1/1
    TX Only            : None
    Both               : None
Source VLANs           :
    RX Only            : None
    TX Only            : None
    Both               : None
Source RSPAN VLAN      : None
Destination Ports      : None
Filter VLANs           : None
Dest RSPAN VLAN        : 123
The output of the show monitor session command on Sw2 displays the following:
Sw2#show monitor session 1
Session 1
---------
Type                   : Remote Destination Session
Source RSPAN VLAN      : 123
Destination Ports      : Gi1/1
    Encapsulation      : Native
          Ingress      : Disabled
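When the same verification is repeated on many switches, it can help to turn the show monitor session output into a dictionary and compare fields programmatically. The following Python sketch illustrates one way to do this. It assumes the output keeps its leading indentation for sub-fields, and the function name and sample text are hypothetical rather than taken from any Cisco tool:

```python
def parse_monitor_session(output):
    """Parse 'Key : Value' lines; indented lines are stored under the
    most recent unindented key that had no value (a section header)."""
    result = {}
    section = None
    for line in output.splitlines():
        if ":" not in line:
            continue  # skips "Session 1" and the dashed rule
        key, _, value = line.partition(":")
        indented = key[:1].isspace()
        key, value = key.strip(), value.strip()
        if not indented:
            section = key if value == "" else None
            if value:
                result[key] = value
        elif section:
            result[section + "/" + key] = value
    return result

# Hypothetical sample text modeled on the detail output in this section
SAMPLE = """Session 1
---------
Type                   : Remote Source Session
Source Ports           :
    RX Only            : Gi1/1
    TX Only            : None
    Both               : None
Dest RSPAN VLAN        : 123
"""

session = parse_monitor_session(SAMPLE)
```

With the parsed result, a script could, for instance, confirm that the "Dest RSPAN VLAN" value on the source switch equals the "Source RSPAN VLAN" value on the destination switch.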
Finally, the last SPAN variant, Encapsulated Remote SPAN (ERSPAN), is similar to RSPAN in
that it supports source ports and VLANs, and destinations on different switches. However, unlike
RSPAN, which uses a Layer 2 VLAN to carry the SPAN traffic, ERSPAN uses a GRE tunnel to carry traffic
between switches. This means that ERSPAN can be configured across routed IP networks, which
extends monitoring beyond a single Layer 2 domain.
ERSPAN consists of an ERSPAN source session, routable ERSPAN GRE-encapsulated traffic, and
an ERSPAN destination session. In a manner similar to RSPAN, ERSPAN source and destination
sessions can be configured on different switches. Additionally, like RSPAN, each ERSPAN source
session can have either ports or VLANs as sources, but not both.
NOTE: The configuration of ERSPAN is beyond the scope of the TSHOOT certification exam
and will not be illustrated or described in any further detail in this guide.
Health Monitoring
Verifying the health of internetwork devices is a common troubleshooting task. Cisco IOS software
allows you to view memory and processor utilization statistics, as well as environmental variables,
such as the temperature of the device and the status of power supplies installed within the device.
The show processes command prints information about active processes running on the device
(i.e., router or switch). This command can be used to view detailed information about specific
processes, detailed CPU utilization statistics, and even detailed memory utilization statistics. The
valid keywords that can be used in conjunction with this command are listed in the output below:
R2#show processes ?
  <1-4294967295>  Process Number
  cpu             Show CPU use per process
  history         display ordered process history
  memory          Show memory use per process
  timercheck      Show processes configured for timercheck
  |               Output modifiers
  <cr>
NOTE: These options will vary depending on the platform on which the command is issued.
The most commonly used keywords are cpu and memory. These keywords display detailed CPU utilization statistics as well as the amount of memory used, respectively. Following is a sample output
of the show processes cpu command:
R2#show processes cpu
CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
   1           0         3          0  0.00%  0.00%  0.00%   0 Chunk Manager
   2          84     24809          3  0.00%  0.00%  0.00%   0 Load Meter
   3          84        17       4941  0.00%  0.00%  0.00%   0 Exec
   4           0         1          0  0.00%  0.00%  0.00%   0 EDDRI_MAIN
   5      280786     18847      14898  0.00%  0.27%  0.23%   0 Check Heaps
   6          24        13       1846  0.00%  0.00%  0.00%   0 Pool Manager
   7           0         2          0  0.00%  0.00%  0.00%   0 Timers
   8           0         1          0  0.00%  0.00%  0.00%   0 Crash Writer
...
[Truncated Output]
The show processes cpu command prints a great deal of information, as can be seen above.
When using this command, consider filtering the output using the applicable keywords to reduce
the amount of information that is returned. The following example illustrates how to filter the output of this command to include only BGP processes:
R2#show processes cpu | include BGP
 120          52      7029          7  0.00%  0.00%  0.00%   0 BGP Router
 151          81      1646         49  0.00%  0.00%  0.06%   0 BGP I/O
 218        1651       230       7178  0.00%  0.00%  0.00%   0 BGP Scanner
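The same kind of filtering can be applied offline to output that has been saved to a file. The following Python sketch parses the per-process rows of show processes cpu and keeps only the rows whose process name contains a given string, similar in spirit to the include filter. The function names are hypothetical, the sample rows are copied from the output above, and the parser assumes the columns are separated by whitespace, which may not hold on every platform:

```python
def parse_cpu_rows(output):
    """Return one dictionary per process row in 'show processes cpu' output."""
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 8)  # the process name may contain spaces
        if len(fields) < 9 or not fields[0].isdigit():
            continue  # skip the summary line, the header, and blank lines
        rows.append({
            "pid": int(fields[0]),
            "runtime_ms": int(fields[1]),
            "invoked": int(fields[2]),
            "usecs": int(fields[3]),
            "five_sec": float(fields[4].rstrip("%")),
            "one_min": float(fields[5].rstrip("%")),
            "five_min": float(fields[6].rstrip("%")),
            "tty": int(fields[7]),
            "process": fields[8].strip(),
        })
    return rows

def include(rows, text):
    """Keep rows whose process name contains the given text."""
    return [row for row in rows if text in row["process"]]

SAMPLE = """CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
   2          84     24809          3  0.00%  0.00%  0.00%   0 Load Meter
   5      280786     18847      14898  0.00%  0.27%  0.23%   0 Check Heaps
   6          24        13       1846  0.00%  0.00%  0.00%   0 Pool Manager
"""

rows = parse_cpu_rows(SAMPLE)
heaps = include(rows, "Heaps")
```

Once the rows are dictionaries, they can also be sorted by any column, for example by the one-minute average, to find the busiest processes.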
The show processes cpu command can also be used to view the historical CPU utilization statistics for the device over a 60-second, a 60-minute, and a 72-hour interval by simply appending the
history command as illustrated in the following output:
R1#show processes cpu history

R1   05:04:38 AM Sunday Mar 3 2002 UTC

                             111111
    444449999999999889999998800000099999999999899999998888877777
    888889999997777449777774400000099999779999899999992222222222
100      **********  ******  ***************** *******
 90      **********  ******  *************************
 80      **************************************************
 70      *******************************************************
 60      *******************************************************
 50 ************************************************************
 40 ************************************************************
 30 ************************************************************
 20 ************************************************************
 10 ************************************************************
   0....5....1....1....2....2....3....3....4....4....5....5....6
             0    5    0    5    0    5    0    5    0    5    0
               CPU% per second (last 60 seconds)

    1
    08859                                  3
    083896111112212111122411111212114121433721132211111122221121
100 *   *
 90 #*  *
 80 ##* *
 70 ##* *
 60 ##***
 50 ##***
 40 ###*#                                  *
 30 ###*#                                  *
 20 #####                                  *
 10 #####*                                 #
   0....5....1....1....2....2....3....3....4....4....5....5....6
             0    5    0    5    0    5    0    5    0    5    0
               CPU% per minute (last 60 minutes)
              * = maximum CPU%   # = average CPU%

    38
    70
100
 90
 80  *
 70  *
 60  *
 50  *
 40 **
 30 **
 20 **
 10 **
   0....5....1....1....2....2....3....3....4....4....5....5....6....6....7..
             0    5    0    5    0    5    0    5    0    5    0    5    0
                   CPU% per hour (last 72 hours)
                  * = maximum CPU%   # = average CPU%
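The digits printed above each graph are read vertically: each column of stacked digits is the utilization value for one time slot. The following Python sketch decodes those stacked digits into a list of integers. It is only an illustration; the function name is hypothetical, the rows are assumed to be left-aligned so that the first character of each row is the first time slot, and the sample rows are those of the 60-second graph above:

```python
def decode_history_digits(rows, width=60):
    """Read stacked digit rows (top row first) column by column and
    return one integer per column; empty columns decode to 0."""
    padded = [row.ljust(width) for row in rows]
    values = []
    for col in range(width):
        digits = "".join(row[col] for row in padded).replace(" ", "")
        values.append(int(digits) if digits else 0)
    return values

# Digit rows of the per-second graph, with the left margin removed
HUNDREDS = " " * 25 + "111111"
TENS = "444449999999999889999998800000099999999999899999998888877777"
UNITS = "888889999997777449777774400000099999779999899999992222222222"

per_second = decode_history_digits([HUNDREDS, TENS, UNITS])
peak = max(per_second)
```

Decoded this way, the first column reads as 48 percent and the six columns under the hundreds digit read as 100 percent, which matches the height of the bars in the graph.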
The show processes memory command displays information about the active processes in the
router and the corresponding memory used. Following is a sample output of the show processes
memory command:
R2#show processes memory
Processor Pool Total:   53977824 Used:   22192752 Free:   31785072
      I/O Pool Total:    3729408 Used:    2131648 Free:    1597760

 PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
   0   0   32967392   13607204   15720348          0          0 *Init*
   0   0      12052     636100      12052          0          0 *Sched*
   0   0   29197288   27868336    2512444     380484     380484 *Dead*
   0   0          0          0     490880          0          0 *MallocLite*
   1   0      15104          0      22116          0          0 Chunk Manager
   2   0        196        196       4012          0          0 Load Meter
   3   0      22236      22076      13188          0          0 Exec
   4   0      65588          0      90600          0          0 EDDRI_MAIN
   5   0       3296        196      10112          0          0 Check Heaps
   6   0     210936     274452      26188     199980     233400 Pool Manager
   7   0        196        196       7012          0          0 Timers
   8   0          0          0      25012          0          0 Crash Writer
...
[Truncated Output]
As is applicable with the show processes cpu command, the show processes memory command prints a great deal of information. Again, keep in mind that the filtering options that are
applicable with other commands can also be used with this command to reduce the amount of
information that is printed by this command. The following illustrates how to use Cisco IOS filters
to include only memory utilization information for different BGP processes:
R2#show processes memory | section BGP
 120   0      45584          0      10096          0          0 BGP Router
 151   0          0          0       7036          0          0 BGP I/O
 218   0          0          0      10048          0          0 BGP Scanner
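The two pool summary lines at the top of the show processes memory output are often the quickest indicator of memory pressure. As a rough sketch, the following Python snippet extracts the Total and Used counters for each pool and converts them into a utilization percentage. The function name is hypothetical, and the sample lines reuse the counters from the R2 output shown earlier:

```python
import re

# Matches lines such as "Processor Pool Total: 53977824 Used: 22192752 Free: 31785072"
POOL_RE = re.compile(
    r"^\s*(?P<pool>.+?) Pool Total:\s*(?P<total>\d+)"
    r"\s+Used:\s*(?P<used>\d+)\s+Free:\s*(?P<free>\d+)"
)

def pool_utilization(output):
    """Return {pool name: percent used}, rounded to one decimal place."""
    pools = {}
    for line in output.splitlines():
        match = POOL_RE.match(line)
        if match and int(match["total"]) > 0:
            used, total = int(match["used"]), int(match["total"])
            pools[match["pool"]] = round(100.0 * used / total, 1)
    return pools

SAMPLE = """Processor Pool Total:   53977824 Used:   22192752 Free:   31785072
      I/O Pool Total:    3729408 Used:    2131648 Free:    1597760
"""

usage = pool_utilization(SAMPLE)
```

For the sample counters, the processor pool works out to roughly 41 percent used and the I/O pool to roughly 57 percent used. Comparing such figures against a baseline is usually more informative than reading them in isolation.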
In addition to the show processes memory command, you can also use the show memory command to display summary information about processor and I/O memory, followed by a more comprehensive report of memory utilization. The show memory command supports the following keywords or options:
R2#show memory ?
  allocating-process  Show allocating process name
  dead                Memory owned by dead processes
  debug               Memory debugging commands
  failures            Memory failures
  fast                Fast memory stats
  fragment            Summary of memory fragment information
  free                Free memory stats
  io                  IO memory stats
  multibus            Multibus memory stats
  overflow            memory overflow corrections
  pci                 PCI memory stats
  processor           Processor memory stats
  statistics          Mempool Statistics
  summary             Summary of memory usage per alloc PC
  transient           Transient memory stats
While delving into the specifics on all applicable or valid keywords is beyond the scope of the
TSHOOT certification exam, the show memory allocating-process totals or show memory
summary commands are two of the most commonly used options when troubleshooting memory
allocation issues. The output printed by these commands can be decoded using the Cisco Output
Interpreter, which is available online, and is also commonly requested by the Technical Assistance
Center (TAC) for troubleshooting allocation issues.
NOTE: As is applicable with other Cisco IOS show commands, you can filter the output that is
printed by this command or redirect it to an external location for further analysis.
Finally, the last health monitoring and troubleshooting command that we will discuss in this section is the show environment command. This command provides information on the status of
your router fans, power supply, system board temperature, and more. Note, however, that this
command is not available on every platform; it is supported on Cisco 2600XM and ISR-Series
routers, as well as Catalyst 3750, 4500, and 6500 Series switches.
Once per minute, a routine is run that receives environmental measurements from sensors and
stores the output into a buffer. This buffer is displayed on the console when the show environment
command is entered. In the event that a measurement exceeds desired margins, but has not exceeded fatal margins, a warning message is printed on the system console. Cisco IOS software queries the sensors for measurements once per minute, but warnings for a given test point are printed,
at most, once every hour for sensor readings in the warning range, and once every five minutes for
sensor readings in the critical range. If a measurement is out of line within these time segments,
then an automatic warning message appears on the console. If a shutdown occurs because of detection of fatal environmental margins, the last measured value from each sensor is stored in internal
nonvolatile memory.
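The relationship between the one-minute polling cycle and the rate at which warnings are printed can be illustrated with a small simulation. The following Python sketch is a simplified model of the behavior described above, not the actual Cisco IOS implementation. The threshold values and names are hypothetical; only the one-minute poll, the one-hour limit for the warning range, and the five-minute limit for the critical range come from the text:

```python
POLL_INTERVAL = 60         # seconds between sensor polls
WARNING_HOLDOFF = 60 * 60  # at most one message per hour in the warning range
CRITICAL_HOLDOFF = 5 * 60  # at most one message per five minutes in the critical range

def console_messages(readings, warning, critical):
    """readings holds one sensor value per poll.
    Returns a list of (time_in_seconds, level) for each message printed."""
    messages = []
    last_printed = {}  # level -> time of the last message at that level
    for poll, value in enumerate(readings):
        now = poll * POLL_INTERVAL
        if value >= critical:
            level, holdoff = "critical", CRITICAL_HOLDOFF
        elif value >= warning:
            level, holdoff = "warning", WARNING_HOLDOFF
        else:
            continue  # reading is within normal margins
        if level not in last_printed or now - last_printed[level] >= holdoff:
            messages.append((now, level))
            last_printed[level] = now
    return messages

# Two hours of readings in the warning range (hypothetical thresholds in Celsius)
warm = console_messages([50] * 120, warning=45, critical=60)
# Eleven minutes of readings in the critical range
hot = console_messages([70] * 11, warning=45, critical=60)
```

In this model, two hours of continuous warning-range readings produce only two console messages, whereas eleven polls in the critical range produce three.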
You can also enable SNMP notifications (traps or informs) to alert an NMS when environmental
thresholds are reached using the snmp-server enable traps envmon global configuration command in conjunction with the other SNMP configuration commands. Whenever Cisco IOS software detects a failure or recovery event, it sends an SNMP trap to the configured SNMP server. Unlike console messages, only one SNMP trap is sent when the failure event is first detected. Another
trap is sent when the recovery is detected. The following displays the options that are available with
the show environment command on a Catalyst 6500 Series switch. Keep in mind that these options will vary depending on the platform:
Catalyst-6500-Core-Switch#show environment ?
  alarm        show environmental alarms
  connector    show connector parameters
  cooling      show cooling parameters
  status       operational status of FRU
  temperature  temperature readings
  voltage      voltage readings
  |            Output modifiers
  <cr>
On a Cisco 2800 Series router, the following options can be used with this command:
R4#show environment ?
  all    All environmental monitor parameters
  last   Last environmental monitor parameters
  table  Temperature and voltage ranges
  |      Output modifiers
  <cr>
Following is the output of the show environment command for information on one specific module on a Catalyst 6500 Series switch. The information printed by this command will vary depending
on the chassis and modules:
Catalyst-6500-Core-Switch#show environment status module 5
module 5:
module 5 power-output-fail: OK
module 5 outlet temperature: 30C
module 5 inlet temperature: 25C
module 5 device-1 temperature: 37C
module 5 device-2 temperature: 36C
module 5 asic-1 (SSO-1) temp: 26C
module 5 asic-2 (SSO-2) temp: 26C
module 5 asic-3 (SSO-3) temp: 25C
module 5 asic-4 (SSO-4) temp: 26C
module 5 asic-5 (SSA-1) temp: 26C
module 5 asic-6 (HYPERION-1) temp: 26C
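When this output is collected from many modules, a short script can pull out the temperature readings and flag any that exceed a limit chosen by the operator. The following Python sketch shows one possible approach. The function names and the 35C limit are hypothetical, and the pattern only recognizes lines that end in a whole number followed by C, as in the output above:

```python
import re

# Matches lines such as "module 5 inlet temperature: 25C"
TEMP_RE = re.compile(r"^\s*module (\d+) (.+?):\s*(\d+)C\s*$")

def module_temperatures(output):
    """Return (module, sensor name, degrees Celsius) for each temperature line."""
    readings = []
    for line in output.splitlines():
        match = TEMP_RE.match(line)
        if match:
            readings.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return readings

def over_limit(readings, limit_c):
    """Keep only the readings above the given limit."""
    return [reading for reading in readings if reading[2] > limit_c]

SAMPLE = """module 5:
module 5 power-output-fail: OK
module 5 outlet temperature: 30C
module 5 inlet temperature: 25C
module 5 device-1 temperature: 37C
module 5 device-2 temperature: 36C
module 5 asic-1 (SSO-1) temp: 26C
"""

readings = module_temperatures(SAMPLE)
hot = over_limit(readings, 35)  # 35C is an arbitrary illustrative limit
```

Note that lines without a numeric reading, such as the power-output-fail status, are skipped rather than treated as errors.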
Following is the output of the show environment command on a Cisco 2800 Series router:
R4#show environment
Main Power Supply is AC
Fan 1 OK
Fan 2 OK
Fan 3 OK
Fan Speed Setting: Normal
System Temperature: 26 Celsius (normal)
Environmental information last updated 00:00:29 ago
ADDITIONAL TROUBLESHOOTING TOOLS
In addition to the CLI toolkit described in the previous section, Cisco also provides GUI-based
tools and applications that can be used for troubleshooting purposes. These include tools such as
SDM, CCP, and CiscoWorks LMS. Because these tools and applications are described in the previous chapter, they are not described again in this chapter. Please refer to the previous chapter for
additional information on these tools.
For registered Cisco Connection Online (CCO) users, Cisco also provides additional online tools
that can be used to assist with the troubleshooting process. One such tool, the Error Message Decoder, can help a troubleshooter research and resolve error messages in Cisco IOS software, as
well as other Cisco software variants. For example, assume you consistently saw the following error
message on the console or in the logs of a particular router: %MARS_NETCLK-3-HOLDOVER_TRANS.
Using the Error Message Decoder, you would simply paste this message into the tool. The tool
would then print information describing what the error message means, as well as provide a recommended action to resolve the issue that is causing this message.
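Before pasting a message into the decoder, it can be useful to recognize its three parts: the facility, the severity level (0 through 7), and the mnemonic. The following Python sketch splits a message code into these parts. The function name is hypothetical, and the pattern is a simplification that assumes a facility made up of uppercase letters, digits, and underscores, so it will not match every message format:

```python
import re

# %FACILITY-SEVERITY-MNEMONIC, for example %MARS_NETCLK-3-HOLDOVER_TRANS
MESSAGE_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+)"
)

# Cisco IOS logging severity keywords, indexed by level
SEVERITY_NAMES = [
    "emergencies", "alerts", "critical", "errors",
    "warnings", "notifications", "informational", "debugging",
]

def parse_ios_message(text):
    """Return the facility, severity, and mnemonic found in a log line,
    or None if no message code is present."""
    match = MESSAGE_RE.search(text)
    if match is None:
        return None
    severity = int(match["severity"])
    return {
        "facility": match["facility"],
        "severity": severity,
        "severity_name": SEVERITY_NAMES[severity],
        "mnemonic": match["mnemonic"],
    }

parsed = parse_ios_message("%MARS_NETCLK-3-HOLDOVER_TRANS")
```

For the message used in this example, the facility is MARS_NETCLK, the severity level is 3 (errors), and the mnemonic is HOLDOVER_TRANS.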
Another commonly used online troubleshooting tool is the Output Interpreter. This tool can be
used to identify problems by analyzing the output of supported show commands. The troubleshooter can simply paste the output of a supported show command into the tool, which then decodes the output and provides recommendations for resolving identified issues. While these two
tools are commonly used and are popular, they are only a small subset of what Cisco has online. The
complete list of tools can be found at the following link:
http://www.cisco.com/en/US/customer/support/tsd_most_requested_tools.html
NOTE: A valid CCO account is required to access the tools and resources available online.
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter.
Troubleshooting and the Troubleshooting Flow
• Troubleshooting can be thought of as the process of identifying or diagnosing a problem
• There is no single troubleshooting method that can be applied to all situations
• The high-level three-step troubleshooting flow is comprised of the following phases:
  1. The Problem Report
  2. Problem Diagnosis
  3. Problem Resolution
• The problem report is what typically initiates the troubleshooting process
• The problem report is used to define the problem
• The very first step of the troubleshooting process itself is problem diagnosis
• The problem diagnosis phase is the most time consuming phase
• The five steps included in the problem diagnosis phase are as follows:
  1. Collecting information about the problem
  2. Analyzing or examining the collected information
  3. Eliminating possible causes
  4. Hypothesizing or theorizing potential causes
  5. Verifying the hypothesis or theory
• The problem resolution phase entails notifying and confirming that the problem is resolved
Communication and Troubleshooting
• Effective communication is an integral component of the troubleshooting process
• Effective communication includes the end user, team management, and management
• Effective communication is essential in all steps of the troubleshooting process
Integrating Maintenance and Troubleshooting
• A well documented and well maintained network is a lot easier to support and troubleshoot
• A structured maintenance approach facilitates the troubleshooting and support functions
• Baselining is a network maintenance function that facilitates troubleshooting
• Baselining is a process for studying the network and network devices at regular intervals
• Baselining can help you obtain the following information from the network:
  1. Information on the health of the hardware and software
  2. Predict future problems
  3. Determine the current utilization of network resources
  4. Identify current network problems
  5. Make accurate decisions about network alarm thresholds
• The baselining process can be used to determine the network break point
• Baselining tools include Cisco IP SLA operations, SNMP, NetFlow, and CiscoWorks
• Documentation is integral to the troubleshooting process
• The network should be well documented and documentation well maintained
• Tools that facilitate documentation include EEM, Configuration Archive, and KRON
• Change management also facilitates the troubleshooting process
• A change management process can help minimize network and service downtime
• Some examples of changes include the following:
  1. Environmental changes
  2. Network changes
  3. Application changes
  4. Hardware changes
  5. Documentation changes
  6. Software changes
• Troubleshooting is simplified when changes are implemented in a controlled environment
Troubleshooting Methodologies
• A structured troubleshooting approach reduces the amount of time spent troubleshooting
• A structured troubleshooting approach results in greater efficiency
• Experienced troubleshooters commonly use a shoot from the hip troubleshooting method
• A shoot from the hip approach leverages the troubleshooter's experience and knowledge
• A shoot from the hip approach will not usually work well for inexperienced troubleshooters
• Commonly used structured troubleshooting approaches include the following:
  1. The Top-Down Troubleshooting Method
  2. The Bottom-Up Troubleshooting Method
  3. The Follow the Traffic Path Method
  4. The Compare Configurations Method
  5. The Divide and Conquer Method
  6. The Component Swapping Method
• The top-down approach begins troubleshooting at Layer 7 and works down to Layer 1
• The bottom-up approach begins troubleshooting at Layer 1 and works up to Layer 7
• The follow the traffic method requires intimate network and traffic flow knowledge
• The follow the traffic method is based on the path that traffic takes through the network
• The compare configurations method compares configurations with working ones
• The divide and conquer method begins at the middle of the OSI model
• The divide and conquer method then works up or down depending on the test results
• The component swapping method physically replaces components
The Cisco IOS Generic Troubleshooting Toolkit
• Cisco IOS software provides a plethora of tools that can be used to support troubleshooting
• The ping and traceroute utilities are commonly used to verify network connectivity
• The ping utility is primarily used to verify connectivity between endpoints
• The traceroute utility is primarily used to discover the path taken between endpoints
• Cisco IOS software supports standard and extended ping and traceroute functions
• Cisco IOS software supports GOLD and IOS Service Diagnostics utilities
• Cisco GOLD can be used to troubleshoot hardware problems
• Cisco IOS Service Diagnostics is a programmable diagnostics service
• Cisco IOS Service Diagnostics can be used for BGP, OSPF, and QoS diagnostics
• Cisco IOS Service Diagnostics can be used to monitor and detect abnormal utilization
• Cisco IOS Service Diagnostics leverages EEM and Tcl
• Cisco IOS command output can be filtered using the begin, exclude, include, and section keywords
• Regular expressions can also be used when performing command filtering
• Cisco IOS command output can also be redirected to external locations, e.g. Flash and TFTP
• Cisco IOS command output redirection uses the append, redirect or tee keywords
• The append keyword appends output to an existing file
• The redirect keyword redirects the command output to the specified location
• The tee keyword redirects output and allows you to see it at the same time
• Analyzing traffic packet captures is one of the most common troubleshooting tasks
• Cisco IOS software-based routers support RITE for packet captures
• RITE can be used to send captures to a specified device or store them in memory
• Cisco Catalyst switches and high-end routers support SPAN for packet captures
• Cisco provides three variants of SPAN: local SPAN, RSPAN and ERSPAN
• Local SPAN is configured on a single physical device
• Remote SPAN can be configured between multiple Layer 2 switches across trunk links
• ERSPAN can be configured between remote switches separated by IP networks
• The show processes command can be used for health monitoring and verification
• The show processes command provides CPU and memory statistics
• The show environment command is used to verify device environmental variables
• Cisco provides additional troubleshooting tools online, e.g. Error Message Decoder
CHAPTER 3
Troubleshooting Switches
at Layers 1 and 2
In the first two chapters of this guide, we discussed network monitoring and maintenance functions, as well as troubleshooting methodologies and processes from a theoretical standpoint. The
following chapters in this guide will discuss troubleshooting from a practical perspective. This
chapter focuses on troubleshooting Cisco Catalyst switch Layer 1 and Layer 2 issues. The TSHOOT
certification exam objectives that are covered in this chapter include the following:
• Troubleshoot switch-to-switch connectivity for the VLAN-based solution
• Troubleshoot loop prevention for the VLAN-based solution
LAN switching is a form of packet switching that is used in Local Area Networks (LANs). LAN
switching is performed in hardware at the Data Link layer. Because LAN switching is hardware-based, it uses hardware addresses that are referred to as Media Access Control (MAC) addresses.
The MAC addresses are then used by LAN switches to forward frames. This chapter will be divided
into the following sections:
• Troubleshooting at the Physical Layer
• VLAN, VTP, and Trunking Overview
• Troubleshooting VLANs
• Using the ‘show vlan’ Command
• Spanning Tree Protocol Overview
• Troubleshooting Spanning Tree Protocol
• Using the ‘show spanning-tree’ Command
TROUBLESHOOTING AT THE PHYSICAL LAYER
Cisco IOS switches support several commands that can be used to troubleshoot Layer 1, or at least
suspected Layer 1, issues. However, in addition to being familiar with the software command suite,
it is also important to have a solid understanding of physical indicators (i.e., LEDs) that can be used
to troubleshoot link status or that indicate an error condition.
Troubleshooting Link Status Using Light Emitting Diodes (LEDs)
If you have physical access to the switch or switches, LEDs can be a useful troubleshooting tool.
Different Cisco Catalyst switches provide different LED capabilities. Understanding the meaning
of the LEDs is an integral part of Catalyst switch link status and system troubleshooting. Cisco
Catalyst switches have front-panel LEDs that can be used to determine link status, as well as other
variables such as system status. The supported LEDs, as well as what they indicate, are listed and
described in Table 3-1 below:
Table 3-1. Cisco Catalyst 6500 Ethernet Module LED Status
LED | Color or State | Description
Status | Green | The color green indicates that all diagnostics have passed and that the module is operational.
Status | Orange | The color orange indicates that the module is booting or running diagnostics. This color could also indicate that an over temperature condition (i.e., a minor temperature threshold) has been exceeded during environmental monitoring.
Status | Red | The color red indicates that the module is resetting. The switch has just been powered on or the module has been hot inserted during the normal initialization sequence. The color red could also indicate that an over temperature condition (i.e., a major temperature threshold) has been exceeded during environmental monitoring. If the module fails to download code and configuration information successfully during the initial reset, the LED stays red; the module does not come online.
Status | Off | If the status LED is off, this indicates that the module is not receiving power or has been powered off.
LINK | Green | The color green indicates that the port is active and the link is connected and operational.
LINK | Orange | The color orange indicates that the port is disabled through the CLI command (i.e., the shutdown command is configured).
LINK | Flashing Orange | A flashing orange LED indicates that the port is faulty and has been disabled by the system.
LINK | Off | If the link LED is off, then this indicates that the port is not active or the link is not connected. It does not mean no cable is connected.
PHONE | Green | If the phone LED is green, then this indicates that the voice daughter card is installed.
PHONE | Off | If the phone LED is off, then this indicates that the voice daughter card is not detected or is not installed.
Another popular Catalyst switch, the Catalyst 4500 Series modular switch, also has LEDs that can
be used for link and system troubleshooting. The Ethernet module LEDs, as well as the meanings
of these LEDs, are listed and described in Table 3-2 below:
Table 3-2. Cisco Catalyst 4500 Ethernet Module LED Status
LED | Color or State | Description
Status | Green | The color green indicates that all diagnostic tests have passed.
Status | Red | The color red indicates that a test, other than an individual port test, has failed.
Status | Orange | The color orange indicates system boot-up, that self-test diagnostics are running, or that the module is disabled.
Link | Green | The color green indicates that the port is operational and that a signal has been detected.
Link | Orange | The color orange indicates that the link has been disabled by software (i.e., the shutdown command has been issued).
Link | Flashing Orange | A flashing orange LED indicates that the link has been disabled due to a hardware failure.
Link | Off | If the link LED is off, then this means no signal has been detected. It does not necessarily mean that the port is not connected.
Port Status | Green | The color green indicates that the port is operational and that a signal has been detected.
Port Status | Orange | The color orange indicates that the link or port has been disabled by software.
Port Status | Flashing Orange | A flashing orange LED indicates that the link has been disabled due to a hardware failure.
Port Status | Off | If the LED is off, then this indicates that no signal is detected.
In addition to understanding what the different LED colors mean, it is also important to have an understanding of what action to take to remedy the issue. For example, assume that you are troubleshooting a Catalyst 6500 Series switch and you notice that the status LED on the supervisor engine
(or on any switching module) is red or off. In such cases, it is possible that the module has
shifted out of its slot or, in the case of a new module, was not correctly inserted into the chassis.
The recommended action is to reseat the module. In some cases, it also may be necessary
to reboot the entire system.
While a link or port LED color other than green typically indicates some kind of failure or other
issue, it is important to remember that a green link light does not always mean that the cable is
fully functional. For example, a single broken wire or one shutdown port can cause the problem of
one side showing a green link light while the other side does not. This could be because the cable
encountered physical stress that causes it to be functional at a marginal level. In such cases, the CLI
can be used to perform additional troubleshooting.
Using the Command Line Interface to Troubleshoot Link Issues
Several Command Line Interface (CLI) commands can be used to troubleshoot Layer 1 issues on
Cisco IOS Catalyst switches. Commonly used commands include the show interfaces, the show
controllers, and the show interface [name] counters errors commands. In addition to
knowing these commands, you also are required to be able to interpret accurately the output or
information that these commands provide.
The show interfaces command is a powerful troubleshooting tool that provides a plethora of
information, which includes the following:
•
The administrative status of a switching port
•
The port operational state
•
The media type (for select switches and ports)
•
Port input and output packets
•
Port buffer failures and port errors
•
Port input and output errors
•
Port input and output queue drops
Following is the output of the show interfaces command for a GigabitEthernet switch port:
Catalyst-3750-1#show interfaces GigabitEthernet3/0/1
GigabitEthernet3/0/1 is up, line protocol is down (notconnect)
Hardware is GigabitEthernet, address is 000f.2303.2db1 (bia 000f.2303.2db1)
MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, Loopback not set
Keepalive not set
Auto-duplex, Auto-speed, link type is auto, media type is unknown
input flow-control is off, output flow-control is desired
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output never, output hang never
Last clearing of “show interface” counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
0 packets input, 0 bytes, 0 no buffer
Received 0 broadcasts (0 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
0 packets output, 0 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
Most Cisco Catalyst switch ports default to the ‘notconnect’ state as illustrated in the first line of
the output printed by this command. However, a port can also transition to this state if a cable is removed from the port or is not correctly connected. This status is also reflected when the connected
cable is faulty or when the other end of the cable is not connected to an active port or device (e.g.,
if a workstation connected to the switch port is powered off).
NOTE: When troubleshooting GigabitEthernet ports, this port status may also be a result of
incorrect gigabit interface converters (GBICs) being used between the two ends.
The first part of the first line printed by this command (i.e., [interface] is up)
refers to the Physical layer status of the particular interface. The second part (i.e.,
line protocol is down) indicates the Data Link layer status of the interface. If this shows
‘up’, the interface can send and receive keepalives. Keep in mind that the switch port can show
the Physical layer as up while the Data Link layer is down, for example, when the port is a SPAN
destination port or when the local port is connected to a CatOS switch with its port disabled.
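When many interfaces need to be checked, the two parts of the status line can be separated programmatically. The following is a minimal Python sketch, not a Cisco tool; the function name parse_status_line and the regular expression are assumptions made for illustration, and they cover only the line format shown in the output above.

```python
import re

# Matches the first line of "show interfaces" output, for example:
# "GigabitEthernet0/1 is up, line protocol is down (notconnect)"
STATUS_LINE = re.compile(
    r"^(?P<name>\S+) is (?P<physical>.+?), "
    r"line protocol is (?P<protocol>\S+)(?: \((?P<reason>[^)]+)\))?"
)


def parse_status_line(line):
    """Split the status line into its Physical and Data Link layer parts."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a show interfaces status line: %r" % line)
    return match.groupdict()


status = parse_status_line(
    "GigabitEthernet0/1 is up, line protocol is down (notconnect)"
)
# status["physical"] is the Physical layer state and
# status["protocol"] is the Data Link layer state.
```

The reason field is None when the line carries no parenthesised state, such as for an administratively down port.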
In the Input queue field, the drops counter indicates the actual number of frames dropped because
the maximum queue size was exceeded. The flushes counter counts Selective Packet Discard (SPD)
drops on the Catalyst 6000 Series switches. SPD drops low-priority packets when the CPU is
overloaded in order to save some processing capacity for high-priority packets. Because SPD
implements a selective packet drop policy on the IP process queue, the flushes counter applies
only to process-switched traffic. The different Cisco IOS switching methods are described later in this guide.
The Total output drops indicates the number of packets dropped because the output queue is
full. This is often seen when traffic from multiple inbound high bandwidth links (e.g., GigabitEthernet links) is being switched to a single outbound lower bandwidth link (e.g., a FastEthernet link).
The output drops increment because the interface is overwhelmed by the excess traffic due to the
speed mismatch between the inbound and outbound bandwidths.
In addition to the show interfaces command, the show interfaces [name] counters errors
command can also be used to view interface errors and facilitate Layer 1 troubleshooting. Following is the output that is printed by the show interfaces [name] counters errors command:
Catalyst-3750-1#show interfaces GigabitEthernet3/0/1 counters errors

Port        Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize
Gi3/0/1             0          0          0          0          0

Port       Single-Col  Multi-Col   Late-Col Excess-Col  Carri-Sen      Runts     Giants
Gi3/0/1             0          0          0          0          0          0          0
The following section describes some of the error fields included in the output of the show interfaces [name] counters errors command and which issues or problems are indicated by
non-zero values under these fields.
The Align-Err field reflects a count of the number of frames received that do not end with an even
number of octets and that have a bad cyclic redundancy check (CRC). These errors are usually the
result of a duplex mismatch or a physical problem, such as cabling, a bad port, or a bad network
interface controller (NIC). When the cable is first connected to the port, some of these errors can
occur. In addition, if there is a hub connected to the port, collisions between other devices on the
hub can cause these errors.
The FCS-Err field reflects the number of valid-size frames with Frame Check Sequence (FCS) errors but no framing errors. This is typically a physical issue, such as cabling, a bad port, or a bad
NIC. Additionally, a non-zero value under this field could indicate a duplex mismatch.
A non-zero value in the Xmit-Err field is an indication that the internal send (Tx) buffer is full. This
is commonly seen when traffic from multiple inbound high bandwidth links (e.g., GigabitEthernet
links) is being switched to a single outbound lower bandwidth link (e.g., a FastEthernet link).
The Rcv-Err field indicates the sum of all receive errors. This counter is incremented when the
interface receives a frame with an error, such as a runt, a giant, or an FCS error.
The UnderSize field is incremented when the switch receives frames that are smaller than 64 bytes
in length. This is commonly caused by a faulty sending device.
The various collision fields indicate collisions on the interface. This is common for half-duplex
Ethernet, which is almost non-existent in modern networks. However, these counters should not
increment for full-duplex links. In the event that non-zero values are present under these counters,
this typically indicates a duplex mismatch issue. When a duplex mismatch is detected, the switch
prints a message similar to the following on the console or in the log:
%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on FastEthernet0/1 (not
full duplex), with R2 FastEthernet0/0 (full duplex)
As will be described in the section pertaining to Spanning Tree Protocol (STP), duplex mismatches
can cause STP loops in the switched network if a port is connected to another switch. These mismatches can be resolved by manually configuring the speed and the duplex of switch ports.
The Carri-Sen (carrier sense) counter increments every time an Ethernet controller wants to send
data on a half-duplex connection. The controller senses the wire and ensures that it is not busy
before transmitting. A non-zero value under this field indicates that the interface is operating in
half-duplex mode. This is normal for half-duplex.
Non-zero values can also be seen under the Runts field due to a duplex mismatch or because of
other Physical layer problems, such as a bad cable, port, or NIC on the attached device. Runts are
received frames with a bad CRC that are smaller than the minimum IEEE 802.3 frame size, which
is 64 bytes for Ethernet.
The Giants counter is incremented when frames are received that exceed the IEEE 802.3
maximum frame size, which is 1518 bytes for non-jumbo Ethernet, and that have a bad FCS. For
ports or interfaces connected to a workstation, a non-zero value in this counter is typically caused
by a bad NIC on the connected device. However, for ports or interfaces that are connected to another switch (e.g., via a trunk link), this field will contain a non-zero value if 802.1Q encapsulation is
used. With 802.1Q, the tagging mechanism implies a modification of the frame because the trunking device inserts a 4-byte tag and then re-computes the FCS.
Inserting a 4-byte tag into a frame that already has the maximum Ethernet size creates a 1522-byte
frame that can be considered a baby giant frame by the receiving equipment. Therefore, while the
switch will still process such frames, this counter will increment and contain a non-zero value. To
resolve this issue, the 802.3 committee created a subgroup called 802.3ac to extend the maximum
Ethernet size to 1522 bytes; however, it is not uncommon to see a non-zero value under this field
when using 802.1Q trunking.
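The size-based counters described above can be summarized in a few lines of Python. This is a simplified sketch under stated assumptions, not the switch's actual logic: the function name classify_frame is hypothetical, UnderSize is assumed to count short frames whose FCS is still valid, and alignment errors are left out because they cannot be derived from the frame length alone. Handling of valid 1522-byte tagged frames varies by platform, so the sketch does not attempt to model it.

```python
MIN_FRAME = 64     # IEEE 802.3 minimum frame size in bytes
MAX_FRAME = 1518   # IEEE 802.3 maximum frame size for non-jumbo Ethernet
DOT1Q_TAG = 4      # bytes added by an 802.1Q tag


def classify_frame(length, fcs_ok):
    """Return the name of the error counter a received frame would increment.

    Simplified model: returns None when no size or FCS counter applies.
    """
    if length < MIN_FRAME:
        # Assumption: UnderSize counts short frames whose FCS is still valid.
        return "UnderSize" if fcs_ok else "Runts"
    if length > MAX_FRAME:
        # Per the text, Giants are oversized frames that also have a bad FCS.
        return None if fcs_ok else "Giants"
    return None if fcs_ok else "FCS-Err"


# A maximum-size frame grows past the untagged limit once it is tagged.
tagged_max = MAX_FRAME + DOT1Q_TAG
```

Here tagged_max works out to 1522 bytes, which is the extended maximum defined by 802.3ac.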
Finally, the show controllers ethernet-controller <interface> command can also be used
to display traffic counter and error counter information similar to that printed by the show interfaces and show interfaces <name> counters errors commands. Following is the output of
the show controllers ethernet-controller <interface> command:
Catalyst-3750-1#show controllers ethernet-controller GigabitEthernet3/0/1
Transmit GigabitEthernet3/0/1
4069327795 Bytes
559424024 Unicast frames
27784795 Multicast frames
7281524 Broadcast frames
0 Too old frames
0 Deferred frames
0 MTU exceeded frames
0 1 collision frames
0 2 collision frames
0 3 collision frames
0 4 collision frames
0 5 collision frames
0 6 collision frames
0 7 collision frames
0 8 collision frames
0 9 collision frames
0 10 collision frames
0 11 collision frames
0 12 collision frames
0 13 collision frames
0 14 collision frames
0 15 collision frames
0 Excessive collisions
0 Late collisions
0 VLAN discard frames
0 Excess defer frames
264522 64 byte frames
99898057 127 byte frames
76457337 255 byte frames
4927192 511 byte frames
21176897 1023 byte frames
127643707 1518 byte frames
264122631 Too large frames
0 Good (1 coll) frames
0 Good (>1 coll) frames
Receive
3301740741 Bytes
376047608 Unicast frames
1141946 Multicast frames
1281591 Broadcast frames
429934641 Unicast bytes
226764843 Multicast bytes
137921433 Broadcast bytes
0 Alignment errors
0 FCS errors
0 Oversize frames
0 Undersize frames
0 Collision fragments
257477 Minimum size frames
259422986 65 to 127 byte frames
51377167 128 to 255 byte frames
41117556 256 to 511 byte frames
2342527 512 to 1023 byte frames
5843545 1024 to 1518 byte frames
0 Overrun frames
0 Pause frames
0 Symbol error frames
0 Invalid frames, too large
18109887 Valid frames, too large
0 Invalid frames, too small
0 Valid frames, too small
0 Too old frames
0 Valid oversize frames
0 System FCS error frames
0 RxPortFifoFull drop frame
NOTE: The output above will vary slightly depending on the switch platform on which this
command is executed. For example, Catalyst 3550 Series switches also include a Discarded
frames field, which shows the total number of frames whose transmission attempt is abandoned due to insufficient resources. A large number in this field typically indicates a network
congestion issue. In the output above, you would look at the RxPortFifoFull drop frame
field, which indicates the total number of frames received on an interface that are dropped
because the ingress queue is full.
VLAN, VTP, AND TRUNKING OVERVIEW
A virtual LAN (VLAN) is a logical grouping of hosts that appear to be on the same LAN, regardless
of their physical location. VLANs increase the number of Broadcast domains in a switched network, while reducing their overall size. A VLAN can span a single switch or even multiple switches,
depending on the implementation. Troubleshooting intra-VLAN and inter-VLAN connectivity issues is therefore an all-encompassing task that should take several elements into consideration.
These elements are described later in the ‘Troubleshooting VLANs’ section. Catalyst switches support two types of switch VLAN ports, which are access ports and trunk ports. These port types are
described in the following section.
Access ports are switch ports that are assigned to, and can belong to, only a single VLAN.
Switch access ports are typically used to connect network hosts, such as printers,
computers, IP phones, and wireless access points, to the LAN switch. However, access ports can
also be used to provide connectivity between users connected across multiple switches. Such a topology or implementation is illustrated in the diagram shown in Figure 3-1 below:
Fig. 3-1. Implementing LANs Using Access Ports and a Single VLAN
While such an implementation will work, assuming all the users are on the same subnet, you should
keep in mind that the entire switched LAN becomes a single Broadcast domain. The same situation would be applicable even if multiple VLANs were used, as illustrated in the network diagram
shown in Figure 3-2 below:
Fig. 3-2. Implementing LANs Using Access Ports and Multiple VLANs
While this configuration and implementation is valid, keep in mind that such a solution is not scalable, especially in larger networks with multiple subnets and devices. In addition, such implementations make end-to-end VLAN connectivity issues very difficult to isolate and
identify. Large networks should use multiple VLANs, reducing the size of the Broadcast domains
in the switched network and limiting the range of the VLANs.
VLAN trunks are used to carry data from multiple VLANs. In order to differentiate frames from
one VLAN from those from another, all frames sent across a trunk link are specially tagged so that
the destination switch knows to which VLAN the frame belongs. This allows for end-to-end VLAN
connectivity across multiple switches but requires either a router or other inter-VLAN solution to
facilitate communication between the VLANs. The following two methods can be used to ensure
that VLANs that traverse switch trunk links can be uniquely identified:
• Inter-Switch Link
• IEEE 802.1Q
Inter-Switch Link (ISL) is a Cisco proprietary protocol that is used to preserve the source VLAN
identification information for frames that traverse trunk links. Although ISL is a Cisco proprietary
protocol, it is not supported on all Cisco platforms. For example, Catalyst 2940 and 2950 Series
switches only support 802.1Q trunking; they do not support ISL trunking.
802.1Q is an IEEE standard for VLAN tagging. Unlike ISL, 802.1Q (or, more commonly, dot1q) inserts a single 4-byte tag into the original frame between the Source Address (SA) field and the Type
or Length fields, depending on the Ethernet frame type. For this reason, 802.1Q is also referred to
as a one-level, internal tagging or single tagging mechanism. Given that the length of the 802.1Q
tag is 4 bytes, the resulting Ethernet frame can be as large as 1522 bytes, while the minimum size of
the Ethernet frame with 802.1Q tagging is 68 bytes.
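The tag insertion and the resulting frame sizes can be illustrated with a short Python sketch. It is not taken from any Cisco implementation: the helper names with_fcs, add_dot1q_tag, and sample_frame are hypothetical, the sample frame is a zero-filled IPv4 frame invented for the example, and the FCS is assumed to be the standard Ethernet CRC-32 as computed by zlib.crc32 and appended least significant byte first.

```python
import struct
import zlib

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def with_fcs(frame_without_fcs):
    """Append the 4-byte Ethernet FCS (CRC-32) to a frame."""
    return frame_without_fcs + struct.pack("<I", zlib.crc32(frame_without_fcs))


def add_dot1q_tag(frame, vlan_id, priority=0):
    """Insert an 802.1Q tag after the Source Address and recompute the FCS."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be between 0 and 7")
    body = frame[:-4]                    # strip the old FCS
    tci = (priority << 13) | vlan_id     # 3-bit priority, 1-bit DEI (0), 12-bit VLAN ID
    tag = struct.pack("!HH", TPID_8021Q, tci)
    # Bytes 0-11 hold the Destination and Source Addresses.
    return with_fcs(body[:12] + tag + body[12:])


def sample_frame(payload_len):
    """Build a hypothetical untagged IPv4 frame with a zero-filled payload."""
    dst = bytes.fromhex("ffffffffffff")
    src = bytes.fromhex("000f23032db1")
    header = dst + src + struct.pack("!H", 0x0800)
    return with_fcs(header + bytes(payload_len))


smallest = add_dot1q_tag(sample_frame(46), vlan_id=10)     # 64-byte frame, tagged
largest = add_dot1q_tag(sample_frame(1500), vlan_id=10)    # 1518-byte frame, tagged
```

The lengths of smallest and largest are 68 and 1522 bytes respectively, matching the minimum and maximum tagged frame sizes given above.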
The VLAN Trunking Protocol (VTP) is a Cisco proprietary Layer 2 messaging protocol that manages the addition, deletion, and renaming of VLANs on a network-wide scale. VTP allows VLAN
information to propagate through the switched network, which reduces administration overhead
in a switched network, while enabling switches to exchange and maintain consistent VLAN information. In order to do this, switches must reside within the same VTP domain.
A VTP domain consists of adjacent connected switches that are part of the same management
domain. A switch can belong to only one VTP domain at any one time and will reject or drop any
VTP packets received from switches in any other VTP domains. In order to participate in the VTP
domain, switches must be configured for a specific VTP mode, each with its own characteristics. A
switch can be configured in one of three VTP modes: server, client, and transparent.
VTP server mode is the default VTP mode for all Cisco Catalyst switches. VTP server switches
control VLAN creation, modification, and deletion for their respective VTP domain. VTP clients
advertise and receive VTP information; however, they do not allow VLAN creation, modification,
or deletion. While a switch that is configured for VTP transparent mode allows for the creation, modification, and deletion of VLANs in the same manner as a VTP server switch, it differs in that it
ignores VTP updates, and all VLANs that are created on the switch are locally significant and are
not propagated to other switches in the VTP domain.
TROUBLESHOOTING VLANS
In the previous section, we discussed the use of three CLI commands that can be used for the
troubleshooting of Physical Layer issues. This section describes some common approaches to identifying and troubleshooting intra-VLAN connectivity issues. Some of the more common causes of
intra-VLAN connectivity issues include the following:
• Duplex mismatches
• Bad NIC or cable
• Congestion
• Hardware issues
• Software issues
• Resource oversubscription
• Configuration issues
Duplex mismatches can result in very slow network performance and connectivity. While some
improvements in auto negotiation have been made, and the use of auto negotiation is considered a
valid practice, it is still possible for duplex mismatches to occur. As an example, when the NIC is set
to 100/Full and the switch port is auto negotiating, the NIC will retain its 100/Full setting, but the
switch port will be set to 100/Half. Another example would be the inverse; that is, the NIC is set to
auto negotiate, while the switch port is set to 100/Full. In that case, the NIC would auto negotiate to
100/Half, while the switch retained its static 100/Full configuration, resulting in a duplex mismatch.
It is therefore good practice to manually specify the speed and duplex settings for 10/100 Ethernet connections, where feasible, to avoid duplex mismatches with auto negotiation. Duplex mismatches
can affect not only users directly connected to the switch but also network traffic that traverses
inter-switch links that have mismatched duplex settings. The port interface speed and duplex settings can be viewed using the show interfaces command.
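The two examples above follow one rule: a side that auto negotiates against a hard-coded partner can detect the speed but falls back to half duplex. The following Python sketch models only that rule for 10/100 links. The names AUTO, resolve_link, and duplex_mismatch are hypothetical, and the assumption that two auto negotiating sides settle on 100/Full is a simplification.

```python
AUTO = "auto"


def resolve_link(side_a, side_b):
    """Return the (speed, duplex) pair each side ends up using.

    Each argument is either AUTO or a fixed pair such as (100, "full").
    """
    if side_a == AUTO and side_b == AUTO:
        best = (100, "full")   # assumption: both sides support 100/Full
        return best, best
    if side_a == AUTO:
        # The auto side detects the speed but falls back to half duplex.
        return (side_b[0], "half"), side_b
    if side_b == AUTO:
        return side_a, (side_a[0], "half")
    return side_a, side_b      # both sides keep their static settings


def duplex_mismatch(side_a, side_b):
    """Report whether the two sides end up with different duplex settings."""
    result_a, result_b = resolve_link(side_a, side_b)
    return result_a[1] != result_b[1]


# NIC hard-coded to 100/Full, switch port auto negotiating.
nic, switch_port = resolve_link((100, "full"), AUTO)
```

In this example the NIC keeps (100, "full") while the switch port ends up at (100, "half"), so duplex_mismatch((100, "full"), AUTO) returns True.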
NOTE: Because Catalyst switches support only full duplex for 1Gbps links, this is not commonly an issue for GigabitEthernet connections.
Multiple counters in Cisco IOS software can be used to identify a potentially bad NIC or cabling
issue by checking their values in the output of different show commands. For example, if the
switch port counters show an incrementing number of frames with a bad CRC or with FCS errors,
this can most likely be attributed either to a bad NIC on the workstation or to a bad network cable.
Network congestion can also cause intermittent connectivity issues in the switched network.
The first sign that your VLAN is overloaded is if the Rx or Tx buffers on a port are oversubscribed.
Additionally, excessive frame drops on a port can also be an indication of network congestion.
A common cause of network congestion is underestimating aggregate bandwidth requirements for backbone connections. In such cases, congestion issues can be resolved by configuring
EtherChannels or by adding additional ports to existing EtherChannels. While network congestion
is a common cause of connectivity issues, it is also important to know that the switch itself can experience congestion issues, which can have a similar impact on network performance.
Limited switch bandwidth can result in congestion issues, which can severely impact network performance. As you may recall, in the SWITCH guide we learned that in LAN switching, bandwidth
refers to the capacity of the switch fabric. Therefore, if the switch fabric capacity is only 5Gbps and you attempt to push 7Gbps worth of traffic through the switch, the end result is packet loss and poor
network performance. This is a common issue in oversubscribed platforms, where the aggregate
capacity of all ports can exceed the total backplane capacity.
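The arithmetic behind oversubscription is simple enough to sketch in Python. The function names and the 24-port switch below are hypothetical values chosen for illustration; only the 5Gbps and 7Gbps figures come from the example above.

```python
def oversubscription_ratio(port_speeds_gbps, fabric_gbps):
    """Ratio of aggregate port capacity to switch fabric capacity."""
    return sum(port_speeds_gbps) / fabric_gbps


def excess_traffic_gbps(offered_gbps, fabric_gbps):
    """Traffic that cannot be switched and is therefore dropped."""
    return max(0, offered_gbps - fabric_gbps)


# The example from the text: 7Gbps offered to a 5Gbps fabric.
dropped = excess_traffic_gbps(7, 5)

# A hypothetical switch with 24 GigabitEthernet ports and a 5Gbps fabric.
ratio = oversubscription_ratio([1] * 24, 5)
```

Here dropped is 2Gbps of traffic, and the hypothetical switch is oversubscribed by a ratio of 4.8 to 1.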
Hardware problems can also cause connectivity issues in the switched LAN. Examples of such issues include bad ports or bad switch modules. Where possible, you can start by looking
at physical indicators such as LEDs, but such issues are sometimes difficult to troubleshoot
and diagnose. In most cases, you should seek the assistance of the Technical Assistance Center
(TAC) when you suspect faulty hardware.
Software bugs are even more difficult to identify because they cause the switch to deviate from
its expected behavior in ways that are hard to troubleshoot. In the event that you suspect a
software bug may be causing connectivity issues,
you should contact the TAC with your findings. Additionally, if error messages are printed on the
console or are in the logs, you can also use some of the online tools available from Cisco to implement a workaround or get a recommendation for a version of software in which the issue has been
resolved and verified.
As with any other hardware device, switches have limited resources, such as physical memory.
When these resources are oversubscribed, this can lead to severe performance issues. Issues such
as high CPU utilization can have a drastic impact on both switch and network performance. Resource troubleshooting for IOS switches is described in the following chapter.
Finally, as with any other technology, incorrect configurations may also cause connectivity issues
either directly or indirectly. For example, the poor placement of the root bridge may result in slow
connectivity for users. Directly integrating or adding an incorrectly configured switch into the production network could result in an outright outage for some or all users. The following sections
describe some common VLAN-related issues, their probable causes, and the actions that can be
taken to remedy them.
Troubleshooting Dynamic VLAN Advertisement
Cisco Catalyst switches use VTP to propagate VLAN information dynamically throughout the
switched domain. VTP is a Cisco proprietary Layer 2 messaging protocol that manages the addition, deletion, and renaming of VLANs for switches in the same VTP domain.
There are several reasons why a switch might not be able to receive any VLAN information dynamically when added to the VTP domain. Some common causes include the following:
• Layer 2 trunking misconfigurations
• Incorrect VTP configuration
• Configuration revision number
• Physical layer issues
• Software or hardware issues or bugs
• Switch performance issues
NOTE: For brevity, only trunking, VTP configuration, and the configuration revision number are described in the following section. Physical layer troubleshooting was described in the
previous section. Software or hardware issues or bugs and switch performance issues will be
described in the following chapter.
In order for switches to exchange VLAN information using VTP, a trunk must be established
between the switches. Cisco IOS switches support both ISL and 802.1Q trunking mechanisms.
While some switches default to ISL, which is a Cisco proprietary trunking mechanism, the current
Cisco IOS Catalyst switches default to 802.1Q. When provisioning trunking between switches, it
is considered good practice to specify manually the trunking encapsulation protocol. This is accomplished using the switchport trunk encapsulation [isl|dot1q] interface configuration
command when configuring the link as a trunk port.
There are several commands that you can use to troubleshoot trunk connectivity issues. You can
use the show interfaces command to verify basic port operational and administrative status.
Additionally, you can append the trunk or errors keyword to perform additional troubleshooting
and verification. The show interfaces [name] counters trunk command can be used to view
the number of frames transmitted and received on trunk ports.
The output of this command also includes encapsulation errors, which can be used to identify
802.1Q and ISL trunking encapsulation mismatches, as illustrated in the following output:
Cat-3550-1#show interfaces FastEthernet0/12 counters trunk

Port        TrunkFramesTx  TrunkFramesRx  WrongEncap
Fa0/12               1696          32257           0
Referencing the output above, you can repeat the same command to ensure that both the Tx and Rx
columns are incrementing and perform additional troubleshooting from there. For example, if the
switch is not sending any frames, then the interface might not be configured as a trunk, or it might
be down or disabled. If the Rx column is not incrementing, then the remote switch
might not be configured correctly.
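The comparison of two successive runs of the command can be written down as a small Python sketch. The function name diagnose_trunk and the dictionary keys are hypothetical. The first snapshot uses the counter values from the output above, while the second snapshot is invented for illustration.

```python
def diagnose_trunk(first, second):
    """Compare two counter snapshots taken from successive command runs."""
    findings = []
    if second["tx"] <= first["tx"]:
        findings.append(
            "Tx not incrementing: check that the local interface is a trunk "
            "and is not down or disabled"
        )
    if second["rx"] <= first["rx"]:
        findings.append(
            "Rx not incrementing: check the configuration of the remote switch"
        )
    if second["wrong_encap"] > first["wrong_encap"]:
        findings.append(
            "WrongEncap incrementing: check for a trunking encapsulation mismatch"
        )
    return findings


first_run = {"tx": 1696, "rx": 32257, "wrong_encap": 0}
second_run = {"tx": 1710, "rx": 32257, "wrong_encap": 0}   # hypothetical values
findings = diagnose_trunk(first_run, second_run)
```

With these snapshots the only finding is that the Rx column is not incrementing, which points to the remote switch.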
Another command that can be used to troubleshoot possible Layer 2 trunk misconfigurations is
the show interfaces [name] trunk command. The output of this command includes the trunking encapsulation protocol and mode, the native VLAN for 802.1Q, the VLANs that are allowed to
traverse the trunk, the VLANs that are active in the VTP domain, and the VLANs that are pruned.
A common issue with VLAN propagation is that the upstream switch has been configured to filter
certain VLANs on the trunk link using the switchport trunk allowed vlan interface configuration command. Following is the output of the show interfaces [name] trunk command:
Cat-3550-1#show interfaces trunk

Port        Mode         Encapsulation  Status        Native vlan
Fa0/12      desirable    n-802.1q       trunking      1
Fa0/13      desirable    n-802.1q       trunking      1
Fa0/14      desirable    n-isl          trunking      1
Fa0/15      desirable    n-isl          trunking      1

Port        Vlans allowed on trunk
Fa0/12      1-4094
Fa0/13      1-4094
Fa0/14      1-4094
Fa0/15      1-4094

Port        Vlans allowed and active in management domain
Fa0/12      1-4
Fa0/13      1-4
Fa0/14      1-4
Fa0/15      1-4

Port        Vlans in spanning tree forwarding state and not pruned
Fa0/12      1-4
Fa0/13      none
Fa0/14      none
Fa0/15      none
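The VLAN lists in this output use a compact notation of single VLAN IDs and ranges. When checking whether a particular VLAN is being filtered on a trunk, it can help to expand the list. The following Python sketch shows one way to do this; the function names expand_vlan_list and filtered_vlans are hypothetical, and the filtered allowed list used in the example is an illustrative value rather than part of the output above.

```python
def expand_vlan_list(text):
    """Expand a list such as "1-99,201-4094" into a set of VLAN IDs."""
    vlans = set()
    if text.strip().lower() == "none":
        return vlans
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def filtered_vlans(allowed_text, wanted):
    """Return the wanted VLANs that the allowed list does not permit."""
    return sorted(set(wanted) - expand_vlan_list(allowed_text))


# A trunk that allows every VLAN filters nothing.
unfiltered = filtered_vlans("1-4094", [1, 2, 3, 4])

# A trunk restricted with switchport trunk allowed vlan may filter some VLANs.
missing = filtered_vlans("1-99,201-4094", [10, 150, 300])
```

In the second case missing contains only VLAN 150, which falls in the gap between the two allowed ranges.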
Another common trunking misconfiguration issue is a native VLAN mismatch. When you are
configuring 802.1Q trunks, the native VLAN must match on both sides of the trunk link; otherwise,
the link will not work. If there is a native VLAN mismatch, then STP places the port in a port VLAN
ID (PVID) inconsistent state and will not forward on the link. In such cases, an error message similar to the following will be printed on the console or in the log:
*Mar 1 03:16:43.935: %SPANTREE-2-RECV_PVID_ERR: Received BPDU with inconsistent
peer vlan id 1 on FastEthernet0/11 VLAN2.
*Mar 1 03:16:43.935: %SPANTREE-2-BLOCK_PVID_PEER: Blocking FastEthernet0/11 on
VLAN0001. Inconsistent peer vlan.
*Mar 1 03:16:43.935: %SPANTREE-2-BLOCK_PVID_LOCAL: Blocking FastEthernet0/11 on
VLAN0002. Inconsistent local vlan.
*Mar 1 03:16:43.935: %SPANTREE-2-RECV_PVID_ERR: Received BPDU with inconsistent
peer vlan id 1 on FastEthernet0/12 VLAN2.
*Mar 1 03:16:43.935: %SPANTREE-2-BLOCK_PVID_PEER: Blocking FastEthernet0/12 on
VLAN0001. Inconsistent peer vlan.
*Mar 1 03:16:43.939: %SPANTREE-2-BLOCK_PVID_LOCAL: Blocking FastEthernet0/12 on
VLAN0002. Inconsistent local vlan.
While STP troubleshooting will be described later in this section, this inconsistent state could be
validated using the show spanning-tree command as illustrated below:
Cat-3550-1#show spanning-tree interface FastEthernet0/11

Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----------------------------
VLAN0001            Desg BKN*19        128.11   P2p *PVID_Inc
VLAN0002            Desg BKN*19        128.11   P2p *PVID_Inc
If you have checked and validated that the trunk is indeed correctly configured and operational
between the two switches, then the next step would be to validate VTP configuration parameters.
These parameters include the VTP domain name, the VTP mode, and the VTP password,
if one has been configured for the domain. The domain name and mode can be verified using the
show vtp status command and the password using the show vtp password command. Below is the
output of the show vtp status command:
Cat-3550-1#show vtp status
VTP Version                     : running VTP2
Configuration Revision          : 0
Maximum VLANs supported locally : 1005
Number of existing VLANs        : 8
VTP Operating Mode              : Server
VTP Domain Name                 : TSHOOT
VTP Pruning Mode                : Enabled
VTP V2 Mode                     : Enabled
VTP Traps Generation            : Disabled
MD5 digest                      : 0x26 0x99 0xB7 0x93 0xBE 0xDA 0x76 0x9C
...
[Truncated Output]
When using the show vtp status command, ensure that the switches are running the same version of VTP. By default, Catalyst switches run VTP version 1. A switch running VTP version 1 cannot participate in a VTP version 2 domain. If the switch is incapable of running VTP version 2, then
all VTP version 2 switches should be configured to run version 1 instead using the vtp version
global configuration command.
NOTE: If you change the VTP version on the server, then the change is propagated automatically to client switches in the VTP domain.
As was described in the SWITCH guide, VTP propagation is enabled for VTP client/server or
server/server devices. If VTP is disabled on a switch (i.e., transparent mode), then the switch will
not receive VLAN information dynamically via VTP. However, be mindful of the fact that with
version 2, transparent mode switches will forward received VTP advertisements out of their trunk
ports and act as VTP relays. This happens even if the VTP version is not the same.
The VTP domain name should also be consistent on the switches.
Finally, the output of the show vtp status command also includes the MD5 hash used for authentication purposes. This hash, which is derived from the VTP domain name and password, should
be consistent on all switches in the domain. If the VTP passwords or domain names are different
on the switches, then the calculated MD5 will also be different. If the domain name or password is
different, then the show vtp status command will indicate an MD5 digest checksum mismatch
as illustrated in the following output:
Cat-3550-1#show vtp status
VTP Version                     : running VTP2
Configuration Revision          : 0
Maximum VLANs supported locally : 1005
Number of existing VLANs        : 8
VTP Operating Mode              : Server
VTP Domain Name                 : TSHOOT
VTP Pruning Mode                : Enabled
VTP V2 Mode                     : Enabled
VTP Traps Generation            : Disabled
MD5 digest                      : 0x26 0x99 0xB7 0x93 0xBE 0xDA 0x76 0x9C
*** MD5 digest checksum mismatch on trunk: Fa0/11 ***
*** MD5 digest checksum mismatch on trunk: Fa0/12 ***
...
[Truncated Output]
Finally, the configuration revision number can wreak havoc when using VTP. Switches use the
configuration revision number to keep track of the most recent information in the VTP domain.
Every switch in the domain stores the configuration revision number that it last heard from a VTP
advertisement, and this number is incremented every time the VLAN configuration is changed. When any
switch in the VTP domain receives an advertisement message with a higher configuration revision
number than its own, it will overwrite any stored VLAN information and synchronize its own
stored VLAN information with the information received in the advertisement message.
Therefore, if you are wondering why the switch that you integrated into the VTP domain is not
receiving any VLAN information, it may be that the same switch had a higher configuration revision number and caused all other switches to overwrite their local VLAN information and replace
it with the information received in the advertisement message from the new switch. To avoid such
situations, always ensure that the configuration revision number is set to 0 prior to integrating a
new switch into the domain. This can be done by changing the VTP mode or changing the VTP
domain name on the switch. The configuration revision number is included in the output of the
show vtp status command.
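The overwrite behavior described above can be modeled in a few lines of Python. This is a simplified sketch and not the VTP protocol itself: the VtpSwitch class and receive_advertisement function are hypothetical, the VLAN names and revision numbers are invented for illustration, and VTP modes, passwords, and the different advertisement types are deliberately ignored.

```python
from dataclasses import dataclass, field


@dataclass
class VtpSwitch:
    """Hypothetical stand-in for the VTP state held by one switch."""
    domain: str
    revision: int = 0
    vlans: dict = field(default_factory=dict)   # VLAN ID -> VLAN name


def receive_advertisement(local, sender):
    """Apply the revision-number rule; return True if local was overwritten."""
    if sender.domain != local.domain:
        return False          # advertisements from other domains are dropped
    if sender.revision <= local.revision:
        return False          # nothing newer to learn
    local.vlans = dict(sender.vlans)
    local.revision = sender.revision
    return True


production = VtpSwitch("TSHOOT", revision=12, vlans={10: "USERS", 20: "VOICE"})
new_switch = VtpSwitch("TSHOOT", revision=40, vlans={99: "LAB"})

# The newly added switch has the higher revision, so production loses its VLANs.
overwritten = receive_advertisement(production, new_switch)
```

After this call, production holds only VLAN 99 at revision 40. Had new_switch been reset to revision 0 first, the call would have returned False and new_switch would instead have learned the production VLANs.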
Troubleshooting Loss of End-to-End Intra-VLAN Connectivity
There are several possible reasons for a loss of end-to-end connectivity within a VLAN. Some of the
most common causes include the following:
• Physical layer issues
• VTP pruning
• VLAN trunk filtering
• New switches
• Switch performance issues
• Network congestion
• Software or hardware issues or bugs
NOTE: For brevity, only trunking, VTP pruning, trunk filtering, and the integration of new
switches into the domain will be described in this section. Software or hardware issues or bugs
and switch performance issues are described in the following chapter. Physical layer troubleshooting was described earlier in this chapter.
VTP pruning stops the flooded traffic of a VLAN from being sent across a trunk link when the
downstream switches have no ports in that VLAN. VTP pruning increases the efficiency of trunks by
eliminating unnecessary Broadcast, Multicast, and unknown Unicast traffic that would otherwise be
flooded across the network.
While VTP pruning is a desirable feature to implement, incorrect configuration or implementation can result in a loss of end-to-end VLAN connectivity. VTP pruning should be enabled only in
client/server environments. Implementing pruning in a network that includes transparent mode
switches may result in a loss of connectivity. If one or more switches in the network are in VTP
transparent mode, you should either globally disable pruning for the entire domain or ensure that
all VLANs on the trunk link(s) to the upstream transparent mode switch(es) are pruning ineligible
(i.e., they are not pruned), using the switchport trunk pruning vlan interface configuration
command under the applicable interfaces.
In addition to VTP pruning, incorrectly filtering VLANs on switch trunk links can result in a loss of
end-to-end VLAN connectivity. By default, all VLANs are allowed to traverse all trunk links; however, Cisco IOS software allows administrators to remove (or add) VLANs selectively to specific
trunk links using the switchport trunk allowed vlan interface configuration command. Use
the show interfaces [name] trunk and the show interfaces [name] switchport commands to view pruned and restricted VLANs on trunk links. Following is the output of the show
interfaces [name] trunk command:
Cat-3550-1#show interfaces trunk

Port        Mode         Encapsulation  Status        Native vlan
Fa0/1       on           802.1q         trunking      1
Fa0/2       on           802.1q         trunking      1

Port        Vlans allowed on trunk
Fa0/1       1,10,20,30,40,50
Fa0/2       1-99,201-4094

Port        Vlans allowed and active in management domain
Fa0/1       1,10,20,30,40,50
Fa0/2       1,10,20,30,40,50,60,70,80,90,254

Port        Vlans in spanning tree forwarding state and not pruned
Fa0/1       1,10,20,30,40,50
Fa0/2       1,40,50,60,70,80,90,254
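When a trunk carries many VLANs, comparing the lists in the trunk output by eye is error-prone. The following is a minimal Python sketch that expands the VLAN lists and reports which allowed and active VLANs are not being forwarded on Fa0/2. The parse_vlan_list helper is a hypothetical name, and the input strings are copied from the Fa0/2 rows of the output above.

```python
def parse_vlan_list(text: str) -> set:
    """Expand an IOS-style VLAN list such as '1-99,201-4094' into a set of IDs."""
    vlans = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


# Values copied from the Fa0/2 rows of the show interfaces trunk output.
allowed = parse_vlan_list("1-99,201-4094")
allowed_active = parse_vlan_list("1,10,20,30,40,50,60,70,80,90,254")
forwarding = parse_vlan_list("1,40,50,60,70,80,90,254")

# VLANs that are allowed and active but pruned or not in the STP forwarding state.
pruned_or_blocked = sorted(allowed_active - forwarding)
print(pruned_or_blocked)

# A quick membership check for a VLAN that users report as unreachable.
print(150 in allowed)
```

For Fa0/2 this reports VLANs 10, 20, and 30 as the ones to investigate, and shows that a VLAN such as 150 is filtered by the allowed list and will never cross this trunk.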
Following is the output of the show interfaces [name] switchport command on a port that
has been configured statically as an 802.1Q trunk link:
Cat-3550-2#show interfaces FastEthernet0/7 switchport
Name: Fa0/7
Switchport: Enabled
Administrative Mode: trunk
Operational Mode: trunk
Administrative Trunking Encapsulation: dot1q
Operational Trunking Encapsulation: dot1q
Negotiation of Trunking: On
Access Mode VLAN: 1 (default)
Trunking Native Mode VLAN: 1 (default)
Administrative Native VLAN tagging: enabled
Voice VLAN: none
Administrative private-vlan host-association: none
Administrative private-vlan mapping: none
Administrative private-vlan trunk native VLAN: none
Administrative private-vlan trunk Native VLAN tagging: enabled
Administrative private-vlan trunk encapsulation: dot1q
Administrative private-vlan trunk normal VLANs: none
Administrative private-vlan trunk associations: none
Administrative private-vlan trunk mappings: none
Operational private-vlan: none
Trunking VLANs Enabled: 3,5,7
Pruning VLANs Enabled: 2-8
Capture Mode Disabled
Capture VLANs Allowed: ALL
Protected: false
Unknown unicast blocked: disabled
Unknown multicast blocked: disabled
Appliance trust: none
As was described in the previous section, the integration of a new switch into the network can result in a loss of VLAN information in the management domain. This loss of VLAN information can
result in a loss of connectivity between devices within the same VLAN. Ensure that the configuration revision number is reset prior to integrating a new switch into the LAN.
USING THE ‘SHOW VLAN’ COMMAND
In addition to the commands that were described in the previous sections, there are additional
Cisco IOS software commands that are useful for both verifying and troubleshooting VLAN configurations. One of the most commonly used VLAN verification and troubleshooting commands is
the show vlan command. This command displays parameters for all VLANs within the administrative domain as illustrated in the following output:
Cat-3550-1#show vlan

VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Fa0/11, Fa0/12, Fa0/13, Fa0/14
                                                Fa0/20, Fa0/21, Fa0/22, Fa0/23
                                                Fa0/24
150  VLAN_150                         active    Fa0/2, Fa0/3, Fa0/4, Fa0/5
                                                Fa0/6, Fa0/7, Fa0/8, Fa0/9
                                                Fa0/10
160  VLAN_160                         active    Fa0/15, Fa0/16, Fa0/17, Fa0/18
                                                Fa0/19
170  VLAN_170                         active    Gi0/1, Gi0/2
1002 fddi-default                     active
1003 token-ring-default               active
1004 fddinet-default                  active
1005 trnet-default                    active

VLAN Type  SAID       MTU   Parent RingNo BridgeNo Stp  BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------
1    enet  100001     1500                                       0      0
150  enet  100150     1500                                       0      0
160  enet  100160     1500                                       0      0
170  enet  100170     1500                                       0      0
1002 fddi  101002     1500                                       0      0
1003 tr    101003     1500                                       0      0
1004 fdnet 101004     1500                         ieee          0      0
1005 trnet 101005     1500                         ibm           0      0

Remote SPAN VLANs
------------------------------------------------------------------------------

Primary Secondary Type              Ports
------- --------- ----------------- ------------------------------------------
This command prints all available VLANs along with the ports that are assigned to each of the individual VLANs. Only access ports, regardless of whether they are up or down, will be included in the
output of this command. Trunk links will not be included, as these belong to all VLANs. The show
vlan command also provides information on RSPAN VLANs, as well as Private VLAN (PVLAN)
configuration on the switch. The show vlan command can be used with additional keywords to
provide information that is more specific. The following output displays the supported additional
keywords that can be used with this command:
Cat-3550-1#show vlan ?
  brief         VTP all VLAN status in brief
  id            VTP VLAN status by VLAN id
  ifindex       SNMP ifIndex
  name          VTP VLAN status by VLAN name
  private-vlan  Private VLAN information
  remote-span   Remote SPAN VLANs
  summary       VLAN summary information
  |             Output modifiers
  <cr>
The brief keyword prints a brief status of all active VLANs. The output that is printed by this command is the same as the output above, with the only difference being that the last two sections will
be omitted. The id keyword provides the same information as the show vlan command, but only for
the specified VLAN as shown in the following output:
Switch-1#show vlan id 150

VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
150  VLAN_150                         active    Fa0/1, Fa0/2, Fa0/3, Fa0/4
                                                Fa0/5, Fa0/6, Fa0/7, Fa0/8
                                                Fa0/9, Fa0/10

VLAN Type  SAID       MTU   Parent RingNo BridgeNo Stp  BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------
150  enet  100150     1500                                       0      0

Remote SPAN VLAN
----------------
Disabled

Primary Secondary Type              Ports
------- --------- ----------------- ------------------------------------------
Again, the VLAN name is included in the output, as are all of the access ports that belong to the
VLAN. Trunk ports are not included in this output because they belong to all VLANs. Additional
information also includes the VLAN MTU, RSPAN configuration (if applicable), and PVLAN configuration parameters (if applicable).
The name keyword allows the VLAN name to be specified instead of the ID. This command prints the
same information as the show vlan id <number> command. The ifindex keyword displays the
SNMP IfIndex for the VLAN (if applicable), while the private-vlan and remote-span keywords print
PVLAN and RSPAN configuration information, respectively. Finally, the summary keyword prints a
summary of the number of VLANs that are active in the management domain. This includes standard and extended VLANs.
Another useful VLAN troubleshooting command is the show vtp counters command. This
command prints information on VTP packet statistics. Following is the output of the show vtp
counters command on a switch configured as a VTP server (default):
Cat-3550-1#show vtp counters
VTP statistics:
Summary advertisements received    : 15
Subset advertisements received     : 10
Request advertisements received    : 2
Summary advertisements transmitted : 19
Subset advertisements transmitted  : 12
Request advertisements transmitted : 0
Number of config revision errors   : 0
Number of config digest errors     : 0
Number of V1 summary errors        : 0

VTP pruning statistics:

Trunk            Join Transmitted Join Received    Summary advts received from
                                                   non-pruning-capable device
---------------- ---------------- ---------------- ---------------------------
Fa0/11           0                1                0
Fa0/12           0                1                0
The first six lines of the output printed by the show vtp counters command provide the statistics
for the three types of VTP packets: advertisement requests, summary advertisements, and subset
advertisements. These different messages will be described in the following section.
VTP advertisement requests are requests for configuration information. These messages are sent
by VTP clients to VTP servers to request VLAN and VTP information they may be missing. A VTP
advertisement request is sent out when the switch resets, the VTP domain name changes, or in the
event that the switch has received a VTP summary advertisement frame with a higher configuration revision number than its own. VTP servers should show only the received counters incrementing, while any VTP clients should show only the transmitted counters incrementing.
VTP summary advertisements are sent out by servers every five minutes, by default. These types
of messages are used to tell an adjacent switch the current VTP domain name, the configuration
revision number, and the status of the VLAN configuration, as well as other VTP information that
includes the time stamp, the MD5 hash, and the number of subset advertisements to follow. If these
counters are incrementing on the server, then there is more than one switch acting or configured
as a server in the domain.
VTP subset advertisements are sent out by VTP servers when the VLAN configuration changes, such
as when a VLAN is added, suspended, changed, or deleted, or when other VLAN-specific parameters (e.g.,
VLAN MTU) have changed. One or more subset advertisements will be sent following the VTP summary advertisement. A subset advertisement contains a list of VLAN information. If there are several
VLANs, more than one subset advertisement may be required in order to advertise all the VLANs.
The Number of config revision errors field shows the number of advertisements that the
switch cannot accept because it received packets with the same configuration revision number but
with a different MD5 hash value. This is common when changes are made to two or more server
switches in the same domain at the same time and an intermediate switch receives these advertisements at the same time. This concept is illustrated in Figure 3-3 below, which illustrates a basic
switched network:
[Figure 3-3: diagram of five switches (Sw1, Sw2, Sw3, Sw4, and Sw5) with the labels "Root: VLAN 10, 30" and "Root: VLAN 20, 40"]
Fig. 3-3. Troubleshooting Configuration Revision Number Errors
Figure 3-3 illustrates a basic network that incorporates redundancy and load sharing. It should be
assumed that Sw1 and Sw2 are configured as servers, while Sw3 is configured as a client. Sw1 is the
root for VLANs 10 and 30, while Sw2 is the root for VLANs 20 and 40. Assume that a simultaneous change is implemented on Sw1 and Sw2 adding VLAN 50 to Sw1 and VLAN 60 to Sw2. Both
switches send out an advertisement following the change to the database.
The change is propagated throughout the domain, overwriting the previous databases of the other
switches that receive this information. Assume that Sw5 receives the same information from neighbors at the same time and both advertisements contain the same configuration revision number.
In such situations, the switch will not be able to accept either advertisement because they have the
same configuration revision number but different MD5 hash values.
When this occurs, the switch increments the Number of config revision errors counter and
does not update its database. This situation can result in a loss of connectivity within one or more
VLANs because VLAN information is not updated on the switch. To resolve this issue and ensure
that the local database on the switch is updated, configure a dummy VLAN on one of the server
switches, which results in another update with an incremented configuration revision number. This
will overwrite the local database of all switches, allowing Sw5 to update its database as well. Keep
in mind that this is not a common occurrence; however, it is possible, hence, the reason for this
counter.
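The behavior of this counter, and the dummy VLAN workaround, can be illustrated with a small Python model. This is a sketch under stated assumptions: advertisements are processed one at a time, the switch has already stored the update from one server when the update from the other server arrives, and the digest strings are placeholders rather than real MD5 values. The process function and the counter names are hypothetical.

```python
from collections import Counter

counters = Counter()


def process(local, adv):
    """Apply one advertisement to the local (revision, digest) state.

    Returns the new local state. A higher revision is accepted; the same
    revision with a different digest is counted as a revision error.
    """
    local_rev, local_digest = local
    adv_rev, adv_digest = adv
    if adv_rev > local_rev:
        counters["accepted"] += 1
        return adv
    if adv_rev == local_rev and adv_digest != local_digest:
        counters["config_revision_errors"] += 1
    return local


# Sw5 starts at revision 5; both servers then advertise revision 6.
state = (5, "digest-old")
state = process(state, (6, "digest-sw1-vlan50"))   # update from Sw1
state = process(state, (6, "digest-sw2-vlan60"))   # conflicting update from Sw2
after_conflict = state

# Workaround from the text: a dummy VLAN on a server bumps the revision to 7.
state = process(state, (7, "digest-after-dummy-vlan"))
print(after_conflict, state, dict(counters))
```

The conflicting revision 6 advertisement is counted and discarded, and only the revision 7 advertisement brings the database back in line with the servers.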
The Number of config digest errors counter increments whenever the switch receives an
advertisement with a different MD5 hash value than it calculated. This is the result of different VTP
passwords configured on the switches. You can use the show vtp password command to verify
that the configured VTP password is correct. It is also important to remember that the passwords
may be the same but hardware or software issues or bugs could be causing data corruption of VTP
packets, resulting in these errors.
Finally, the VTP pruning statistics field will only ever contain non-zero values when pruning
is enabled for the VTP domain. Pruning is enabled on servers and this configuration is propagated
throughout the VTP domain. Servers will receive joins from clients when pruning has been enabled
for the VTP domain.
SPANNING TREE PROTOCOL OVERVIEW
In order to troubleshoot Spanning Tree Protocol (STP), you need to have a solid understanding of
the protocol, its inner workings, and its default parameters. This section discusses STP fundamentals, which are an important element when troubleshooting STP issues. Additional detailed information on STP can be found in the current SWITCH guide that is available online.
The Spanning Tree Protocol is defined in the IEEE 802.1D standard. The primary purpose of STP
is to attempt to provide a loop-free topology in a redundant Layer 2 network environment. The
word ‘attempt’ is used because implementing STP does not always guarantee a loop-free switched
network. This is because the STP operates by making the following assumptions about the network:
1. All links are bidirectional and can both send and receive BPDUs
2. The switch is able to receive, process, and send BPDUs regularly
All switches that reside in the Spanning Tree domain communicate and exchange messages using Bridge Protocol Data Units (BPDUs). The exchange of BPDUs is used by STP to determine
the network topology. The topology of an active switched network is determined by the following
three variables:
1. The unique MAC address (switch identifier) that is associated with each switch
2. The path cost to the root bridge associated with each switch port
3. The port identifier (port priority and port number) associated with each switch port
Configuration BPDUs are sent by LAN switches and are used to communicate and compute the
Spanning Tree topology. After the switch port initializes, the port is placed into the blocking state
and a BPDU is sent to each port in the switch. By default, all switches initially assume that they are
the root of the Spanning Tree until they exchange Configuration BPDUs with other switches. As
long as a port continues to see its Configuration BPDU as the most attractive, it will continue sending Configuration BPDUs. Switches determine the best Configuration BPDU based on the following four factors (in the order listed):
1. Lowest root bridge ID
2. Lowest root path cost to root bridge
3. Lowest sender bridge ID
4. Lowest sender port ID
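Because the four factors are evaluated strictly in order, the comparison maps naturally onto tuple ordering. The following Python sketch shows the idea; the ConfigBpdu type and best_bpdu function are hypothetical names, bridge IDs are modeled as (priority, MAC address) pairs, and MAC addresses are assumed to be fixed-width lowercase strings so that string comparison matches numeric comparison.

```python
from typing import NamedTuple


class ConfigBpdu(NamedTuple):
    # Field order matches the comparison order in the list above, so plain
    # tuple comparison applies the four tie-breakers in sequence.
    root_bridge_id: tuple    # (priority, MAC address)
    root_path_cost: int
    sender_bridge_id: tuple  # (priority, MAC address)
    sender_port_id: tuple    # (port priority, port number)


def best_bpdu(bpdus):
    """Return the most attractive BPDU: the lowest tuple wins."""
    return min(bpdus)


root = (32768, "0000.0000.0001")
a = ConfigBpdu(root, 19, (32768, "0000.0000.0002"), (128, 1))
b = ConfigBpdu(root, 19, (32768, "0000.0000.0003"), (128, 1))
c = ConfigBpdu(root, 38, (32768, "0000.0000.0002"), (128, 2))

# Same root and cost for a and b, so the lower sender bridge ID decides.
winner = best_bpdu([a, b, c])
print(winner.sender_bridge_id)

# A BPDU advertising a better (lower) root wins even with a higher path cost.
d = ConfigBpdu((4096, "0000.0000.0009"), 100, (32768, "0000.0000.0004"), (128, 3))
print(best_bpdu([a, b, c, d]) == d)
```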
The completion of the Configuration BPDU exchange results in the following actions:
1. A root switch is elected for the entire Spanning Tree domain
2. A root port is elected on every non-root switch in the Spanning Tree domain
3. A designated switch is elected for every LAN segment
4. A designated port is elected on the designated switch for every segment
5. Loops in the network are eliminated by blocking redundant paths
A Configuration BPDU is always transmitted away from the root bridge and to the rest of the
switches within the STP domain. The simplest way to remember the flow of Configuration BPDUs
after the Spanning Tree network has converged is to memorize the following four rules:
1. A Configuration BPDU originates on the root bridge and is sent via the designated port
2. A Configuration BPDU is received by a non-root bridge on a root port
3. A Configuration BPDU is transmitted by a non-root bridge on a designated port
4. There is only one designated port (on a designated switch) on any single LAN segment
In a stable and ‘healthy’ switched network, the majority of the BPDUs sent by switches should be
Configuration BPDUs. However, another type of BPDU, the Topology Change Notification (TCN)
BPDU, may also be sent by switches. The TCN BPDU plays a key role in handling changes in the active topology. This BPDU is used to inform the other switches of a change in the Spanning Tree
network topology. A switch originates a TCN BPDU in the following two situations:
1. It transitions a port into the Forwarding state and it has at least one designated port
2. It transitions a port from either the Forwarding or Learning states to the Blocking state
Unlike Configuration BPDUs, which are always originated by the root bridge and are received on
the root port of a non-root bridge, TCN BPDUs are originated by any switch and are sent upstream toward the root bridge via the root port to alert the root bridge that the active topology has
changed. Once the root bridge acknowledges the TCN, it propagates it to all the other switches in
the Spanning Tree domain.
The Spanning Tree Algorithm (STA) defines a number of states that a port under STP control will
progress through before being in an active Forwarding state. These port states are Blocking, Listening, Learning, Forwarding, and Disabled.
By default, following initialization, all switches initially assume that they are the root of the Spanning Tree until they exchange BPDUs with other switches. When switches exchange BPDUs, an
election is held and the switch with the lowest Bridge Priority value is elected the STP root bridge. If
two or more switches have the same priority, then the switch with the lowest order MAC address
is chosen or elected as the root bridge of the STP network. During STP root election, no traffic is
forwarded over any switch in the same STP domain.
The Spanning Tree Protocol uses cost and priority values to determine the best path to the root
bridge. These values are then used in the election of the root port, which will be described in the
following section. It is important to understand the calculation of the cost and priority values in
order to understand how Spanning Tree selects one port over another, for example.
One of the key functions of the STA is to attempt to provide the shortest path to each switch in the
network from the root bridge. Once selected, this path is then used to forward data while redundant
links are placed into a blocking state. STA uses two values to determine which port will be placed
into a Forwarding state (i.e., the best path to the root bridge) and which port(s) will be placed into a
Blocking state. These values are the port cost and the port priority. The 802.1D specification assigns
16-bit (short) default port cost values to each port that is based on the port’s bandwidth. Because
administrators also have the capability to assign port cost values (between 1 and 65,535) manually,
the 16-bit values are used only for ports that have not been configured specifically for port cost.
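As a worked example of how port costs add up to a root path cost, consider the following Python sketch. The default cost table is an assumption that is not listed in the text: it uses the commonly cited 802.1D short-mode values (100 for 10 Mbps, 19 for 100 Mbps, 4 for 1 Gbps, and 2 for 10 Gbps). The function names are hypothetical.

```python
# Assumed 802.1D short-mode (16-bit) default costs, keyed by speed in Mbps.
DEFAULT_SHORT_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}


def port_cost(speed_mbps, configured=None):
    """Return the manually configured cost if present, else the default."""
    if configured is not None:
        if not 1 <= configured <= 65535:
            raise ValueError("port cost must be between 1 and 65,535")
        return configured
    return DEFAULT_SHORT_COST[speed_mbps]


def root_path_cost(link_speeds_mbps):
    """Sum the per-link costs along one path toward the root bridge."""
    return sum(port_cost(speed) for speed in link_speeds_mbps)


# Path A: two FastEthernet hops. Path B: one GigabitEthernet hop, then FastEthernet.
path_a = root_path_cost([100, 100])
path_b = root_path_cost([1000, 100])
print(path_a, path_b)

# A manually configured cost overrides the bandwidth-based default.
print(port_cost(100, configured=10))
```

With these assumed defaults, path B has the lower root path cost, so the port leading to it would be preferred over the port leading to path A.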
In the event that multiple ports have the same path cost, STP considers the port priority when
selecting which port to put into the Forwarding state. The valid port priority range is from 0 to 240
and the Cisco IOS default value is 128. This value can be adjusted manually by the administrator to
influence which port is selected by the STA; the lower the numerical number, the more preferred
the port. The default port priority is adjusted in increments of 16.
When all LAN ports have the same priority value, STP will place the port with the lowest port
number into the Forwarding state and block the other ports. The port priority is locally significant
between two switches. If a switch is connected via multiple links to another switch, then it applies
the following tie-breakers, in the order listed, to determine which port will forward:
• Lowest root bridge ID
• Lowest root path cost to root bridge
• Lowest sender bridge ID
• Lowest sender port ID
Spanning Tree BPDUs include several timers that play an integral role in the operation of the protocol. The Spanning Tree timer values are contained in the last three fields of a BPDU. Within the
Spanning Tree domain, the only timer values that are important are those that are sent by the root
bridge. In other words, non-root bridges are not concerned with locally configured timer values.
The default Spanning Tree timers go hand-in-hand with the IEEE 802.1D specification that also
recommends a maximum network diameter of seven. The Spanning Tree diameter is the maximum
distance that any two single switches can be from each other. A maximum diameter of seven means
that two distinct switches cannot be more than seven hops away from each other. This concept will
be described in detail later in this chapter.
Because all other switches in the Spanning Tree domain use the timer values advertised by the root
bridge, the modification of any of these values should always be made at the root bridge. By setting
these values in the STP root, these values will be passed (via BPDUs) to other switches in the STP
domain. The three configurable Spanning Tree timer values are as follows:
1. The Hello Time
2. The Forward Delay
3. The Max Age
The Hello Time is the time between each BPDU that is sent. This time is equal to two seconds by
default, but you can tune the time to be between 1 and 10 seconds. While the Hello Time received
in the Configuration BPDU from the root bridge is propagated unchanged throughout the Spanning Tree domain, all switches have their own local Hello Time for TCN BPDUs that the switches
transmit. The IEEE 802.1D standard specifies a default Hello Time value of two seconds based on a
recommended Spanning Tree diameter of seven switches.
The Forward Delay is the time that is spent in the Listening and Learning states. When the port
transitions to the Listening state, it indicates a change in the current Spanning Tree topology and
that port will go from a Blocking state to a Forwarding state. The Forward Delay is used to cover
the period between the Blocking and Forwarding states, which includes the Listening and Learning
states. This time is set to 15 seconds (sec) by default, but can be set manually to be between 4 and
30 seconds. As is the case with the Hello Time, the default Forward Delay value is based on the IEEE
Spanning Tree diameter of seven switches.
The Max Age time is set in the BPDU by the root bridge and defaults to 20 seconds. This timer can
be set manually to any number between 6 and 40 seconds. The Max Age value remains the same
for all BPDUs that are propagated by all switches in the Spanning Tree domain. Any changes to this
value on the root bridge are propagated to the other switches in the Spanning Tree domain. The
default Max Age is based on the IEEE Spanning Tree diameter of seven switches.
As was previously stated, Configuration BPDUs are sent only by the root bridge. In the event that the
root bridge fails or is removed from the network, Configuration BPDUs will no longer be sent and
the STP network is considered broken. Only when the Max Age timer expires will another switch
be elected root bridge, effectively restoring the flow of Configuration BPDUs and the STP network.
Reducing the values of these timers can be used to reduce the time it takes for the switched network
to converge. However, in doing so, it may also cause additional issues in the network, especially in
unstable topologies. In such topologies, STP timers should instead be increased until the issue (e.g.,
flapping trunk links, etc.) can be resolved.
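Before changing any timer, it helps to check the proposed values and estimate their effect. The Python sketch below validates the ranges given above and estimates the time a blocked port needs to reach the Forwarding state after the root bridge is lost. Two points are assumptions that the text does not state: the relationships 2 x (Forward Delay - 1) >= Max Age and Max Age >= 2 x (Hello Time + 1), which come from the IEEE 802.1D standard, and the estimate of Max Age plus one Forward Delay each for the Listening and Learning states. The function names are hypothetical.

```python
def validate_timers(hello=2, forward_delay=15, max_age=20):
    """Return a list of problems with the proposed timer values (empty if none)."""
    problems = []
    if not 1 <= hello <= 10:
        problems.append("Hello Time must be between 1 and 10 seconds")
    if not 4 <= forward_delay <= 30:
        problems.append("Forward Delay must be between 4 and 30 seconds")
    if not 6 <= max_age <= 40:
        problems.append("Max Age must be between 6 and 40 seconds")
    # Relationships taken from IEEE 802.1D (assumption: not stated in the text).
    if not 2 * (forward_delay - 1) >= max_age:
        problems.append("2 x (Forward Delay - 1) must be >= Max Age")
    if not max_age >= 2 * (hello + 1):
        problems.append("Max Age must be >= 2 x (Hello Time + 1)")
    return problems


def worst_case_convergence(forward_delay=15, max_age=20):
    """Max Age expiry, then Listening and Learning (one Forward Delay each)."""
    return max_age + 2 * forward_delay


print(validate_timers())                  # defaults pass every check
print(worst_case_convergence())           # seconds with default timers
print(validate_timers(forward_delay=4))   # too short for a 20-second Max Age
```

With the default timers the estimate is 50 seconds, which is why lowering the timers is tempting and also why it should be done only at the root bridge and only in a stable topology.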
TROUBLESHOOTING SPANNING TREE PROTOCOL
As previously stated, troubleshooting STP can be a very complex task. Because STP loops can bring
down an entire network, it is important to understand exactly what is happening and quickly implement an appropriate solution, as in most cases, you might have no more than a few minutes. There
are two primary reasons for STP failure or issues: configuration errors and the loss of STP
BPDUs. Additional causes also include MAC address table corruption, hardware and software issues or bugs, and switch resource utilization issues.
Designing and Implementing a Sound STP Network
Several elements that should be taken into consideration when designing a solid STP network include the following:
• Determine the location of the root bridge
• Design a deterministic topology
• Integrate Layer 3 switching
• Avoid end-to-end VLAN solutions
The root bridge is the crux of an STP network. The root is responsible for generating STP Configuration BPDUs, which are propagated throughout the domain to other downstream switches. The
root bridge sends out Hello packets every Hello interval, which defaults to two seconds. If the root
bridge fails, then Configuration BPDUs will no longer be sent. In this case, switches will wait for the
expiration of the Max Age time before invalidating stored BPDUs and beginning the process of electing a new root bridge. Network connectivity is restored after the new root bridge has been elected.
There is no explicit configuration required to facilitate root bridge election when running STP.
However, it is recommended practice to select explicitly a device that should be elected root bridge
and ensure that this is well documented so that this information is available when troubleshooting
any STP issues. In addition, allowing STP to elect a root bridge could result in poor performance
for some network segments as illustrated in Figure 3-4 below:
[Figure 3-4: diagram of four switches with MAC addresses 0000.0000.0001, 0000.0000.0002, 0000.0000.0003, and 0000.0000.0004, three groups of Network Users, and one group of Network Resources]
Fig. 3-4. Suboptimal Forwarding Due to Root Bridge Placement
Referencing Figure 3-4, assuming the switches in the network all have the same priority value,
which would be 32,768 by default, STP would use the lowest MAC address to elect the root bridge.
Following this logic, the switch with MAC address 0000.0000.0001 would be elected as the root.
The result is that user traffic from the other switches in the network will have to traverse that switch
to reach network resources, such as servers, as illustrated by the arrows in Figure 3-4.
Depending on the capabilities of this switch (e.g., bandwidth and processing capabilities), there
would most likely be performance issues, such as slow response, on the network for users connected to the other switches accessing network resources. In this network, the recommended solution would be to configure the switch with MAC address 0000.0000.0002 as the root bridge. This
provides an optimum path for network user traffic that is destined to any network resources (e.g.,
network servers and printers) as illustrated by the arrows in Figure 3-5 below:
[Figure 3-5: diagram of four switches with MAC addresses 0000.0000.0001, 0000.0000.0002, 0000.0000.0003, and 0000.0000.0004, three groups of Network Users, and one group of Network Resources]
Fig. 3-5. Optimal Forwarding Due to Root Bridge Placement
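The election outcome for the four switches in Figures 3-4 and 3-5 can be reproduced with a short Python sketch. The elect_root function is a hypothetical name, the priority value of 4,096 is an arbitrary example, and MAC addresses are compared as fixed-width strings, which gives the same order as a numeric comparison for the addresses used here.

```python
DEFAULT_PRIORITY = 32768


def elect_root(switches):
    """Pick the root bridge: lowest priority first, then lowest MAC address.

    `switches` maps each switch MAC address to its Bridge Priority.
    """
    return min(switches, key=lambda mac: (switches[mac], mac))


# All four switches left at the default priority, as in Figure 3-4.
switches = {
    "0000.0000.0001": DEFAULT_PRIORITY,
    "0000.0000.0002": DEFAULT_PRIORITY,
    "0000.0000.0003": DEFAULT_PRIORITY,
    "0000.0000.0004": DEFAULT_PRIORITY,
}
default_root = elect_root(switches)
print(default_root)

# Lowering the priority on the switch nearest the network resources,
# as in Figure 3-5, makes it the root regardless of its MAC address.
switches["0000.0000.0002"] = 4096
tuned_root = elect_root(switches)
print(tuned_root)
```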
There are two supported methods of influencing root bridge election. The first method entails
specifying the Bridge Priority manually using the spanning-tree vlan [number] priority
[value] global configuration command. Valid priority values can be configured beginning at 0 and
incrementing in value by 4,096 thereafter (e.g., 0, 4,096, 8,192…61,440).
The second method entails using a built-in macro available in Cisco IOS software. This macro is
invoked using the spanning-tree vlan [number] root [primary|secondary] global configuration
command. When this command is executed using the [primary] keyword, Cisco IOS software
checks the switch priority of the current root switch for the specified VLAN.
Because of the extended system ID support, the switch sets the switch priority for the specified
VLAN to 24,576 if this value will cause this switch to become the root for the specified VLAN.
However, if the root switch for the specified VLAN has a priority lower than 24,576, then the switch
sets its own priority for the specified VLAN to 4,096 less than the lowest switch priority. This continues until the switch has a lower priority than the current root and is itself elected the root.
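The priority arithmetic described above can be sketched in Python as follows. This is a simplified model of the description in the text and not the actual Cisco IOS logic: it treats a current root priority of exactly 24,576 the same as a lower one, and it raises an error when no lower valid priority exists. Both function names are hypothetical.

```python
PRIORITY_STEP = 4096
MAX_PRIORITY = 61440
MACRO_TARGET = 24576


def is_valid_priority(value):
    """Valid Bridge Priority values run from 0 to 61,440 in steps of 4,096."""
    return 0 <= value <= MAX_PRIORITY and value % PRIORITY_STEP == 0


def root_primary_priority(current_root_priority):
    """Priority the macro would select to take over as root (simplified)."""
    if current_root_priority > MACRO_TARGET:
        return MACRO_TARGET
    candidate = current_root_priority - PRIORITY_STEP
    if candidate < 0:
        # Assumption: with the current root already at 0, nothing lower exists.
        raise ValueError("no lower priority is available")
    return candidate


print(root_primary_priority(32768))   # current root at the default priority
print(root_primary_priority(8192))    # current root already lowered manually
print(is_valid_priority(5000))        # not a multiple of 4,096
```

Because the macro only evaluates the current root at the moment it is run, the result can later be undercut by another switch, which is the reason the text recommends setting the priority manually.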
When using the macro, the switch will become the root bridge only for the specified VLAN. It is
also important to remember that this command runs only once. In other words, if after executing this command the switch becomes the root bridge for that VLAN, should another switch be
configured with a priority value lower than that selected by the macro, then that other switch will
become the root bridge instead. The macro will need to be reissued in order to influence the root
bridge election process again. For this reason, it is recommended that you specify the root bridge
manually using the manual bridge priority configuration method.
In addition to determining the location of the root bridge, it is also important to ensure that the
traffic flows in the LAN are deterministic, meaning that they are not determined randomly. This
entails knowing which ports should be placed into a Forwarding state and which ports will be in a
Blocking state when the root bridge is active, and even when the secondary (backup) root bridge is
active, if applicable. As stated in Chapter 1, traffic flows should be included in the network documentation. Understanding the path traffic should take during normal and backup scenarios simplifies the troubleshooting process.
Consideration should also be given to the number of redundant links included in the topology.
While redundancy allows for high availability, too much redundancy can cause you more problems
than it resolves. Remember, all it takes for a Spanning Tree loop to be created, which typically results in a network meltdown, is for a single blocking port to transition mistakenly into the Forwarding state. The greater the number of redundant network links, the greater the number of ports that
STP will place into the Blocking state and, ultimately, the higher the chances of a Spanning Tree
loop developing in the network. Avoid over-redundant solutions.
Layer 3 switching solutions, such as Cisco Express Forwarding (CEF), allow for the routing of traffic at switching speeds. The different switching mechanisms supported in Cisco IOS software are
described in additional detail in the following chapter. Routing allows for inter-VLAN communication. In addition, routers break up Broadcast and collision domains. Layer 3 switching provides the
following advantages over Layer 3 routing:
• Hardware-based packet forwarding
• High-performance packet switching
• High-speed scalability
• Low latency
• Lower per-port cost
• Flow accounting
• Security
• Quality of Service (QoS)
When designing and implementing switched LAN solutions, consider a design that uses local
VLANs over end-to-end VLANs. End-to-end VLANs are VLANs that span the entire switch fabric of a network. These VLANs are also commonly referred to as campus-wide VLANs, as they
sometimes span the entire campus LAN so that network hosts and their servers remain in the same
VLAN (logically), even though the devices may physically reside in different buildings, for example.
End-to-end VLAN implementation is based on the 80/20 rule and therefore requires that each
VLAN exist at the Access layer in every switch block. The primary reason for end-to-end VLAN
implementation is to support maximum flexibility and the mobility of end devices. These solutions
have the following characteristics:
• They allow the grouping of users into a single VLAN independent of physical location
• They are difficult to implement and troubleshoot
• Each VLAN provides common security and resource requirements for members
• They become extremely complex to maintain as the campus network grows
Unlike end-to-end VLANs, local VLANs are based on geographic locations by demarcation at a hierarchical boundary. These VLANs are designed for modern-day networks that adhere to the 20/80
rule, where end users typically require greater access to resources outside of their local VLAN.
With local VLANs, up to 80% of the traffic is destined to the Internet or other remote network locations, while no more than 20% of the traffic remains local.
Despite the name, local VLANs are not restricted to a single switch and can range in size from a
single switch in a wiring closet to an entire building. This VLAN implementation method provides
maximum availability by using multiple paths to destinations, maximum scalability by keeping the
VLAN within a switch block, and maximum manageability.
Common Causes for Bridging Loops
While a solid design should minimize STP failures, it cannot guarantee that problems or issues will
not arise within the network. When STP fails, a bridging loop often ensues. A loop originates when a
port that should be blocking transitions to the Forwarding state. Given that there is no concept similar
to the Layer 3 TTL at Layer 2, when this happens, frames can traverse the network endlessly, consuming switch resources and possibly leading to a complete network meltdown. There are multiple
reasons a bridging loop can occur. Common reasons include, but are not limited to, the following:
• Physical layer connectivity issues
• Switch misconfigurations
• Switch resource utilization issues
• Broadcast storms
• Hardware or software errors
A loss of connectivity between switches results in a loss of BPDUs. If a switch does not receive
BPDUs on a port on which they should be received, it transitions other ports into the Forwarding
state, which can result in a Spanning Tree loop if the root port link is not actually down. Such problems are typically caused by duplex mismatches for Ethernet trunk links and unidirectional links
when using Fiber Optic trunk links.
Duplex mismatches are applicable only when using 10/100 links. As stated previously, Catalyst
switches support only full duplex for 1Gbps links. Given that this is the standard implemented in
most modern networks, duplex mismatches are typically a non-issue. However, because 10/100
links can be used for trunk connectivity, it is still worth keeping this in mind in the event that you
are using 10/100 Ethernet links for trunking between switches in your network. Figure 3-6 below
illustrates how a simple duplex mismatch could cause a bridging loop:
[Figure 3-6: a four-switch topology (Root Bridge, Switch 2, Switch 3, and Switch 4) with each port marked F (Forwarding) or B (Blocking)]
Fig. 3-6. Bridging Loop Formation Due to Duplex Mismatches
Figure 3-6 illustrates a basic switched network. The relevant port states (i.e., Forwarding and Blocking) are illustrated based on the location of the root bridge. Assume, for example, that auto negotiation fails between the root bridge and Switch 4, resulting in the root bridge defaulting to half-duplex operation, while Switch 4 defaults to full-duplex operation for this link. Half-duplex Ethernet
uses a carrier sensing mechanism (CSMA/CD) on the physical medium to determine when gaps
between frame transmissions occur. Stations may begin transmitting any time they detect that the
network is quiet.
When a collision occurs, a jam signal is sent out and the devices involved in the collision stop transmitting for a short period of time. A random back-off algorithm is then executed to ensure that the
devices are transmitting at random times so that they avoid another collision or collisions. Given
this, if Switch 4 is transmitting enough traffic, it is possible that the current root bridge (half duplex)
might defer all packets, including BPDUs, until it senses that the medium is available. Should this
happen, Switch 4 might not receive BPDUs on its root port within the 20-second Max Age period
and, from an STP point of view, assume that it has lost its connection to the root bridge. It would
then bring up the backup port currently in the Blocking state. Once this port is unblocked, a
bridging loop is created.
Unlike Ethernet, Fiber Optic links do not use CSMA/CD. However, transceiver issues can cause
unidirectional communication issues between two switches, which can result in the same problem
described for the half-duplex Ethernet connection. When using Fiber Optic for trunk links between
switches, it is recommended that UniDirectional Link Detection (UDLD) be enabled. UDLD can detect improper cabling or unidirectional links at Layer 2 and can break resulting loops automatically
by disabling some ports. By default, when UDLD is enabled on Cisco Catalyst switches, it will only
be enabled for Fiber Optic links. However, UDLD can also be enabled manually for Ethernet links.
Another common configuration error that results in STP failure or bridging loops is the
configuration of aggressive timers. As stated earlier in this chapter, the default STP timers are based
on an STP network diameter of seven bridges or switches. The default timer values are listed and
described in Table 3-3 below:
Table 3-3. Default Spanning Tree Timers
Timer           Default       Description
Hello Time      2 seconds     Specifies the time between sending Spanning Tree BPDUs
Max Age         20 seconds    Specifies the maximum time to save BPDU information
Forward Delay   15 seconds    Specifies the time spent in each of the Listening and Learning states
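The default timers translate directly into the delays you will see while troubleshooting. The following Python sketch is illustrative only (the function names are hypothetical and not part of any Cisco tool). Assuming classic 802.1D behavior, it works out how many Hello intervals fit into Max Age and how long a blocked port takes to begin forwarding once BPDUs stop arriving: Max Age must expire, and the port then spends one Forward Delay in each of the Listening and Learning states.

```python
def hellos_within_max_age(hello=2, max_age=20):
    """Number of whole Hello intervals that fit into the Max Age timer."""
    return max_age // hello


def worst_case_unblock_seconds(max_age=20, forward_delay=15):
    """Max Age expiry plus Listening and Learning (one Forward Delay each)."""
    return max_age + 2 * forward_delay


print(hellos_within_max_age())        # 10
print(worst_case_unblock_seconds())   # 50
```

With the default values, roughly ten consecutive BPDUs must be missed before a port ages out its stored information, and about 50 seconds pass before a previously blocked port forwards traffic.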
Cisco IOS software provides two methods of changing the default STP timers, which may be used
to decrease the convergence time of the switched network. The first method entails specifying the
individual timers manually, on a per-VLAN basis, using the spanning-tree vlan [vlan] [forward-time | hello-time | max-age] <interval> global configuration command on the root
bridge. The second option includes using a macro that can be enabled using the spanning-tree
vlan [vlan] root [primary | secondary] diameter <2-7> global configuration command
on either the root or backup root bridge. When the macro is used, Cisco IOS software will automatically set the optimal STP timers for the specified network diameter.
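The relationship between network diameter and timers is commonly described with the IEEE 802.1D formulas shown in the sketch below. This is an assumption-laden illustration, not a statement of what a given Cisco IOS release computes: it assumes those formulas and rounds half seconds up, and the function name is hypothetical. Always confirm the values actually applied using the show spanning-tree vlan command.

```python
def timers_for_diameter(diameter, hello=2):
    """Estimate STP timers (in seconds) for a network diameter of 2 to 7."""
    if not 2 <= diameter <= 7:
        raise ValueError("diameter must be between 2 and 7")
    # Assumed 802.1D relationships:
    #   max_age       = 4 * hello + 2 * diameter - 2
    #   forward_delay = (4 * hello + 3 * diameter) / 2, rounded up
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = (4 * hello + 3 * diameter + 1) // 2
    return {"hello": hello, "max_age": max_age, "forward_delay": forward_delay}


for d in (2, 7):
    print(d, timers_for_diameter(d))
```

For a diameter of seven with a 2-second Hello Time, the sketch reproduces the default values in Table 3-3 (Max Age 20, Forward Delay 15).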
While aggressive timers do allow for faster convergence, they can also make troubleshooting the
network much more difficult, especially during periods of instability where the topology constantly
fluctuates. For this reason, do not arbitrarily adjust STP timers without valid justification or a recommendation from the TAC. Instead, consider alternatives, such as RSTP, that do speed up convergence without having to manipulate timers.
Finally, another common configuration error that results in loops is PortFast configuration. PortFast is a Cisco STP enhancement that, when enabled, allows the specified port to skip the first
stages of the STA and directly transition to the Forwarding state. PortFast can be enabled on both
trunk and access links. A common reason for configuring or enabling PortFast on an access port
is if a user complains that he or she is unable to get an IP address from the DHCP server, which is
most likely due to a timeout because of the amount of time it takes for an STP port to transition to
the Forwarding state. In such cases, PortFast can be used to resolve this issue.
A common misconception regarding PortFast is that because the trunk keyword can be specified
in conjunction with the spanning-tree portfast interface configuration command, enabling
PortFast on a trunk link will also decrease convergence time. This is not the case. PortFast does not
disable Spanning Tree on the selected port; it only skips the Listening and Learning states, and on
a trunk link between switches this can result in bridging loops. The spanning-tree portfast trunk
interface configuration command should be used only on trunk links that
are connected to devices, such as servers or firewalls, that do not send BPDUs but do support
multiple VLANs.
Another common reason for bridging loops is high switch resource utilization. This includes primarily CPU and memory consumption, but it could also be due to congestion. High CPU utilization may be due to a plethora of reasons. Additionally, the factors that contribute to this will vary
depending on the switch platform. While Catalyst switches always service control packets (e.g.,
BPDUs) before normal packets (e.g., ICMP packets), overutilization of the CPU can result in the
switch running out of resources to transmit control packets. Again, this can lead to a bridging loop.
There are several Cisco IOS commands that can be used to check system health and troubleshoot
high resource utilization issues, with the most common command being the show processes
command. Additional performance troubleshooting commands are described in detail in the following chapter.
A Broadcast storm means that your network is overwhelmed with a constant flow of Broadcast or
Multicast traffic. Broadcast storms can eventually lead to a complete loss of network connectivity as the packets proliferate. While uncommon in the high-bandwidth LANs of today, it is quite
possible for one or more devices (e.g., a powerful server) to bring down the network because of
Broadcasts. A high volume of Broadcast traffic consumes bandwidth and can result in packet loss,
including the loss of BPDUs, leading to a bridging loop.
A more common cause for Broadcast storms in modern-day networks is Multicast. IP Multicast
uses packets that include IP Options, such as Internet Group Management Protocol (IGMP) packets
used in Multicast implementations. These exception packets are punted to the CPU of the switch
for processing. A large number of these packets can increase CPU utilization, depleting the resources the switch has to process and transmit control packets, such as BPDUs.
Broadcast storms can be mitigated by enabling the traffic suppression feature under applicable interfaces using the storm-control [broadcast | multicast | unicast] level <percent |
pps> interface configuration command. In addition, high-end switches, such as the Catalyst 6500
Series switch, also support the rate-limiting (throttling) of packets that are punted to the CPU,
allowing for additional protection against such events. These options are described in additional
detail in the following chapter.
Finally, both hardware and software errors can also result in bridging loops. For example, faulty
hardware, such as a bad port, may result in data (packet) corruption or a high rate of errors on the
link, which may cause BPDUs to be lost, ultimately resulting in ports that should be blocking transitioning to a Forwarding state and causing a loop. Although uncommon, software bugs can also
cause STP protocol failures, which result in bridging loops. Because a bug is difficult to identify, you
should contact the TAC if you suspect a software bug.
Troubleshooting Spanning Tree
In the previous section, we discussed some of the reasons that can cause STP to fail and result in the
development of bridging loops in the network. Unlike routing loops, wherein packets are discarded
after the TTL decrements to 0, bridging loops can result in frames traversing the network infinitely.
As this happens, more and more switch resources are consumed, resulting in the switch finally
‘locking up.’ In such situations, in-band access, such as Telnet and SSH, to the switch is typically lost
and only out-of-band management via the console is possible. For this reason, as was stated earlier
in this chapter, it is important to ensure that the network is well designed and documented. This
allows you to identify the following:
• The overall topology of the network
• The location of the root bridge
• Redundant links and blocking ports
This information allows you to troubleshoot the bridging loop effectively. Because a number of
problems could cause a bridging loop, there is no single method that can be used to troubleshoot
such issues. However, by understanding the common causes of bridging loops, you can use a process of elimination to isolate, and ultimately resolve, the issue. As previously stated, a bridging loop
will most likely result in you losing in-band access to the switch, meaning that only out-of-band
access and management via the console will be possible. If you know the location of the root bridge,
during periods of extended instability, the first thing you should do to reduce STP BPDU traffic
is to increase STP timers (i.e., the Forward Delay and Max Age) to the maximum possible values.
This allows you to stabilize the network somewhat and continue troubleshooting the bridging loop.
After this, you can perform the following activities to identify and isolate the issue:
• Check port utilization statistics
• Check port BPDU statistics
• Check for duplex mismatches
• Check port errors
• Check resource utilization
The best way to identify a bridging loop is to capture the traffic on a saturated link and check that
you see similar packets multiple times. An interface with traffic overload can fail to transmit vital
BPDUs. A link overload also indicates a possible bridging loop. You can use the show interfaces
command to verify link utilization statistics per port.
In addition to verifying utilization statistics, you should also verify that the ports that should be
receiving BPDUs are receiving them. BPDUs should be received on root ports and on blocked ports
(those on blocked ports are inferior). You can use the show spanning-tree interface [name]
detail command to verify BPDU statistics on a per-interface basis. You can also use the show
spanning-tree vlan [number] detail command to view interface statistics on a per-VLAN
basis.
When using these commands, you should see more received BPDUs on a root port that is in the
Forwarding state than BPDUs sent because BPDUs are propagated downstream from the root
bridge throughout the STP domain. Because all ports send BPDUs when they initialize, you should
see a few BPDUs sent out of the root port. Following is the output of the show spanning-tree
interface [name] detail command for a port elected as the root port:
Cat-3550-1#show spanning-tree interface FastEthernet0/11 detail
Port 11 (FastEthernet0/11) of VLAN0001 is root forwarding
Port path cost 19, Port priority 128, Port Identifier 128.11.
Designated root has priority 32769, address 000b.fd67.6500
Designated bridge has priority 32769, address 000b.fd67.6500
Designated port id is 128.11, designated path cost 0
Timers: message age 2, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is point-to-point by default
BPDU: sent 6, received 69600
Notice the significant difference between the number of sent and received BPDUs. This is what you
should expect to see during normal STP operation. Blocking ports should also see a large number
of BPDUs received, which is normal. While you will see some sent BPDUs, which is also normal,
this number should not be incrementing during normal STP operation. Additionally, you can look
at the number of transitions to the Forwarding state to troubleshoot instability issues. Following is
the output of the show spanning-tree interface [name] detail command for a port that has
been placed into the Blocking state:
Cat-3550-1#show spanning-tree interface FastEthernet0/12 detail
Port 12 (FastEthernet0/12) of VLAN0001 is alternate blocking
Port path cost 19, Port priority 128, Port Identifier 128.12.
Designated root has priority 32778, address 000b.fd67.6500
Designated bridge has priority 32778, address 000b.fd67.6500
Designated port id is 128.12, designated path cost 0
Timers: message age 2, forward delay 0, hold 0
Number of transitions to forwarding state: 0
Link type is point-to-point by default
BPDU: sent 3, received 42804
Finally, on the root bridge itself, you should expect to see more BPDUs transmitted than are received on all ports. This indicates normal STP operation. Following is the output of the show spanning-tree vlan [number] detail command on the root of the specified VLAN:
Cat-3550-2#show spanning-tree vlan 10 detail
VLAN0010 is executing the ieee compatible Spanning Tree protocol
Bridge Identifier has priority 32768, sysid 10, address 000b.fd67.6500
Configured hello time 2, max age 20, forward delay 15
We are the root of the spanning tree
Topology change flag not set, detected flag not set
Number of topology changes 4 last change occurred 23:27:07 ago
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
Timers: hello 1, topology change 0, notification 0, aging 300
Port 11 (FastEthernet0/11) of VLAN0010 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.11.
Designated root has priority 32778, address 000b.fd67.6500
Designated bridge has priority 32778, address 000b.fd67.6500
Designated port id is 128.11, designated path cost 0
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is point-to-point by default
BPDU: sent 43024, received 3
Port 12 (FastEthernet0/12) of VLAN0010 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.12.
Designated root has priority 32778, address 000b.fd67.6500
Designated bridge has priority 32778, address 000b.fd67.6500
Designated port id is 128.12, designated path cost 0
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is point-to-point by default
BPDU: sent 43025, received 3
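When many ports and VLANs are involved, the sent and received counters can be compared programmatically. The following Python sketch is a hypothetical helper, not a Cisco tool. It assumes the output format shown above and applies the rule of thumb from this section: root and alternate ports should receive far more BPDUs than they send, while designated ports should send far more than they receive.

```python
import re

PORT_RE = re.compile(
    r"Port \d+ \((?P<port>\S+)\) of (?P<vlan>VLAN\d+) is (?P<role>\w+) (?P<state>\w+)"
)
BPDU_RE = re.compile(r"BPDU: sent (?P<sent>\d+), received (?P<received>\d+)")


def bpdu_counters(output):
    """Return one dict per port block found in 'show spanning-tree ... detail' text."""
    ports, current = [], None
    for line in output.splitlines():
        match = PORT_RE.search(line)
        if match:
            current = match.groupdict()
            continue
        match = BPDU_RE.search(line)
        if match and current is not None:
            current["sent"] = int(match.group("sent"))
            current["received"] = int(match.group("received"))
            ports.append(current)
            current = None
    return ports


def looks_normal(entry):
    """Apply the sent/received rule of thumb for the port role."""
    if entry["role"] in ("root", "alternate"):
        return entry["received"] > entry["sent"]
    if entry["role"] == "designated":
        return entry["sent"] > entry["received"]
    return True  # other roles are not judged by this sketch


SAMPLE = """\
 Port 11 (FastEthernet0/11) of VLAN0001 is root forwarding
   BPDU: sent 6, received 69600
 Port 12 (FastEthernet0/12) of VLAN0001 is alternate blocking
   BPDU: sent 3, received 42804
 Port 11 (FastEthernet0/11) of VLAN0010 is designated forwarding
   BPDU: sent 43024, received 3
"""

for entry in bpdu_counters(SAMPLE):
    print(entry["port"], entry["vlan"], entry["role"], looks_normal(entry))
```

A port that fails this check is not proof of a fault, but it is a reasonable place to start looking, particularly if its sent counter keeps incrementing on a root or alternate port.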
In addition to using show commands, you can also enable BPDU debugging using the debug spanning-tree bpdu command to verify BPDU statistics. However, keep in mind that the switch CPU
utilization will most likely already be very high, and enabling debugging might just lock up the switch. The
following illustrates a sample of the output that is printed by this command:
Cat-3550-1#debug spanning-tree bpdu
*Mar 2 15:12:27.770: STP: Data
00000000008028000BFD676500000000008028000BFD676500800B0000140002000F00
*Mar 2 15:12:27.774: STP: VLAN0040 Fa0/11:0000 00 00 00 8028000BFD676500
00000000 8028000BFD676500 800B 0000 1400 0200 0F00
*Mar 2 15:12:27.774: STP(40) port Fa0/11 supersedes 0
*Mar 2 15:12:27.774: STP: VLAN0040 rx BPDU: config protocol = ieee, packet from
FastEthernet0/12 , linktype IEEE_SPANNING , enctype 2, encsize 17
*Mar 2 15:12:27.774: STP: enc 01 80 C2 00 00 00 00 0B FD 67 65 0C 00 26 42 42
03
*Mar 2 15:12:27.774: STP: Data
00000000008028000BFD676500000000008028000BFD676500800C0000140002000F00
*Mar 2 15:12:27.774: STP: VLAN0040 Fa0/12:0000 00 00 00 8028000BFD676500
00000000 8028000BFD676500 800C 0000 1400 0200 0F00
*Mar 2 15:12:27.778: STP(40) port Fa0/12 supersedes 0
*Mar 2 15:12:27.778: STP: VLAN0020 rx BPDU: config protocol = ieee, packet from
FastEthernet0/11 , linktype IEEE_SPANNING , enctype 2, encsize 17
...
[Truncated Output]
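The hexadecimal Data string in the debug output is the Configuration BPDU itself. The following Python sketch decodes it, assuming the standard 35-byte 802.1D Configuration BPDU layout, with timers carried in units of 1/256 of a second. The function name is hypothetical, and the port ID is treated as one priority byte followed by one port-number byte, which matches the 128.11 notation used in the show command output.

```python
def decode_config_bpdu(hex_data):
    """Decode a 35-byte 802.1D Configuration BPDU given as a hex string."""
    raw = bytes.fromhex(hex_data)
    if len(raw) < 35:
        raise ValueError("a Configuration BPDU is 35 bytes long")

    def bridge_id(field):
        # 2-byte priority followed by a 6-byte MAC address
        priority = int.from_bytes(field[:2], "big")
        mac = field[2:].hex()
        return priority, ".".join(mac[i:i + 4] for i in (0, 4, 8))

    def seconds(field):
        # Timer fields are expressed in 1/256ths of a second
        return int.from_bytes(field, "big") / 256

    return {
        "protocol_id": int.from_bytes(raw[0:2], "big"),
        "version": raw[2],
        "bpdu_type": raw[3],
        "flags": raw[4],
        "root_id": bridge_id(raw[5:13]),
        "root_path_cost": int.from_bytes(raw[13:17], "big"),
        "bridge_id": bridge_id(raw[17:25]),
        "port_id": (raw[25], raw[26]),
        "message_age": seconds(raw[27:29]),
        "max_age": seconds(raw[29:31]),
        "hello_time": seconds(raw[31:33]),
        "forward_delay": seconds(raw[33:35]),
    }


# First Data string from the debug output above
DATA = "00000000008028000BFD676500000000008028000BFD676500800B0000140002000F00"
for name, value in decode_config_bpdu(DATA).items():
    print(f"{name:15} {value}")
```

Decoding the first Data string yields a root priority of 32808 (32768 plus VLAN 40), the root address 000b.fd67.6500, port ID 128.11, and the timers 20, 2, and 15 seconds, which agrees with the show command output earlier in this section.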
As previously stated, duplex mismatches can result in both VLAN connectivity failures and STP failures. When the switch detects a duplex mismatch, it prints the following message:
%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on FastEthernet0/1 (not full
duplex), with R2 FastEthernet0/0 (full duplex)
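If you have collected logs from several switches, messages like the one above can be extracted with a short script. The following Python sketch is a hypothetical helper that assumes the message format shown above, and it rejoins messages that were wrapped across lines before matching.

```python
import re

MISMATCH_RE = re.compile(
    r"%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on (?P<local_if>\S+) "
    r"\((?P<local_duplex>[^)]+)\),\s+with (?P<peer>\S+) (?P<peer_if>\S+) "
    r"\((?P<peer_duplex>[^)]+)\)"
)


def find_duplex_mismatches(log_text):
    """Return a dict for every duplex mismatch message found in the log text."""
    flat = " ".join(log_text.split())  # undo line wrapping
    return [match.groupdict() for match in MISMATCH_RE.finditer(flat)]


LOG = """%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on FastEthernet0/1 (not full
duplex), with R2 FastEthernet0/0 (full duplex)"""

for hit in find_duplex_mismatches(LOG):
    print(hit["local_if"], "->", hit["peer"], hit["peer_if"])
```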
Interface or port errors could also be used to identify the root cause of the bridging loop. Using
the show interfaces command, check for incrementing errors, which could indicate packet corruption. Depending on the switch platform, you could also use additional commands, such as the
show controllers ethernet-controller [interface] command or the show interfaces
counters errors [interface] command, to view error statistics.
Finally, also include resource utilization checks in your troubleshooting. As was previously stated,
high CPU utilization could be caused by a number of things. If possible, attempt to determine
which process is causing the high CPU utilization using the show processes cpu command.
Because this command prints a great deal of information, you should filter the output to increase
the efficiency of your troubleshooting process. For example, you could filter the output of this command so that all processes using no CPU resources are omitted as follows:
Cat-3550-1#show processes cpu | exclude 0.00
CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
  51      581000    423131       1373  0.39%  0.39%  0.39%   0 Vegas Statistics
  71         708       730        969  0.15%  0.04%  0.01%   0 Exec
USING THE ‘SHOW SPANNING-TREE’ COMMAND
In the final section of this chapter, we will explore the Cisco IOS show spanning-tree command,
as well as the keywords that can be used in conjunction with this command. We will also discuss
the relevant output that is printed by this command and how to interpret this information when
validating or troubleshooting Spanning Tree Protocol. The following output displays the keywords
that can be used in conjunction with this command:
Cat-3550-1#show spanning-tree ?
  WORD               bridge group list, example 1,3-5,7,9
  active             Report on active interfaces only
  backbonefast       Show spanning tree backbonefast status
  blockedports       Show blocked ports
  bridge             Status and configuration of this bridge
  detail             Detailed information
  inconsistentports  Show inconsistent ports
  interface          Spanning Tree interface status and configuration
  mst                Multiple spanning trees
  pathcost           Show Spanning pathcost options
  root               Status and configuration of the root bridge
  summary            Summary of port states
  uplinkfast         Show spanning tree uplinkfast status
  vlan               VLAN Switch Spanning Trees
  |                  Output modifiers
  <cr>
NOTE: Only keywords that are applicable to the TSHOOT certification exam will be discussed.
The active keyword prints information on active VLANs in the STP domain. This includes the
STP timers, priority of the root bridge, active interfaces or ports, and their states. The output of this
command is illustrated below:
Cat-3550-1#show spanning-tree active
VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    1
             Address     000f.2303.2d80
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    1      (priority 0 sys-id-ext 1)
             Address     000f.2303.2d80
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----------------------------
Fa0/11              Desg FWD 19        128.11   P2p
Fa0/12              Desg FWD 19        128.12   P2p

VLAN0010
  Spanning tree enabled protocol ieee
  Root ID    Priority    32778
             Address     000b.fd67.6500
             Cost        19
             Port        11 (FastEthernet0/11)
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32778  (priority 32768 sys-id-ext 10)
             Address     000f.2303.2d80
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ---------------------------
Fa0/11              Root FWD 19        128.11   P2p
Fa0/12              Altn BLK 19        128.12   P2p
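The Bridge ID lines in this output show the priority as a single number followed by its two components, for example 32778 (priority 32768 sys-id-ext 10). With the extended system ID enabled, the displayed value is the configured priority, in multiples of 4096, plus the VLAN number. The following Python sketch illustrates the arithmetic; the function names are hypothetical.

```python
def split_bridge_priority(displayed):
    """Split a displayed priority into (configured priority, sys-id-ext)."""
    # Upper 4 bits carry the priority, lower 12 bits carry the VLAN number
    return displayed & 0xF000, displayed & 0x0FFF


def displayed_priority(configured, vlan):
    """Combine a configured priority and a VLAN number into the displayed value."""
    if configured % 4096 or not 0 <= configured <= 61440:
        raise ValueError("priority must be a multiple of 4096 from 0 to 61440")
    if not 0 <= vlan <= 4095:
        raise ValueError("VLAN number must fit in 12 bits")
    return configured + vlan


print(split_bridge_priority(32778))   # (32768, 10)
print(split_bridge_priority(1))       # (0, 1)
print(displayed_priority(32768, 40))  # 32808
```

This is useful when comparing priorities across VLANs: a root priority of 1 for VLAN 1 means that the configured priority is 0, not 1.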
The blockedports keyword prints blocked ports on a per-VLAN basis. If you know which ports
should be in a Blocking state, then you can use this command to verify that they are indeed blocking. The following output illustrates the information that is printed by this command:
Cat-3550-1#show spanning-tree blockedports
Name                 Blocked Interfaces List
-------------------- ------------------------------------
VLAN0010             Fa0/12
VLAN0020             Fa0/12
VLAN0030             Fa0/12
VLAN0040             Fa0/12
Number of blocked ports (segments) in the system : 4
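If the expected blocking ports are recorded in your network documentation, the comparison can be automated. The following Python sketch is a hypothetical helper that assumes the output format shown above. It parses the blocked-port list and reports any VLAN whose blocking ports differ from the documented design.

```python
def parse_blockedports(output):
    """Map each VLAN name to the set of blocked interfaces listed for it."""
    blocked = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("VLAN"):
            blocked[fields[0]] = {port.rstrip(",") for port in fields[1:]}
    return blocked


def unexpected_states(expected, actual):
    """Return {vlan: (not_blocking_but_should, blocking_but_should_not)}."""
    diffs = {}
    for vlan in sorted(set(expected) | set(actual)):
        want = expected.get(vlan, set())
        have = actual.get(vlan, set())
        if want != have:
            diffs[vlan] = (want - have, have - want)
    return diffs


SAMPLE = """\
Name                 Blocked Interfaces List
-------------------- ------------------------------------
VLAN0010             Fa0/12
VLAN0020             Fa0/12
VLAN0030             Fa0/12
VLAN0040             Fa0/12
Number of blocked ports (segments) in the system : 4
"""

# Documented design (illustrative values only)
EXPECTED = {vlan: {"Fa0/12"} for vlan in ("VLAN0010", "VLAN0020", "VLAN0030", "VLAN0040")}

print(unexpected_states(EXPECTED, parse_blockedports(SAMPLE)))  # {}
```

A port that should be blocking but is missing from the list is the most important finding, because that is the condition that produces a bridging loop.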
The detail keyword prints detailed STP information on a per-VLAN basis, including STP timers and port states. Following is a sample
of the output that is printed by this command:
Cat-3550-1#show spanning-tree detail
VLAN0001 is executing the ieee compatible Spanning Tree Protocol
Bridge Identifier has priority 0, sysid 1, address 000f.2303.2d80
Configured hello time 2, max age 20, forward delay 15
We are the root of the spanning tree
Topology change flag not set, detected flag not set
Number of topology changes 5 last change occurred 01:03:37 ago
 Times:  hold 1, topology change 35, notification 2
         hello 2, max age 20, forward delay 15
Timers: hello 1, topology change 0, notification 0, aging 300
Port 11 (FastEthernet0/11) of VLAN0001 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.11.
Designated root has priority 1, address 000f.2303.2d80
Designated bridge has priority 1, address 000f.2303.2d80
Designated port id is 128.11, designated path cost 0
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is point-to-point by default
BPDU: sent 1916, received 69809
Port 12 (FastEthernet0/12) of VLAN0001 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.12.
Designated root has priority 1, address 000f.2303.2d80
Designated bridge has priority 1, address 000f.2303.2d80
Designated port id is 128.12, designated path cost 0
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is point-to-point by default
BPDU: sent 1914, received 69809
...
[Truncated Output]
The inconsistentports keyword prints information about ports that have an inconsistent STP
state. Cisco IOS software places ports into an inconsistent state so that any misconfigurations do
not impact STP and result in a protocol failure, which can ultimately lead to a network outage. Valid
reasons for which a port will be placed into an inconsistent state include the following:
• Loop inconsistency
• Port VLAN ID (PVID) inconsistency
• Root inconsistency
• EtherChannel inconsistency
• Type inconsistency
The port is placed into the loop inconsistent state if Loop Guard detects that a non-designated
port has stopped receiving BPDUs. This prevents the port from transitioning from non-designated (Blocking) to designated (Forwarding) in the absence of BPDUs. A switch port will be
placed into a PVID inconsistent state if a PVST+ BPDU is received on a VLAN other than the one
on which it originated. This happens when native VLANs are mismatched.
The Root Guard feature prevents a designated port from becoming a root port. If a port on which
Root Guard is enabled receives a superior BPDU, the switch moves that port into a root inconsistent
state, thus maintaining the current root bridge status quo. The EtherChannel inconsistent state is
used to prevent loops in the event of any EtherChannel misconfigurations. EtherChannel Guard is
enabled on switches using the spanning-tree etherchannel guard misconfig global configuration command. Finally, a port is placed into a type inconsistent state if a PVST+ BPDU is received
on a port that is not an 802.1Q trunk. The output of the show spanning-tree inconsistentports command
following error messages indicating a port type inconsistency on the switch is illustrated below:
*Mar 2 16:33:22.709: %SPANTREE-7-RECV_1Q_NON_TRUNK: Received 802.1Q BPDU on non
trunk FastEthernet0/11 VLAN1.
*Mar 2 16:33:22.709: %SPANTREE-7-BLOCK_PORT_TYPE: Blocking FastEthernet0/11 on
VLAN0001. Inconsistent port type.
Cat-3550-2#
Cat-3550-2#show spanning-tree inconsistentports
Name                 Interface                Inconsistency
-------------------- ------------------------ ------------------
VLAN0001             FastEthernet0/11         Port Type Inconsistent
Number of inconsistent ports (segments) in the system : 1
The interface keyword prints STP information on a per-interface basis, which includes the port
role and state, port cost, and priority, as well as the port type. You can append the detail keyword
to view additional information, such as the root bridge ID and BPDU statistics. The following output displays the information printed by this command:
Cat-3550-1#show spanning-tree interface FastEthernet0/11
Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ---------------------------
VLAN0001            Desg FWD 19        128.11   P2p
VLAN0010            Root FWD 19        128.11   P2p
VLAN0020            Root FWD 19        128.11   P2p
VLAN0030            Root FWD 19        128.11   P2p
VLAN0040            Root FWD 19        128.11   P2p
The root keyword prints information about the root bridge for all active VLANs. This includes the
root bridge ID, root priority, root port cost, timers, and the root port as illustrated below:
Cat-3550-1#show spanning-tree root
                                        Root    Hello Max Fwd
Vlan                   Root ID          Cost    Time  Age Dly  Root Port
---------------- -------------------- --------- ----- --- ---  ------------
VLAN0001             1 000f.2303.2d80         0    2   20  15
VLAN0010         32778 000b.fd67.6500        19    2   20  15  Fa0/11
VLAN0020         32788 000b.fd67.6500        19    2   20  15  Fa0/11
VLAN0030         32798 000b.fd67.6500        19    2   20  15  Fa0/11
VLAN0040         32808 000b.fd67.6500        19    2   20  15  Fa0/11
The summary keyword prints summary information for all VLANs, including STP mode, STP enhancements (e.g., PortFast and Loop Guard configuration), the VLANs for which the local switch is
root bridge, and the state of different ports in different VLANs. Following is the output of the show
spanning-tree summary command:
Cat-3550-2#show spanning-tree summary
Switch is in pvst mode
Root bridge for: VLAN0010, VLAN0020, VLAN0030, VLAN0040
Extended system ID           is enabled
Portfast Default             is disabled
PortFast BPDU Guard Default  is disabled
Portfast BPDU Filter Default is disabled
Loopguard Default            is disabled
EtherChannel misconfig guard is enabled
UplinkFast                   is disabled
BackboneFast                 is disabled
Configured Pathcost method used is short

Name                   Blocking Listening Learning Forwarding STP Active
---------------------- -------- --------- -------- ---------- ----------
VLAN0001                     1         0        0          1          2
VLAN0010                     0         0        0          2          2
VLAN0020                     0         0        0          2          2
VLAN0030                     0         0        0          2          2
VLAN0040                     0         0        0          2          2
---------------------- -------- --------- -------- ---------- ----------
5 vlans                      1         0        0          9         10
Finally, the vlan keyword prints detailed information on a per-VLAN basis. This is one of the most
commonly used STP commands. The information printed by this command includes STP timers
set on the root bridge, local STP timers, and information on the root bridge, among other things.
Following is the output of this command on a non-root bridge:
Cat-3550-1#show spanning-tree vlan 10
VLAN0010
  Spanning tree enabled protocol ieee
  Root ID    Priority    32778
             Address     000b.fd67.6500
             Cost        19
             Port        11 (FastEthernet0/11)
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32778  (priority 32768 sys-id-ext 10)
             Address     000f.2303.2d80
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------
Fa0/11              Root FWD 19        128.11   P2p
Fa0/12              Altn BLK 19        128.12   P2p
Following is the output of the same command on the root bridge of the specified VLAN:
Cat-3550-2#show spanning-tree vlan 10
VLAN0010
  Spanning tree enabled protocol ieee
  Root ID    Priority    32778
             Address     000b.fd67.6500
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32778  (priority 32768 sys-id-ext 10)
             Address     000b.fd67.6500
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------
Fa0/11              Desg FWD 19        128.11   P2p
Fa0/12              Desg FWD 19        128.12   P2p
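The priority values in these outputs (for example, 32778 shown as "priority 32768 sys-id-ext 10") follow from the extended system ID, which carries the VLAN number in the low 12 bits of the 16-bit priority field. The following is a minimal Python sketch of that arithmetic; the function name is hypothetical and is used only for illustration.

```python
def split_bridge_priority(value: int) -> tuple[int, int]:
    """Split a 16-bit bridge priority into (configured priority, sys-id-ext)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("bridge priority must fit in 16 bits")
    configured = value & 0xF000   # upper 4 bits: a multiple of 4096
    sys_id_ext = value & 0x0FFF   # lower 12 bits: the VLAN ID when the extended system ID is enabled
    return configured, sys_id_ext


# Priorities taken from the show spanning-tree outputs above
for shown in (32778, 32788, 1):
    print(shown, split_bridge_priority(shown))
```

Read this way, the root priority of 1 shown for VLAN0001 is a configured priority of 0 plus the VLAN number 1.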
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter.
Troubleshooting at the Physical Layer
• If you have physical access to the switch, the LEDs can be a very useful troubleshooting tool
• Cisco switches have front panel LEDs that can be used to determine link and system status
• A link or port LED color other than green typically indicates some kind of failure
• However, a green link light does not always mean that the network cable is fully functional
• The show interfaces command is a powerful troubleshooting tool that provides the following:
1. The administrative status of a switching port
2. The port operational state
3. The media type (for select switches and ports)
4. Port input and output packets
5. Port buffer failures and port errors
6. Port input and output errors
7. Port input and output queue drops
VLAN, VTP, and Trunking Overview
• A VLAN is a logical grouping of hosts that appear to reside on the same physical LAN
• VLANs increase the number of Broadcast domains but reduce the size of each one
• A VLAN can span a single switch or multiple switches, depending on the implementation
• Catalyst switches support two types of switch VLAN ports: access and trunk ports
• Access ports are switch ports that are assigned to a single VLAN
• Frames sent across access ports are untagged
• Trunk links or ports are used to carry multiple VLANs
• Trunk links can use ISL or 802.1Q encapsulation
• Frames sent across trunk links are tagged or colored to identify the VLAN they belong to
• VTP manages the addition, deletion, and renaming of VLANs
• VTP allows VLAN information to propagate through the switched network
• A switch can only belong to one VTP domain at any one time
• There are three VTP modes for switches: server (default), client, and transparent
Troubleshooting VLANs
• Some of the more common causes of intra-VLAN connectivity issues include the following:
1. Duplex Mismatches
2. Bad NIC or Cable
3. Congestion
4. Hardware Issues
5. Software Issues
6. Resource Oversubscription
7. Configuration Issues
• Some common causes for switches not receiving VLAN information include the following:
1. Layer 2 Trunking Misconfigurations
2. Incorrect VTP Configuration
3. Configuration Revision Number
4. Physical Layer Issues
5. Software or Hardware Issues or Bugs
6. Switch Performance Issues
• Possible reasons for a loss of end-to-end connectivity within a VLAN include the following:
1. Physical Layer Issues
2. VTP Pruning
3. VLAN Trunk Filtering
4. New Switches
5. Switch Performance Issues
6. Network Congestion
7. Software or Hardware
Spanning Tree Protocol Overview
• The Spanning Tree Protocol (STP) is defined in the IEEE 802.1D standard
• STP attempts to provide a loop-free topology in redundant networks
• STP operates by making the following assumptions about the network:
1. All links are bidirectional and can both send and receive BPDUs
2. The switch is able to regularly receive, process and send BPDUs
• The topology of an active switched network is determined by the following three variables:
1. The unique MAC address (switch identifier) that is associated with each switch
2. The path cost to the root bridge associated with each switch port
3. The port identifier (MAC address of the port) associated with each switch port
• Switches determine the best Configuration BPDU based on the following four factors:
1. Lowest root bridge ID
2. Lowest root path cost to the root bridge
3. Lowest sender bridge ID
4. Lowest sender port ID
• The completion of the Configuration BPDU exchange results in the following actions:
1. A Root Switch is elected for the entire Spanning Tree domain
2. A root port is elected on every Non-Root Switch in the Spanning Tree domain
3. A Designated Switch is elected for every LAN segment
4. A Designated Port is elected on the Designated Switch for every segment
5. Loops in the network are eliminated by blocking redundant paths
• STP supports Configuration BPDUs, TCN BPDUs and TCA BPDUs
• Configuration BPDUs are sent by the root bridge
• TCN BPDUs are sent by any switch
• TCA BPDUs are used by the root bridge to acknowledge TCN BPDUs
• STP uses port cost and port priority to determine the root port
• The port cost affects the entire STP topology
• The port priority is a locally significant parameter
• The three configurable Spanning Tree timer values are as follows:
1. The Hello Time
2. The Forward Delay
3. The Max Age
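The four-factor comparison used to pick the best Configuration BPDU can be sketched in Python as an ordered tuple comparison. This is an illustrative model only: the Bpdu class, its field names, and the sample values are assumptions, and MAC addresses are compared as lowercase dotted strings of identical format.

```python
from typing import NamedTuple


class Bpdu(NamedTuple):
    # Field order matches the tie-breaker order: lower wins at every step
    root_bridge_id: tuple[int, str]     # (priority, MAC address)
    root_path_cost: int
    sender_bridge_id: tuple[int, str]   # (priority, MAC address)
    sender_port_id: tuple[int, int]     # (port priority, port number)


def best_bpdu(candidates: list[Bpdu]) -> Bpdu:
    """Return the superior BPDU; tuples compare field by field, so min() applies the four factors in order."""
    return min(candidates)


# Hypothetical BPDUs received on two ports from the same root bridge
via_port_11 = Bpdu((32778, "000b.fd67.6500"), 0, (32778, "000b.fd67.6500"), (128, 11))
via_port_12 = Bpdu((32778, "000b.fd67.6500"), 0, (32778, "000b.fd67.6500"), (128, 12))
print(best_bpdu([via_port_12, via_port_11]).sender_port_id)
```

Because the first three fields tie in this example, the decision falls through to the sender port ID, and the BPDU from port 128.11 is preferred.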
Troubleshooting Spanning Tree Protocol
• Elements that should be taken into consideration when designing a solid STP network are as follows:
1. Determine the Location of the root bridge
2. Design a Deterministic Topology
3. Integrate Layer 3 Switching
4. Avoid End-to-End VLAN Solutions
• Common causes of Spanning Tree problems include, but are not limited to, the following:
1. Physical layer connectivity issues
2. Switch misconfigurations
3. Switch resource utilization issues
4. Broadcast storms
5. Hardware or software errors
• During periods of extended instability, increase the Max Age and Fwd Delay timers
• After this, you can perform the following activities to identify and isolate the issue:
1. Check Port Utilization Statistics
2. Check Port BPDU Statistics
3. Check for Duplex Mismatches
4. Check Port Errors
5. Check Resource Utilization
CHAPTER 4
Troubleshooting Catalyst Switch Layer 3 Protocols,
Supervisor Redundancy, and Performance Issues
In the previous chapter, we discussed troubleshooting Layer 1 and Layer 2 issues on Cisco IOS
Catalyst switches. This chapter discusses Catalyst switch Layer 3 troubleshooting, including basic
routing operation and functionality on Catalyst switches and First Hop Redundancy Protocols. In
addition, this chapter will also describe Catalyst Switch Supervisor Redundancy and performance
troubleshooting. The TSHOOT certification exam objectives that are covered in this chapter include the following:
• Troubleshoot First Hop Redundancy Protocols
• Troubleshoot Switch Virtual Interfaces
• Troubleshoot Switch Supervisor Redundancy
This chapter will be divided into the following sections:
• Catalyst Switch VLAN Interfaces Overview
• Catalyst Switch MLS Overview
• Troubleshooting Multilayer Switching
• Understanding and Troubleshooting HSRP
• Understanding and Troubleshooting VRRP
• Understanding and Troubleshooting GLBP
• Troubleshooting Switch Supervisor Redundancy
• Troubleshooting Switch Performance Issues
CATALYST SWITCH VLAN INTERFACES OVERVIEW
In a switched network, virtual LANs (VLANs) separate devices into different Broadcast domains.
Additionally, VLANs are also used to separate devices into different subnets. Devices within a
VLAN can communicate with each other without the need for routing, assuming that these devices
reside within the same subnet. However, devices in separate VLANs require a routing device in
order to communicate with one another. Traditionally, IP routing and any Layer 3 functions, such
as LAN default gateway functionality, were implemented primarily on routers. This entailed using
multiple physical router interfaces (in different VLANs) to provide gateway and routing functionality between different VLANs, or using a single physical router interface and then creating multiple
subinterfaces, each serving as the default gateway for the specified VLAN, thus allowing devices in
different VLANs to communicate.
However, in modern networks, these functions are performed by Multilayer Switching (MLS),
which is described in additional detail later in this chapter. MLS supports the configuration of
Switch Virtual Interfaces (SVIs), which represent VLANs and allow the switch to serve as the default gateway for the VLAN. Although the SVI represents a VLAN, it is not automatically configured when a Layer 2 VLAN is configured on the switch.
Likewise, the switch will not automatically create a VLAN if you configure an SVI. The only exception to this rule is the SVI for VLAN 1, which is the default VLAN. This SVI is automatically created
by the software to allow for remote administration of the switch; however, it defaults to an administratively shutdown state and must be brought up manually and configured with Layer 3 addressing.
Additionally, the switch should be configured with the correct default gateway, or on Multilayer
switches, IP routing should be enabled.
A Switch Virtual Interface is a very resilient interface. In order for an SVI to be placed into the up/
up state, the following conditions must be met:
• The VLAN exists and is active in the VLAN database of the switch
• The VLAN interface is not administratively down
• At least one Layer 2 port (access or trunk) exists and has a link up on this VLAN
• At least one Layer 2 port (access or trunk) is in the STP Forwarding state
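The four conditions above can be expressed as a single check, which is a convenient way to reason about why an SVI is down. The following Python sketch is a simplified model; the L2Port class and the svi_is_up function are hypothetical names introduced for illustration and do not correspond to any switch API.

```python
from dataclasses import dataclass


@dataclass
class L2Port:
    name: str
    link_up: bool          # the port has a link up on this VLAN
    stp_forwarding: bool   # the port is in the STP Forwarding state for this VLAN


def svi_is_up(vlan_active: bool, svi_admin_down: bool, ports: list[L2Port]) -> bool:
    """Return True only when all four SVI up/up conditions listed above hold."""
    return (
        vlan_active                                              # VLAN exists and is active
        and not svi_admin_down                                   # interface is not shut down
        and any(p.link_up for p in ports)                        # at least one port has link
        and any(p.link_up and p.stp_forwarding for p in ports)   # at least one port forwards
    )


ports = [L2Port("Fa0/11", True, False), L2Port("Fa0/12", True, True)]
print(svi_is_up(vlan_active=True, svi_admin_down=False, ports=ports))
```

When the function returns False, checking the conditions in the listed order mirrors a sensible troubleshooting sequence: VLAN database first, then the interface state, then the member ports.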
In addition to SVIs, which provide default gateway functions for VLANs, Multilayer switches also
support IP addressing configuration on physical interfaces. It is important to remember
that switches do not route non-IP packets between VLANs and routed ports. You can, however,
forward these non-IP packets using fallback bridging, which is described in detail in the current
SWITCH guide that is available online.
CATALYST SWITCH MLS OVERVIEW
Multilayer Switching (MLS) combines Layer 2, Layer 3, and Layer 4 switching technologies to forward packets at wire speed using hardware. Cisco supports MLS for both Unicast and Multicast
traffic flows. In Unicast transmission, a flow is a unidirectional sequence of packets between a
source and destination pair that shares the same protocol and Transport Layer information. These
flows are based only on Layer 3 address information. In Multicast transmission, a flow is a unidirectional sequence of packets between a Multicast source and the members of a destination Multicast
group. Multicast flows are based on the IP address of the source device and the destination IP
Multicast group address.
In MLS, a Layer 3 switching table, referred to as an MLS cache, is maintained for the Layer
3-switched flows. The MLS cache maintains flow information for all active flows and includes entries for traffic statistics that are updated in tandem with the switching of packets. After the MLS
cache is created, any packets identified as belonging to an existing flow can be Layer 3-switched
based on the cached information.
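The flow-based behavior described above, where the first packet of a flow creates a cache entry and later packets are switched from it, can be sketched as follows. This is a conceptual Python model only; the class names and the five-field flow key are assumptions made for illustration, since the fields that actually identify a flow depend on the platform and its configuration.

```python
from dataclasses import dataclass, field

# Hypothetical flow key: (source IP, destination IP, protocol, source port, destination port)
FlowKey = tuple[str, str, str, int, int]


@dataclass
class FlowEntry:
    packets: int = 0
    byte_count: int = 0


@dataclass
class MlsCache:
    entries: dict[FlowKey, FlowEntry] = field(default_factory=dict)

    def switch(self, key: FlowKey, size: int) -> str:
        """Return 'miss' for the first packet of a flow and 'hit' for later packets."""
        entry = self.entries.get(key)
        result = "hit" if entry is not None else "miss"
        if entry is None:
            entry = self.entries[key] = FlowEntry()   # first packet creates the entry
        entry.packets += 1                            # statistics are updated as packets are switched
        entry.byte_count += size
        return result


cache = MlsCache()
flow = ("10.1.1.10", "10.2.2.20", "tcp", 49152, 80)
print(cache.switch(flow, 100), cache.switch(flow, 60))
```

The "miss" on the first packet is the per-flow penalty that topology-driven switching, discussed next, is designed to avoid.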
MLS integrates both data plane and control plane functions. These two planes are responsible for
the building of routing tables and the actual forwarding of packets. The control plane is where routing information, routing protocol updates, and other control information is stored and exchanged.
Using routing protocols, the control plane is responsible for updating the routing table as changes
in the network topology occur. The data plane is responsible for the actual forwarding of data. The
data plane is typically populated using information derived from the control plane. This plane is
used to determine the physical next-hop egress interface for received packets or frames and then
forwards the packets or frames using the correct egress interface.
MLS is enabled by configuring Cisco Express Forwarding (CEF) on the switch. CEF operates at the
data plane and is a topology-driven proprietary switching mechanism that creates a forwarding
table that is tied to the routing table (i.e., the control plane). CEF was developed to eliminate the
performance penalty experienced due to the first-packet process-switched lookup method used by
flow-based switching.
CEF eliminates this by allowing the route cache used by the hardware-based Layer 3 routing engine
to contain all the information needed to Layer 3 switch in hardware before any packets associated with a flow are even received. Information that is conventionally stored in a route cache is
stored in two data structures for CEF switching. These data structures provide optimized lookup
for efficient packet forwarding and are referred to as the Forwarding Information Base (FIB) and
adjacency table, as described in the following section.
CEF uses a FIB to make IP destination prefix-based switching decisions. The FIB is conceptually
similar to a routing table or an information base. It maintains a mirror image of the forwarding
information contained in the IP routing table. In other words, the FIB contains all IP prefixes from
the routing table. When routing or topology changes occur in the network, the IP routing table is
updated, and those changes are also reflected in the FIB. The FIB maintains next-hop address information based on the information in the IP routing table. Because there is a one-to-one correlation
between FIB entries and routing table entries, the FIB contains all known routes and eliminates the
need for route cache maintenance that is associated with switching paths, such as fast switching
and optimum switching.
Additionally, because the FIB lookup table contains all known routes that exist in the routing table,
it eliminates route cache maintenance and the fast switching and process switching forwarding
scenarios. This allows CEF to switch traffic more efficiently than typical demand caching schemes.
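To make the idea of a FIB that mirrors the routing table more concrete, the following Python sketch builds a FIB from a list of routes and performs a longest-prefix-match lookup. It is a simplified model under stated assumptions: the function names are hypothetical, the routes are sample values, and the linear search is used for readability rather than as a picture of how switching hardware performs the lookup.

```python
import ipaddress


def build_fib(routes):
    """Mirror (prefix, next_hop) routing entries into a list ordered from longest to shortest prefix."""
    fib = [(ipaddress.ip_network(prefix), next_hop) for prefix, next_hop in routes]
    fib.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return fib


def fib_lookup(fib, destination):
    """Return the next hop of the most specific matching prefix, or None if nothing matches."""
    address = ipaddress.ip_address(destination)
    for network, next_hop in fib:
        if address in network:
            return next_hop
    return None


# Sample routes (illustrative values only)
routing_table = [("0.0.0.0/0", "10.10.10.1"), ("172.16.0.0/12", "192.168.1.1")]
fib = build_fib(routing_table)
print(fib_lookup(fib, "172.20.1.1"), fib_lookup(fib, "8.8.8.8"))
```

Because every routing table prefix is present before any traffic arrives, no packet has to be punted to populate a cache, which is the point made in the text above.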
The adjacency table is created in order to contain all connected next-hops. An adjacent node is a
node that is one hop away (i.e., directly connected). The adjacency table is populated as adjacencies
are discovered. As soon as a neighbor becomes adjacent, a Data Link Layer header, called a MAC
string or a MAC rewrite, which will be used to reach that neighbor, is created and stored in the
table. On Ethernet segments, the header information is the destination MAC address, the source
MAC address, and the EtherType, in that specific order.
As soon as a route is resolved, it points to an adjacent next-hop. If an adjacency is found in the adjacency table, a pointer to the appropriate adjacency is cached in the FIB element. If multiple paths
exist for the same destination, then a pointer to each adjacency is added to the load-sharing structure, which allows for load-balancing. When prefixes are added to the FIB, prefixes that require
exception handling are cached with special adjacencies. These components, and their interaction,
are illustrated in Figure 4-1 below.
[Figure 4-1: The MSFC (control plane) holds the IP routing table (static route 172.16.0.0/12, metric 1, next-hop 192.168.1.1) and the IP ARP table (192.168.1.1 with MAC address 0001.1a2b.cdef). These populate the Forwarding Information Base (172.16.0.0/12 with adjacency 192.168.1.1) and the adjacency table (192.168.1.1 with MAC address 0001.1a2b.cdef) on the PFC (data plane).]
Fig. 4-1. Cisco Express Forwarding Operation
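The relationship in Figure 4-1, where control plane tables are used to populate the FIB and the adjacency table, can be sketched in Python using the values from the figure. The function names are hypothetical and the model is deliberately minimal: each FIB entry simply points to an adjacency keyed by the next-hop IP address.

```python
import ipaddress

# Control plane tables, using the values from Figure 4-1
ROUTING_TABLE = [("172.16.0.0/12", "192.168.1.1")]
ARP_TABLE = {"192.168.1.1": "0001.1a2b.cdef"}


def build_data_plane(routing_table, arp_table):
    """Derive a FIB and an adjacency table from the routing and ARP tables."""
    adjacency = dict(arp_table)   # one Layer 2 rewrite entry per resolved next hop
    fib = {
        ipaddress.ip_network(prefix): next_hop
        for prefix, next_hop in routing_table
        if next_hop in adjacency   # only resolved next hops get a normal adjacency here
    }
    return fib, adjacency


def forward(fib, adjacency, destination):
    """Return (next hop, next-hop MAC) for a destination, or None when no prefix matches."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in fib if address in network]
    if not matches:
        return None
    best = max(matches, key=lambda network: network.prefixlen)
    next_hop = fib[best]
    return next_hop, adjacency[next_hop]


fib, adjacency = build_data_plane(ROUTING_TABLE, ARP_TABLE)
print(forward(fib, adjacency, "172.16.5.9"))
```

Prefixes whose next hop has no resolved Layer 2 address are simply left out of this sketch; as the text notes, a real FIB caches such prefixes with special adjacencies for exception handling.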
Enabling CEF requires the use of a single command, which is the ip cef [distributed] global
configuration command. The [distributed] keyword is applicable only to high-end switches,
such as the Catalyst 6500 Series switch, that support distributed CEF. MSFC and PFC are explained
later in this chapter.
TROUBLESHOOTING MULTILAYER SWITCHING
MLS troubleshooting requires troubleshooting at both the control plane and the data plane. The
control plane troubleshooting is the same as that performed on routers. MLS troubleshooting
should follow a systematic approach, which involves checking the control plane (i.e., routing information), and then verifying the data or forwarding plane information. The following basic steps
should be taken when troubleshooting Unicast MLS issues:
1. Verify that IP routing information for the address is correct
2. Verify that the next-hop has a valid MAC address
3. Verify that the FIB next-hop is the same as the RIB next-hop
4. Verify the CEF adjacency table rewrite information
5. Verify FIB and adjacency table population in TCAM
The first step is to verify that the destination address is present in the routing table. This step is
performed using the show ip route command as follows:
Cat-6500-1#show ip route 0.0.0.0 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
Known via “static”, distance 1, metric 0, candidate default path
Routing Descriptor Blocks:
* 10.10.10.1
Route metric is 0, traffic share count is 1
This first step is used to ensure that there is a route to the intended destination network and the
route has a valid next-hop address. If the route does not exist or the next-hop address is incorrect,
troubleshooting of routing protocol, next-hop interfaces, or route configuration will be required.
The second step is to verify that the next-hop address has a valid next-hop MAC address using the
show ip arp command. If the ARP entry for the next-hop address is incomplete, then you will
need to troubleshoot ARP. Following is the output of the show ip arp command for the next-hop address shown in the previous output:
Cat-6500-1#show ip arp | include 10.10.10.1
Internet  10.10.10.1              2   000f.20da.833d  ARPA   Vlan4
If the MAC address is incorrect, then you will need to verify whether another device owns that IP
address. You can determine the MAC address of the next-hop device using the show interfaces
command if it is a Cisco IOS device. In Cisco Catalyst 6500 Series switches, you can use the Layer
2 traceroute utility, which is invoked using the traceroute mac <source mac> <destination
mac> privileged EXEC command.
If the ARP entry is incomplete, it means that you did not get any replies from that host. In that
case, you need to verify that the host is up and running. Following this, continue troubleshooting
at the data plane to ensure that the same information is also present. Validate that the route entry
in the FIB contains the same next-hop address as in the first step by using the show ip cef command. If there is a discrepancy between the two, a routing loop can be created. The output of this
command for the same prefix displays the following entries:
Cat-6500-1#show ip cef 0.0.0.0 0.0.0.0 detail
0.0.0.0/0, version 38, epoch 0, cached adjacency 10.10.10.1
0 packets, 0 bytes
Flow: AS 0, mask 0
via 10.10.10.1, 0 dependencies, recursive
next hop 10.10.10.1, Vlan4 via 10.10.10.1/32 (Default)
valid cached adjacency
Finally, verify that the CEF adjacency table contains the same rewrite information as the ARP table
from Step 2 by using the show adjacency detail command as follows:
Cat-6500-1#show adjacency detail | begin 10.10.10.1
IP       Vlan4                     10.10.10.1(7)
                                   7834810780 packets, 1413540065564 bytes
                                   000F20DA833D
                                   000E39E29C000800
                                   ARP        00:54:07
                                   Epoch: 0
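The cross-checks performed in the steps above (routing table versus FIB next-hop, and ARP entry versus adjacency rewrite) can be summarized in a short Python sketch. The function names are hypothetical, and the sketch assumes the values have already been copied by hand from the command outputs shown above; it does not parse live device output.

```python
def normalize_mac(mac: str) -> str:
    """Reduce a MAC address in any common notation to 12 lowercase hex digits."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def split_rewrite(rewrite_hex: str) -> tuple[str, str, str]:
    """Split a 14-byte Ethernet rewrite into (destination MAC, source MAC, EtherType)."""
    rewrite_hex = rewrite_hex.lower()
    if len(rewrite_hex) != 28:
        raise ValueError("expected 28 hex digits")
    return rewrite_hex[:12], rewrite_hex[12:24], rewrite_hex[24:]


def check_next_hop(rib_next_hop, fib_next_hop, arp_mac, rewrite_hex):
    """Return a list of inconsistencies between control plane and data plane information."""
    problems = []
    if rib_next_hop != fib_next_hop:
        problems.append("FIB next-hop differs from routing table next-hop")
    if arp_mac is None:
        problems.append("ARP entry for the next-hop is missing or incomplete")
    elif split_rewrite(rewrite_hex)[0] != normalize_mac(arp_mac):
        problems.append("adjacency rewrite MAC differs from the ARP entry")
    return problems


# Values taken from the outputs above; the rewrite is the two hex lines joined together
rewrite = "000F20DA833D" + "000E39E29C000800"
print(split_rewrite(rewrite))
print(check_next_hop("10.10.10.1", "10.10.10.1", "000f.20da.833d", rewrite))
```

Splitting the rewrite string shows the header order described earlier: the destination MAC address of the next-hop, the source MAC address of the local interface, and the EtherType 0800 for IP.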
After performing these checks and correcting any identified issues, if you are still experiencing
routing issues, you will need to verify the population of the FIB and adjacency table in Ternary Content Addressable Memory (TCAM) using the show mls cef commands. TCAM troubleshooting should be performed under the supervision of a Technical Assistance Center (TAC) engineer.
TCAM is similar to CAM but adds a third "don't care" match state, which allows masked lookups, such as longest-prefix matches, to be completed in a single operation. TCAM is described
in the SWITCH guide, which is available online at www.howtonetwork.net.
UNDERSTANDING AND TROUBLESHOOTING HSRP
As is the case with all protocols and technologies, in order to troubleshoot something, you must
have a solid understanding of how it works. In this section, we will revisit Hot Standby Router
Protocol (HSRP) fundamentals, reinforcing the material covered in the SWITCH exam, and then
conclude the section by discussing some common HSRP problems and ways to troubleshoot them.
Hot Standby Router Protocol Overview
Hot Standby Router Protocol is a Cisco-proprietary First Hop Redundancy Protocol that allows
physical gateways that are configured as part of the same HSRP group to share the same virtual
gateway address. Network hosts residing on the same subnet as the gateways are configured with
the virtual gateway IP address as their default gateway. Multiple HSRP groups can be configured
under the same interface, allowing for load-sharing of traffic between the gateways.
While operational, the primary gateway forwards packets destined to the virtual gateway IP address of the HSRP group. In the event that the primary gateway fails, the secondary gateway assumes the role of primary and forwards all packets sent to the virtual gateway IP address. This failover does not depend on preemption. Preemption, which is disabled by default, matters only when a gateway with a higher priority needs to take over from an active gateway that is still operational. If preemption is not configured, the higher-priority gateway will not displace the current
active gateway.
Cisco IOS software supports two versions of HSRP: version 1 and version 2. By default, when Hot
Standby Router Protocol is enabled in Cisco IOS software, version 1 is enabled. The following section lists and describes the operational differences between these two versions:
• HSRPv1 restricts HSRP group numbers to the range 0 to 255, whereas in version 2 the group numbers have been extended to the range 0 to 4095.
• HSRPv1 routers communicate by sending messages to Multicast group address 224.0.0.2 using UDP port 1985. HSRPv2 routers communicate by sending messages to Multicast group
address 224.0.0.102 using UDP port 1985.
• The version 2 packet format uses a Type/Length/Value (TLV) format. HSRP version 2
packets received by an HSRPv1 router will have the Type field mapped to the Version field
by HSRPv1 and will subsequently be ignored.
• Although HSRPv1 advertises timer values, these values are always to the whole second, as it
is not capable of advertising or learning millisecond timer values. HSRPv2 is capable of both
advertising and learning millisecond timer values.
• Version 2 provides improved management and troubleshooting by including a 6-byte
Identifier field, which is populated with the physical router interface MAC address and
is used to identify uniquely the source of HSRP active Hello messages. In version 1, these
messages contain the virtual MAC address as the source MAC, which means it is not possible to determine which HSRP router actually sent the HSRP Hello message.
• In HSRPv1, the Layer 2 address that is used by the virtual IP address will be a virtual MAC
address composed of 0000.0C07.ACxx, where xx is the HSRP group number in Hexadecimal value and is based on the respective interface. HSRPv2, however, uses a new MAC
address range of 0000.0C9F.F000 to 0000.0C9F.FFFF for the virtual gateway IP address.
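The virtual MAC address formats described in the last bullet can be derived from the group number, which is useful when checking ARP or MAC address table entries during troubleshooting. The following is a minimal Python sketch; the function name is hypothetical.

```python
def hsrp_virtual_mac(group: int, version: int = 1) -> str:
    """Return the HSRP virtual MAC address for a group number and HSRP version."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 group numbers range from 0 to 255")
        return f"0000.0c07.ac{group:02x}"    # last byte is the group number in hex
    if version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 group numbers range from 0 to 4095")
        return f"0000.0c9f.f{group:03x}"     # last three hex digits are the group number
    raise ValueError("version must be 1 or 2")


print(hsrp_virtual_mac(1), hsrp_virtual_mac(10), hsrp_virtual_mac(4095, version=2))
```

For example, group 1 under version 1 yields 0000.0c07.ac01, which matches the virtual MAC address shown in the show standby outputs later in this section.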
Troubleshooting Hot Standby Router Protocol
The majority of HSRP issues are due to router and switch misconfigurations. For this reason, it is
important to have an intimate understanding of the operation of this protocol to be able to identify
any HSRP issues quickly. In addition to misconfigurations, device resource utilization and Physical
layer and Data Link layer issues can also affect the operation of HSRP. While it is not possible to
delve into specifics on all possible HSRP issues, the common HSRP problem scenarios include the
following:
• Gateway logging continuous HSRP state changes
• HSRP gateways not reflecting the correct state
• HSRP does not detect peer router
• HSRP causes a MAC violation on a secure switch port
The following section describes the common HSRP problem scenarios and provides recommended
solutions for them.
One of the most common problems experienced in networks running HSRP is continuous state
changes. If you recall, from the SWITCH guide, we learned that when HSRP is enabled on an interface, the gateway interface goes through a series of states as follows:
1. Disabled
2. Init
3. Listen
4. Speak
5. Standby
6. Active
The most common messages seen after HSRP has been configured are transitions between the active and the standby states. During the speak phase, the standby gateway exchanges messages with
the active gateway. Upon completion of this phase, the primary gateway transitions to the active
state and the backup gateway transitions to the standby state. The standby state indicates that the
gateway is ready to assume the role of active gateway if the primary gateway fails, and the active
state indicates that the gateway is ready to forward packets actively. Continuous state transitions
between active and standby result in the following log messages:
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
Such error messages indicate a situation in which a standby HSRP router did not receive three
successive HSRP Hello packets from its HSRP peer and therefore assumes the role of active router.
There are several possible causes for the loss of HSRP packets between the peers. The most common reasons for these errors include the following:
• Physical layer problems
• Data Link layer problems
• Excessive traffic
• Gateway resource issues
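When working from saved logs, continuous state changes like those shown above are easier to spot if the transitions are counted per interface and group. The following Python sketch assumes the log lines follow the %HSRP-5-STATECHANGE format shown in this section; the function name is hypothetical.

```python
import re
from collections import Counter

STATECHANGE = re.compile(
    r"%HSRP-5-STATECHANGE:\s+(?P<intf>\S+)\s+Grp\s+(?P<grp>\d+)\s+state\s+"
    r"(?P<old>\w+)\s+->\s+(?P<new>\w+)"
)


def count_active_transitions(log_lines):
    """Count how many times each (interface, group) pair entered the Active state."""
    counts = Counter()
    for line in log_lines:
        match = STATECHANGE.search(line)
        if match and match["new"] == "Active":
            counts[(match["intf"], int(match["grp"]))] += 1
    return counts


log = [
    "%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby",
    "%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active",
    "%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak",
    "%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby",
    "%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active",
    "%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak",
    "%HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby",
]
print(count_active_transitions(log))
```

A group that repeatedly enters the Active state on a gateway that should be the standby is a sign that Hello packets from the peer are being lost, which points to the causes listed above.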
Physical layer issues (e.g., flapping interfaces, errors, etc.) may prevent HSRP packets from being
sent successfully between the peers. You can look for link transitions by filtering log output to
include only link up/down messages and then troubleshooting the identified links using the show
interfaces command, as was described earlier in this guide. For example, you can check the logs
on the local gateway and if there is nothing there to indicate link failure, then you can proceed and
check the logs of the peer. Keep in mind that the devices might be connected to different switches,
so it is important to verify all links in the path between the gateways.
Layer 2 problems, such as Spanning Tree issues, can cause a loss of HSRP messages between the
peers. STP loops can cause Broadcast storms, duplicated frames, and MAC table inconsistency.
All of these problems affect the entire network, especially HSRP. HSRP error messages can be the
first indication of an STP issue. As stated in the previous chapter, when troubleshooting STP, it is
important to understand the network topology, which includes the location of the root bridge, the
backup root bridge (if applicable), blocking or redundant ports, and forwarding ports.
Finally, HSRP state changes are often due to high CPU utilization on the gateway. You can use the
show processes cpu command to troubleshoot and identify CPU utilization issues. Filter the
output of this command to show processes that are utilizing the CPU. If the issue is a result of high
CPU utilization, troubleshoot the problem using the appropriate tools, which may include putting
a sniffer on the network and then tracing the system that is causing this utilization.
A commonly encountered issue following HSRP implementation is that the standby gateway never
becomes the active gateway following the primary gateway failure, or that the primary gateway
never re-assumes its role after it is back online. These issues are almost always attributed to device
misconfigurations. As stated in the previous section, HSRP does not preempt by default.
This means that even if the standby gateway has a higher HSRP priority than the active gateway, it will not take over from an active gateway that is still operational unless preemption
is configured.
Similarly, if the active gateway fails and is restored (if preemption is not configured), it will not reassume its previous role, even though it has a higher priority than the standby gateway.
Consider the following output, which illustrates the output of the show standby command:
Cat-6500-1#show standby vlan 1
Vlan1 - Group 1
Local state is Active, priority 105, may preempt
Hellotime 3 sec, holdtime 10 sec
Next hello sent in 1.541
Virtual IP address is 10.0.0.254 configured
Active router is local
Standby router is 10.0.0.2 expires in 8.772
Virtual mac address is 0000.0c07.ac01
1 state changes, last state change 00:58:15
IP redundancy name is “hsrp-Vl1-1” (default)
From the output above, we can determine that Cat-6500-1 is the active gateway for HSRP Group
1. This switch has been configured with a priority of 105 for the group, and it is also configured to
preempt. In the event that the switch or SVI fails and is restored, it will become the active gateway
again because it is configured to preempt. Continuing with the example, the following displays the
output of the show standby command on the current standby gateway:
Cat-6500-2#show standby vlan 1
Vlan1 - Group 1
Local state is Standby, priority 100
Hellotime 3 sec, holdtime 10 sec
Next hello sent in 0.585
Virtual IP address is 10.0.0.254 configured
Active router is 10.0.0.1, priority 105 expires in 9.948
Standby router is local
1 state changes, last state change 00:58:28
IP redundancy name is “hsrp-Vl1-1” (default)
From an outright failure standpoint (e.g., if the entire switch fails or the switch SVI goes down), the
configuration will work because if the active gateway is no longer present, then the standby gateway
simply becomes the active gateway. No preemption is required in such situations. The issue arises
when the active gateway does not fail but instead decrements its priority based on additional configuration, such as interface tracking. Assume, for example, that the configuration on the gateway
is modified to include tracking as follows:
Cat-6500-1#show standby vlan 1
Vlan1 - Group 1
Local state is Active, priority 105, may preempt
Hellotime 3 sec, holdtime 10 sec
Next hello sent in 1.541
Virtual IP address is 10.0.0.254 configured
Active router is local
Standby router is 10.0.0.2 expires in 8.772
Virtual mac address is 0000.0c07.ac01
1 state changes, last state change 00:05:15
IP redundancy name is “hsrp-Vl1-1” (default)
Priority tracking 1 interface or object, 1 up:
Interface or object      Decrement  State
GigabitEthernet0/1       10         Up
After this configuration, the GigabitEthernet0/1 interface on the active gateway is disabled. The
show standby command on this device now displays the following:
Cat-6500-1#show standby vlan 1
Vlan1 - Group 1
Local state is Active, priority 95 (confgd 105), may preempt
Hellotime 3 sec, holdtime 10 sec
Next hello sent in 1.541
Virtual IP address is 10.0.0.254 configured
Active router is local
Standby router is 10.0.0.2 expires in 7.704 sec
Virtual mac address is 0000.0c07.ac01
1 state changes, last state change 00:08:35
IP redundancy name is “hsrp-Vl1-1” (default)
Priority tracking 1 interface or object, 0 up:
Interface or object      Decrement  State
GigabitEthernet0/1       10         Down (administratively down)
After this, the show standby command on the standby router now displays the following:
Cat-6500-2#show standby vlan 1
Vlan1 - Group 1
Local state is Standby, priority 100
Hellotime 3 sec, holdtime 10 sec
Next hello sent in 0.585
Virtual IP address is 10.0.0.254 configured
Active router is 10.0.0.1, priority 95 expires in 8.476 sec
Standby router is local
1 state changes, last state change 00:01:28
IP redundancy name is “hsrp-Vl1-1” (default)
Notice that the standby gateway's priority of 100 is now higher than that of the active gateway,
which is 95 after tracking decremented its configured priority of 105 by 10 (the default). Even so,
the second gateway does not become the active gateway for the group because it has not been
configured to preempt. This is a common mistake when configuring HSRP.
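The interaction between tracking and preemption can be sketched in a few lines of Python. This is an illustrative model only, not Cisco's implementation: the function names are hypothetical, and the takeover rule is simplified to "strictly higher priority, with preemption enabled on the challenger".

```python
def effective_priority(configured, tracked):
    """Return the HSRP priority after tracking decrements.

    `tracked` is a list of (is_up, decrement) pairs, one per tracked
    interface (a hypothetical representation chosen for this sketch).
    """
    return configured - sum(dec for is_up, dec in tracked if not is_up)


def takes_over(challenger_priority, challenger_preempt, active_priority):
    """Simplified rule: a live active gateway is displaced only by a
    higher-priority router that has preemption enabled."""
    return challenger_preempt and challenger_priority > active_priority


# Values from the example: Cat-6500-1 is configured with priority 105 and
# tracks GigabitEthernet0/1 (decrement 10), which is now down.
active = effective_priority(105, [(False, 10)])
print(active)                          # 95
print(takes_over(100, False, active))  # False: the standby cannot preempt
print(takes_over(100, True, active))   # True once preemption is configured
```

The second call mirrors the situation in the output above: the standby gateway has the higher priority but stays in the Standby state.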
NOTE: When configuring HSRP, ensure that preemption is configured on both the primary
and the standby gateways. You can verify whether preemption is configured by looking for
coup messages in the output of the debug standby command as illustrated in the following
output:
Cat-3550-2#debug standby
HSRP debugging is on
Nov 4 02:29:14.234: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Active pri 100 vIP
10.0.0.254
Nov 4 02:29:17.234: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Active pri 100 vIP
10.0.0.254
Nov 4 02:29:20.234: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Active pri 100 vIP
10.0.0.254
Nov 4 02:29:20.238: HSRP: Vl1 API active virtual MAC 0000.0c07.ac01 found
Nov 4 02:29:20.238: HSRP: Vl1 API active virtual MAC 0000.0c07.ac01 found
Nov 4 02:29:20.238: HSRP: Vl1 REDIRECT adv in, Passive, active 0, passive 1,
from 10.0.0.1
Nov 4 02:29:20.238: HSRP: Vl1 REDIRECT adv in, Active, active 1, passive 2,
from 10.0.0.1
Nov 4 02:29:20.238: HSRP: Vl1 Grp 1 Coup in 10.0.0.1 Listen pri 105 vIP
10.0.0.254
Nov 4 02:29:20.238: HSRP: Vl1 Grp 1 Active: j/Coup rcvd from higher pri router
(105/10.0.0.1)
Nov 4 02:29:20.238: HSRP: Vl1 Grp 1 Active router is 10.0.0.1, was local
Nov 4 02:29:20.238: HSRP: Vl1 Grp 1 Active -> Speak
Nov 4 02:29:20.238: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
Nov 4 02:29:20.238: HSRP: Vl1 Redirect adv out, Passive, active 0 passive 1
Nov 4 02:29:20.238: HSRP: Vl1 Grp 1 Redundancy “hsrp-Vl1-1” state Active ->
Speak
Nov 4 02:29:20.242: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Speak pri 100 vIP
10.0.0.254
Nov 4 02:29:20.242: HSRP: Vl1 REDIRECT adv in, Active, active 1, passive 1,
from 10.0.0.1
Nov 4 02:29:20.242: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Active pri 105 vIP
10.0.0.254
Nov 4 02:29:23.234: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Active pri 105 vIP
10.0.0.254
Nov 4 02:29:23.242: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Speak pri 100 vIP
10.0.0.254
Nov 4 02:29:26.238: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Active pri 105 vIP
10.0.0.254
Nov 4 02:29:26.242: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Speak pri 100 vIP
10.0.0.254
Nov 4 02:29:29.234: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Active pri 105 vIP
10.0.0.254
Nov 4 02:29:29.242: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Speak pri 100 vIP
10.0.0.254
Nov 4 02:29:30.238: HSRP: Vl1 Grp 1 Speak: d/Standby timer expired (unknown)
Nov 4 02:29:30.238: HSRP: Vl1 Grp 1 Standby router is local
Nov 4 02:29:30.238: HSRP: Vl1 Grp 1 Speak -> Standby
Nov 4 02:29:30.238: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
Nov 4 02:29:30.238: HSRP: Vl1 Grp 1 Redundancy “hsrp-Vl1-1” state Speak ->
Standby
Nov 4 02:29:30.238: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Standby pri 100 vIP
10.0.0.254
Nov 4 02:29:32.234: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Active pri 105 vIP
10.0.0.254
...
[Truncated Output]
If preemption is not configured, then the debug standby command would display output similar to the following:
Cat-3550-2#debug standby
HSRP debugging is on
Cat-3550-2#
Nov 4 22:40:32.905: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Speak pri 95 vIP
10.0.0.254
Nov 4 22:40:35.905: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Active pri 100 vIP
10.0.0.254
Nov 4 22:40:35.905: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Speak pri 95 vIP
10.0.0.254
Nov 4 22:40:36.901: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Standby pri 105 vIP
10.0.0.254
Nov 4 22:40:36.901: HSRP: Vl1 Grp 1 Standby router is 10.0.0.1
Nov 4 22:40:38.905: HSRP: Vl1 Grp 1 Hello out 10.0.0.2 Active pri 100 vIP
10.0.0.254
Nov 4 22:40:39.901: HSRP: Vl1 Grp 1 Hello in 10.0.0.1 Standby pri 105 vIP
10.0.0.254
...
[Truncated Output]
In the debug output above, the local HSRP gateway first receives a Hello from the remote HSRP
gateway that indicates a priority value of 95. The local HSRP advertises its priority value of 100.
Next, the priority on the remote gateway is changed to 105 as illustrated in the received Hello.
However, because preemption is not enabled, there is no state change because the remote HSRP
gateway does not send out a coup message. Given this, the local gateway remains the active gateway, even though it has a lower HSRP priority value than that of the remote or peer device.
There are two primary reasons why an HSRP gateway does not recognize its peer. The first is due to
a lack of connectivity between the two devices and the second is due to device misconfigurations.
When an HSRP gateway does not recognize its peer, the output of the show standby command
displays a message similar to the following:
CORE1#show standby vlan 1
Vlan1 - Group 1
Local state is Active, priority 105, may preempt
Hellotime 3 sec, holdtime 10 sec
Next hello sent in 0.057
Virtual IP address is 10.0.0.254 configured
Active router is local
Standby router is unknown
Virtual mac address is 0000.0c07.ac01
2 state changes, last state change 00:00:45
IP redundancy name is “hsrp-Vl1-1” (default)
Vlan1 - Group 2
Local state is Active, priority 100, may preempt
Hellotime 3 sec, holdtime 10 sec
Next hello sent in 2.199
Virtual IP address is 10.1.1.254 configured
Active router is local
Standby router is unknown
Virtual mac address is 0000.0c07.ac02
5 state changes, last state change 00:00:18
IP redundancy name is “hsrp-Vl1-2” (default)
The router output in this section indicates that the gateway is configured for HSRP but does not
recognize its HSRP peers. In order for this to occur, the router must fail to receive HSRP Hellos
from the neighbor router. This can be viewed using the debug standby command as follows:
Cat-3550-1#debug standby
HSRP debugging is on
Nov 4 02:26:52.094: HSRP: Vl1 Grp 1 Listen: c/Active timer expired (unknown)
Nov 4 02:26:52.094: HSRP: Vl1 Grp 1 Listen -> Speak
Nov 4 02:26:52.094: HSRP: Vl1 Grp 1 Redundancy “hsrp-Vl1-1” state Backup ->
Speak
Nov 4 02:26:52.094: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Speak pri 105 vIP
10.0.0.254
Nov 4 02:26:55.094: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Speak pri 105 vIP
10.0.0.254
Nov 4 02:26:58.094: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Speak pri 105 vIP
10.0.0.254
Nov 4 02:27:01.094: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Speak pri 105 vIP
10.0.0.254
Nov 4 02:27:02.094: HSRP: Vl1 Grp 1 Speak: d/Standby timer expired (unknown)
Nov 4 02:27:02.094: HSRP: Vl1 Grp 1 Standby router is local
Nov 4 02:27:02.094: HSRP: Vl1 Grp 1 Speak -> Standby
Nov 4 02:27:02.094: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
Nov 4 02:27:02.094: HSRP: Vl1 Grp 1 Redundancy “hsrp-Vl1-1” state Speak ->
Standby
Nov 4 02:27:02.094: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Standby pri 105 vIP
10.0.0.254
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Standby: c/Active timer expired (unknown)
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Active router is local
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Standby router is unknown, was local
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Standby -> Active
Nov 4 02:27:02.594: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
Nov 4 02:27:02.594: HSRP: Vl1 Redirect adv out, Active, active 1 passive 0
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Redundancy “hsrp-Vl1-1” state Standby ->
Active
Nov 4 02:27:02.594: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Active pri 105 vIP
10.0.0.254
Nov 4 02:27:05.594: HSRP: Vl1 Grp 1 Hello out 10.0.0.1 Active pri 105 vIP
10.0.0.254
Nov 4 02:27:05.594: HSRP: Vl1 Grp 1 Redundancy group hsrp-Vl1-1 state Active ->
Active
...
[Truncated Output]
Following the output of the debugs, the local gateway sends out a Hello sourced from its local IP
address of 10.0.0.1, advertising a priority value of 105 and reflecting the VIP address of 10.0.0.254.
The local gateway does not hear anything from its peer and transitions to the standby state. Again,
the local gateway hears nothing from its peer and transitions to the active state. Also notice that
there are no received HSRP packets from any other devices.
When troubleshooting this issue, first verify Physical layer connectivity between the gateways. Use
the show interfaces command on the local router to verify Physical layer status. Also keep in
mind that Data Link layer issues (e.g., Spanning Tree and VLAN issues) can cause connectivity issues between the gateways. If there is connectivity between the two gateways, then check for HSRP
misconfigurations on the devices.
In networks using port security, it is not uncommon for HSRP to cause MAC violations due to misconfigurations on the switches. By default, when port security is enabled on a switch port, only a
single secure MAC address is permitted. When port security is configured on the switch ports that
are connected to the HSRP-enabled routers, it causes a MAC violation, since you cannot have the
same secure MAC address on more than one interface. A security violation will occur on a secure
port in one of the following situations:
• If the maximum number of secure MAC addresses is added to the address table, and a station
whose MAC address is not in the table attempts to access the interface
• If an address that is learned or configured on one secure interface is seen on another secure
interface in the same VLAN
By default, a port security violation causes the switch interface to become error-disabled and to
shut down immediately, which blocks the HSRP status messages between the routers. Two solutions can be applied to resolve this issue. The first is to configure HSRP using the standby use-bia
interface configuration command, which forces the gateways to use the interface MAC instead of
the virtual MAC for HSRP. The second alternative is simply to disable port security on the ports
connected to devices running HSRP.
UNDERSTANDING AND TROUBLESHOOTING VRRP
As is the case with HSRP, it is important to have a fundamental understanding of Virtual Router
Redundancy Protocol (VRRP) to be able to troubleshoot and identify protocol failures. Because
VRRP is very similar to HSRP, troubleshooting VRRP follows the same basic concepts described in
the HSRP troubleshooting portion of the previous section.
Understanding Virtual Router Redundancy Protocol
Virtual Router Redundancy Protocol operates in a similar manner to HSRP; however, unlike
HSRP, VRRP is an open standard, originally defined in RFC 2338. VRRP sends advertisements to the
Multicast destination address 224.0.0.18, using IP protocol number 112. At the Data Link
layer, the virtual router master sources advertisements from the virtual router MAC address
00-00-5e-00-01-xx, where xx is the VRRP group number as a two-digit Hexadecimal value.
This allows you to configure up to 255 virtual routers on an interface. However, the number
of virtual routers that a gateway interface can actually support depends on the following factors:
• Gateway processing capability
• Gateway memory capability
• Gateway interface support of multiple MAC addresses
VRRP and HSRP are similar in many ways. For example, both protocols elect the primary router,
which is the active router in HSRP and virtual router master in VRRP, based on the priority values
configured (both use a default of 100). In the event that priority values are the same, the gateway
with the highest IP address is elected.
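The election rule just described (highest priority wins, with the highest IP address as the tie-breaker) can be expressed as a short Python sketch. The function name and the (ip, priority) tuple layout are assumptions made for illustration; real gateways reach the same result by exchanging protocol messages rather than by comparing a list.

```python
import ipaddress


def elect_primary(gateways):
    """Pick the primary gateway from (ip, priority) pairs.

    The highest priority wins; equal priorities are broken by the
    numerically highest IP address, as described in the text.
    """
    return max(gateways, key=lambda g: (g[1], int(ipaddress.ip_address(g[0]))))


# All gateways at the default priority of 100: the highest address wins.
print(elect_primary([("10.0.0.1", 100), ("10.0.0.2", 100), ("10.0.0.3", 100)]))
# ('10.0.0.3', 100)

# A higher configured priority beats a higher address.
print(elect_primary([("10.0.0.1", 105), ("10.0.0.3", 100)]))
# ('10.0.0.1', 105)
```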
Another similarity is that both HSRP and VRRP can be configured between more than two LAN
gateways. This allows multiple gateways to provide redundancy for LAN hosts. Although both
protocols support this capability, their show commands report it differently: the show vrrp command
indicates only which router is the virtual router master, whereas the show standby command shows both the active and the standby gateways (even when issued on a gateway that is neither), as illustrated below:
Cat-6500-1#show standby vlan 1
Vlan1 - Group 1
Local state is Listen, priority 100
Hello time 3 sec, hold time 10 sec
Next hello sent in 0.585
Virtual IP address is 10.0.0.254 configured
Active router is 10.0.0.3, priority 100 (expires in 7.488 sec)
Standby router is 10.0.0.2, priority 100 (expires in 8.992 sec)
Virtual MAC address is 0000.0c07.ac01
IP redundancy name is “hsrp-Vl1-1” (default)
NOTE: A non-active or non-standby HSRP gateway will remain in the Listen state as illustrated in the output above. All non-virtual router master VRRP gateways will be in a backup
state, regardless of the number of gateways in the group.
Another similarity between HSRP and VRRP is that both protocols support MD5 and plain text
authentication for securing protocol message exchanges. Despite their similarities, there are some
significant differences between HSRP and VRRP. Understanding these differences is not only important from a protocol configuration and implementation perspective but also important from
a troubleshooting perspective. The following section describes some of the differences between
HSRP and VRRP with which you should be intimately familiar. These include the protocol version,
priority values, VIP configuration, Hello packet sending, preemption, and timers.
By default, VRRP version 2 is enabled when VRRP is configured on a gateway in IOS software. Version 2 is the default and current VRRP version. Unlike HSRP, the VRRP version cannot be changed.
There is no VRRP version 1 standard. On the other hand, when HSRP is enabled on a
gateway, by default, HSRP version 1 is enabled. However, Cisco IOS software allows administrators to change this to HSRP version 2 using the standby version <1-2> interface configuration
command.
Like HSRP, VRRP uses priority values to determine which gateway will be elected the virtual router
master. The default VRRP priority value is 100; however, this value can be adjusted manually to a
value between 1 and 254. While the default HSRP priority value is also 100, the configurable priority values for HSRP are between 1 and 255.
NOTE: The priority value of 255 is reserved for a special purpose in VRRP. This is described
in the following section on VRRP virtual IP configuration.
When configuring HSRP, the virtual IP (VIP) address assigned to the HSRP group cannot be the
real IP address assigned to an interface. For example, if an interface has the IP address 10.0.0.1/24,
then the VIP address cannot be that IP address or the IP address of any other device that will be
part of the same HSRP group. If you attempt to configure the real IP address as the VIP address, the
software will print the following error message on the console:
Cat-6500-1(config)#interface vlan 1
Cat-6500-1(config-if)#standby 1 ip 10.0.0.1
% address cannot equal interface IP address
With VRRP, however, the VIP can be configured simply as a logical IP address that is shared between the routers in the group or as the real IP address of one of the devices in the group. When
the VIP is configured as the real IP address of a gateway, that device is referred to as the IP Address
Owner or sometimes as the Real IP Address Owner.
When the IP Address Owner is up, this gateway will respond to all packets that are sent to the IP
address. This gateway is also the virtual router master for the group and is assigned a priority value
of 255 in Cisco IOS software. As previously stated, only values 1 to 254 are configurable in Cisco
IOS software; therefore, this value is allocated by the software only when the virtual IP address for
the group is equal to that of the real router interface. Consider the topology that is illustrated in
Figure 4-2 below, for example:
[Figure 4-2: R1 (Fa0/0: 10.0.0.1/24), R2 (Fa0/0: 10.0.0.2/24), and R3 (Fa0/0: 10.0.0.3/24) on a shared LAN segment; VRRP Group 1 uses VIP 10.0.0.1]
Fig. 4-2. Specifying Real Interface Addresses as the VRRP VIP
NOTE: While Figure 4-2 depicts routers, the same concept is applicable on Multilayer switches.
Prior to the configuration of VRRP on the gateways, the interface configurations for all three are
provided in the following section. The interface configuration for R1 is as follows:
R1#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 222 bytes
!
interface FastEthernet0/0
ip address 10.0.0.1 255.255.255.0
duplex auto
speed auto
end
The interface configuration for R2 is as follows:
R2#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 222 bytes
!
interface FastEthernet0/0
ip address 10.0.0.2 255.255.255.0
duplex auto
speed auto
end
Finally, the interface configuration for R3 is as follows:
R3#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 222 bytes
!
interface FastEthernet0/0
ip address 10.0.0.3 255.255.255.0
duplex auto
speed auto
end
Following this initial verification, VRRP Group 1 is configured under the FastEthernet interface of
all gateways using the vrrp 1 ip 10.0.0.1 interface configuration command, which specifies
the real IP address of R1 as the VIP. No additional VRRP configuration is performed. The running
configuration on all gateways is updated as follows, beginning with R1:
R1#show running-config interface fastEthernet0/0
Building configuration...
Current configuration : 222 bytes
!
interface FastEthernet0/0
ip address 10.0.0.1 255.255.255.0
duplex auto
speed auto
vrrp 1 ip 10.0.0.1
end
The interface configuration for R2 is as follows:
R2#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 222 bytes
!
interface FastEthernet0/0
ip address 10.0.0.2 255.255.255.0
duplex auto
speed auto
vrrp 1 ip 10.0.0.1
end
Finally, the interface configuration for R3 is as follows:
R3#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 222 bytes
!
interface FastEthernet0/0
ip address 10.0.0.3 255.255.255.0
duplex auto
speed auto
vrrp 1 ip 10.0.0.1
end
Following this configuration, because the real IP address of R1 is used as the VRRP group VIP, this
gateway is designated IP Address Owner and is automatically assigned a priority value of 255, allowing it to respond to all packets sent to that IP address. This is confirmed using the show
vrrp [group] command. Following is the output of this command on R1:
R1#show vrrp
FastEthernet0/0 - Group 1
State is Master
Virtual IP address is 10.0.0.1
Virtual MAC address is 0000.5e00.0101
Advertisement interval is 1.000 sec
Preemption enabled
Priority is 255
Master Router is 10.0.0.1 (local), priority is 255
Master Advertisement interval is 1.000 sec
Master Down interval is 3.003 sec
Gateways R2 and R3 default to the backup state and show R1 as the virtual router master. Because
both gateways show the same information, only the output of the show vrrp command on R2 is
provided below:
R2#show vrrp
FastEthernet0/0 - Group 1
State is Backup
Virtual IP address is 10.0.0.1
Virtual MAC address is 0000.5e00.0101
Advertisement interval is 1.000 sec
Preemption enabled
Priority is 100
Master Router is 10.0.0.1, priority is 255
Master Advertisement interval is 1.000 sec
Master Down interval is 3.609 sec (expires in 3.581 sec)
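The Master Down interval values in the two outputs above (3.003 seconds on R1 and 3.609 seconds on R2) differ because the interval depends on each gateway's own priority. The following sketch assumes the formula given in the VRRP version 2 specification (RFC 3768), where the skew time is (256 - priority) / 256; the function name is hypothetical, and the way IOS shortens the displayed value to three decimal places is an observation from the outputs above, not a documented rule.

```python
def master_down_interval(priority, advert_interval=1.0):
    """Seconds a backup waits without advertisements before taking over.

    Assumes the RFC 3768 definition:
      skew_time = (256 - priority) / 256
      master_down_interval = 3 * advertisement_interval + skew_time
    """
    skew_time = (256 - priority) / 256
    return 3 * advert_interval + skew_time


print(master_down_interval(255))  # 3.00390625 (shown as 3.003 on R1)
print(master_down_interval(100))  # 3.609375   (shown as 3.609 on R2)
```

Because the skew time shrinks as priority rises, a higher-priority backup times out slightly sooner and so tends to take over first when the master disappears.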
NOTE: When using the interface address of a gateway as the VIP, you may see the following
error message if the gateway goes down and then comes back up:
*Mar 11 17:40:54.159: %IP-4-DUPADDR: Duplicate address 10.0.0.1 on
FastEthernet0/0, sourced by 000d.289e.f940
This error message should clear immediately; however, if you see it repeatedly, citing multiple
MAC addresses, there may be another device on the network using the same address.
Another notable difference between VRRP and HSRP is the sending of Hello packets. When you
enable HSRP, the standby and active gateways exchange Hello packets, which are sent out every
three seconds by default. If you enabled the debug standby command on either gateway, then you
would see an output similar to the following:
HSRP:
HSRP:
HSRP:
HSRP:
HSRP:
HSRP:
HSRP:
HSRP:
HSRP:
HSRP:
HSRP:
HSRP:
HSRP:
Fa0/0
Fa0/0
Fa0/0
Fa0/0
Fa0/0
Fa0/0
Fa0/0
Fa0/0
Fa0/0
Fa0/0
Fa0/0
Fa0/0
Fa0/0
Grp
Grp
Grp
Grp
Grp
Grp
Grp
Grp
Grp
Grp
Grp
Grp
Grp
1
1
1
1
1
1
1
1
1
1
1
1
1
Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
in
out
in
out
in
out
in
out
in
out
in
out
in
10.0.0.2
10.0.0.3
10.0.0.2
10.0.0.3
10.0.0.2
10.0.0.3
10.0.0.2
10.0.0.3
10.0.0.2
10.0.0.3
10.0.0.2
10.0.0.3
10.0.0.2
Standby
Active
Standby
Active
Standby
Active
Standby
Active
Standby
Active
Standby
Active
Standby
pri
pri
pri
pri
pri
pri
pri
pri
pri
pri
pri
pri
pri
100
100
100
100
100
100
100
100
100
100
100
100
100
vIP
vIP
vIP
vIP
vIP
vIP
vIP
vIP
vIP
vIP
vIP
vIP
vIP
10.0.0.254
10.0.0.254
10.0.0.254
10.0.0.254
10.0.0.254
10.0.0.254
10.0.0.254
10.0.0.254
10.0.0.254
10.0.0.254
10.0.0.254
10.0.0.254
10.0.0.254
With VRRP, the virtual router master sends advertisements to other VRRP routers in the same
group every second by default. However, the virtual router backup(s) will not send any packets
to the VRRP group. This is often a point of confusion when troubleshooting. If the debug vrrp
command was issued on the virtual router master, then you would see advertisements being sent
out similar to the following:
*Mar 10 21:23:03.255: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:04.183: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:05.067: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:05.939: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:06.775: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:07.755: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:08.723: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:09.703: VRRP: Grp 1 sending Advertisement checksum D5FA
*Mar 10 21:23:10.639: VRRP: Grp 1 sending Advertisement checksum D5FA
On the virtual router backup, the debug vrrp command would display the following output:
*Mar 11 14:14:29.643: VRRP: Grp 1 Advertisement priority 255, ipaddr 10.0.0.1
*Mar 11 14:14:29.643: VRRP: Grp 1 Event - Advert higher or equal priority
*Mar 11 14:14:30.611: VRRP: Grp 1 Advertisement priority 255, ipaddr 10.0.0.1
*Mar 11 14:14:30.611: VRRP: Grp 1 Event - Advert higher or equal priority
*Mar 11 14:14:31.591: VRRP: Grp 1 Advertisement priority 255, ipaddr 10.0.0.1
*Mar 11 14:14:31.591: VRRP: Grp 1 Event - Advert higher or equal priority
*Mar 11 14:14:32.527: VRRP: Grp 1 Advertisement priority 255, ipaddr 10.0.0.1
*Mar 11 14:14:32.527: VRRP: Grp 1 Event - Advert higher or equal priority
*Mar 11 14:14:33.347: VRRP: Grp 1 Advertisement priority 255, ipaddr 10.0.0.1
*Mar 11 14:14:33.347: VRRP: Grp 1 Event - Advert higher or equal priority
In the debug output on the virtual router backup, the event being observed is that the gateway
received an advertisement from another gateway for VRRP Group 1 that has a higher or equal
priority to itself. This message does not indicate a message sent by the local gateway itself. The
advertisement messages include the priority and address of the current virtual router master.
By default, unlike HSRP, preemption is enabled for VRRP and no explicit configuration is required
by the administrator to enable this functionality.
Finally, VRRP and HSRP use different ranges of MAC addresses. When using VRRP, at the Data
Link layer, the virtual router master sources advertisements from the virtual router MAC address
00-00-5e-00-01-xx, where xx is the two-digit Hexadecimal group number. With HSRPv1, the
virtual IP address uses the virtual MAC address 0000.0C07.ACxx, where xx is the Hexadecimal
value of the HSRP group number configured on the respective interface. HSRPv2 uses MAC
addresses in the range of 0000.0C9F.F000 to 0000.0C9F.FFFF for the virtual gateway IP address.
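As a quick cross-check when reading show or debug output, the virtual MAC address for a group can be derived from its group number. The sketch below is a minimal illustration with hypothetical helper names. It uses the VRRP prefix 0000.5e00.01 that appears in the show vrrp output earlier in this section, and it assumes the usual group ranges (1 to 255 for VRRP, 0 to 255 for HSRPv1, 0 to 4095 for HSRPv2).

```python
def _dotted(mac_int):
    """Format a 48-bit integer in Cisco dotted notation (xxxx.xxxx.xxxx)."""
    digits = f"{mac_int:012x}"
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def _check(group, low, high):
    if not low <= group <= high:
        raise ValueError(f"group {group} outside {low}-{high}")


def vrrp_mac(group):
    """VRRP: 0000.5e00.01xx, xx = group number in hex."""
    _check(group, 1, 255)
    return _dotted(0x00005E000100 | group)


def hsrp_v1_mac(group):
    """HSRPv1: 0000.0c07.acxx, xx = group number in hex."""
    _check(group, 0, 255)
    return _dotted(0x00000C07AC00 | group)


def hsrp_v2_mac(group):
    """HSRPv2: 0000.0c9f.f000 through 0000.0c9f.ffff."""
    _check(group, 0, 4095)
    return _dotted(0x00000C9FF000 | group)


print(vrrp_mac(1))        # 0000.5e00.0101
print(hsrp_v1_mac(1))     # 0000.0c07.ac01
print(hsrp_v2_mac(4095))  # 0000.0c9f.ffff
```

The first two results match the virtual MAC addresses shown in the show vrrp and show standby outputs for group 1.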
Troubleshooting Virtual Router Redundancy Protocol
For the most part, VRRP troubleshooting follows the same basic logic as HSRP troubleshooting.
For example, assume that one or more gateways were logging the following messages:
*Mar 11 17:39:38.699: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master
*Mar 11 17:39:42.059: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Master -> Backup
*Mar 11 17:40:12.531: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master
*Mar 11 17:40:19.167: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Master -> Backup
*Mar 11 17:40:27.835: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master
*Mar 11 17:40:36.851: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Master -> Backup
*Mar 11 17:40:44.775: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master
*Mar 11 17:40:54.171: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Master -> Backup
Given the state transitions being logged, you would perform the same troubleshooting for VRRP
as you would if this were HSRP. In other words, you would check Physical layer connectivity, verify
whether there are any Data Link layer problems, check for excessive network traffic or congestion,
and verify gateway resource utilization using the show processes command.
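When a log contains many of these messages, it can help to count the transitions per interface and group before starting layer-by-layer troubleshooting. The following Python sketch is one possible way to do that offline against saved log lines; the function name is hypothetical, and the regular expression assumes the exact %VRRP-6-STATECHANGE format shown above.

```python
import re
from collections import Counter

# Matches the syslog format shown in the text, for example:
#   %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master
STATECHANGE = re.compile(
    r"%VRRP-6-STATECHANGE: (?P<intf>\S+) Grp (?P<group>\d+) "
    r"state (?P<old>\w+) -> (?P<new>\w+)"
)


def count_transitions(log_lines):
    """Count VRRP state changes per (interface, group)."""
    counts = Counter()
    for line in log_lines:
        match = STATECHANGE.search(line)
        if match:
            counts[(match["intf"], int(match["group"]))] += 1
    return counts


sample = [
    "*Mar 11 17:39:38.699: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master",
    "*Mar 11 17:39:42.059: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Master -> Backup",
    "*Mar 11 17:40:12.531: %VRRP-6-STATECHANGE: Fa0/0 Grp 1 state Backup -> Master",
]
print(count_transitions(sample))  # Counter({('Fa0/0', 1): 3})
```

A group that changes state several times within a minute or two, as in the sample, points to lost advertisements rather than a single clean failover.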
UNDERSTANDING AND TROUBLESHOOTING GLBP
As is the case with the two previously described First Hop Redundancy Protocols (FHRPs), it is
important to have a fundamental understanding of Gateway Load Balancing Protocol (GLBP) in
order to troubleshoot any protocol failures or abnormal behavior effectively. Following the same
logic used in the previous two sections, this section will first delve into GLBP operation and then
conclude with a brief section on how to troubleshoot GLBP.
Understanding Gateway Load Balancing Protocol
Gateway Load Balancing Protocol is a Cisco-proprietary FHRP, like HSRP. However, unlike HSRP
and VRRP, which allow for multiple active gateways via the use or configuration of multiple groups,
GLBP allows multiple gateways to forward packets actively using a single GLBP group. GLBP gateways communicate through Hello messages that are sent every three seconds to the Multicast
address 224.0.0.102, using UDP port 3222.
When Gateway Load Balancing Protocol is enabled, the GLBP group members elect one gateway to
be the active virtual gateway (AVG) for that group. The AVG is the gateway that has the highest
priority value. In the event that the priority values are equal, the AVG will be elected as the gateway
with the highest IP address in the group. The other gateways in the GLBP group provide backup for
the AVG in the event that the AVG becomes unavailable.
The AVG answers all ARP requests for the virtual router address. In addition, the AVG assigns a
virtual MAC address to each member of the GLBP group. Each gateway is therefore responsible for
forwarding packets that are sent to the virtual MAC address it has been assigned by the AVG. These
gateways are referred to as active virtual forwarders (AVFs) for their assigned MAC addresses.
GLBP operation is illustrated in Figure 4-3 below:
Fig. 4-3. Gateway Load Balancing Protocol Operation
Referencing Figure 4-3, gateway GLBP-1 is configured with a priority of 110, gateway GLBP-2 is
configured with a priority of 105, and gateway GLBP-3 is using the default priority of 100. GLBP-1 is elected AVG, and GLBP-2 and GLBP-3 are assigned virtual MAC addresses bbbb.bbbb.bbbb
and cccc.cccc.cccc, respectively, and become AVFs for those virtual MAC addresses. GLBP-1 is
also the AVF for its own virtual MAC address, which is aaaa.aaaa.aaaa.
Hosts 1, 2, and 3 are all configured with the default gateway address 192.168.1.254, which is the
virtual IP address assigned to the GLBP group. Host 1 sends out an ARP Broadcast for its gateway
IP address. This is received by the AVG (GLBP-1), which responds with its own virtual MAC address aaaa.aaaa.aaaa. Host 1 forwards traffic to 192.168.1.254 to this MAC address.
Host 2 sends out an ARP Broadcast for its gateway IP address. This is received by the AVG (GLBP-1), which responds with the virtual MAC address of bbbb.bbbb.bbbb (GLBP-2). Host 2 forwards
traffic to 192.168.1.254 to this MAC address and GLBP-2 forwards this traffic.
Host 3 sends out an ARP Broadcast for its gateway IP address. This is received by the AVG (GLBP-1), which responds with the virtual MAC address of cccc.cccc.cccc (GLBP-3). Host 3 forwards
traffic to 192.168.1.254 to this MAC address and GLBP-3 forwards this traffic.
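The way the AVG spreads hosts across forwarders in this walkthrough can be modeled in a few lines of Python. This is a simplified sketch, assuming round-robin load balancing (the method reported in the show glbp output in this section); the class name is hypothetical, and Host 4 is added only to show the rotation wrapping around.

```python
from itertools import cycle


class ActiveVirtualGateway:
    """Toy model of an AVG that answers ARP requests for the virtual IP
    by handing out forwarder virtual MAC addresses in round-robin order."""

    def __init__(self, virtual_macs):
        self._next_mac = cycle(virtual_macs)
        self.arp_replies = {}

    def answer_arp(self, host):
        mac = next(self._next_mac)
        self.arp_replies[host] = mac
        return mac


avg = ActiveVirtualGateway(["aaaa.aaaa.aaaa", "bbbb.bbbb.bbbb", "cccc.cccc.cccc"])
for host in ("Host 1", "Host 2", "Host 3", "Host 4"):
    print(host, "->", avg.answer_arp(host))
# Host 1 -> aaaa.aaaa.aaaa
# Host 2 -> bbbb.bbbb.bbbb
# Host 3 -> cccc.cccc.cccc
# Host 4 -> aaaa.aaaa.aaaa
```

All four hosts use the same default gateway IP address; only the MAC address in the ARP reply differs, which is what distributes the traffic.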
A GLBP group allows up to four virtual MAC addresses per group. The AVG is responsible for assigning the virtual MAC addresses to each member of the group. Other group members request a
virtual MAC address after they discover the AVG through Hello messages. Gateways are assigned
the next virtual MAC address in sequence. A gateway that is assigned a virtual MAC address by
the AVG is known as a primary virtual forwarder while a gateway that has learned the virtual
MAC address is referred to as a secondary virtual forwarder. The allocated virtual MAC addresses
for the AVFs can be seen using the show glbp command illustrated in the following output:
R3#show glbp FastEthernet0/0
FastEthernet0/0 - Group 1
State is Active
5 state changes, last state change 00:01:21
Virtual IP address is 10.0.0.254
Hello time 3 sec, hold time 10 sec
Next hello sent in 0.000 secs
Redirect time 600 sec, forwarder time-out 14400 sec
Preemption disabled
Active is local
Standby is 10.0.0.2, priority 105 (expires in 8.996 sec)
Priority 110 (configured)
Weighting 100 (default 100), thresholds: lower 1, upper 100
Load balancing: round-robin
Group members:
000d.289e.f940 (10.0.0.1)
000f.235e.f120 (10.0.0.3) local
0013.7faf.3e00 (10.0.0.2)
There are 3 forwarders (1 active)
Forwarder 1
State is Listen
4 state changes, last state change 00:01:00
MAC address is 0007.b400.0101 (learnt)
Owner ID is 000d.289e.f940
Redirection enabled, 599.188 sec remaining (maximum 600 sec)
Time to live: 14399.188 sec (maximum 14400 sec)
Preemption enabled, min delay 30 sec
Active is 10.0.0.1 (primary), weighting 100 (expires in 9.188 sec)
Forwarder 2
State is Listen
2 state changes, last state change 00:01:01
MAC address is 0007.b400.0102 (learnt)
Owner ID is 0013.7faf.3e00
Redirection enabled, 597.144 sec remaining (maximum 600 sec)
Time to live: 14397.144 sec (maximum 14400 sec)
Preemption enabled, min delay 30 sec
Active is 10.0.0.2 (primary), weighting 100 (expires in 7.144 sec)
Forwarder 3
State is Active
3 state changes, last state change 00:01:32
MAC address is 0007.b400.0103 (default)
Owner ID is 000f.235e.f120
Redirection enabled
Preemption enabled, min delay 30 sec
Active is local, weighting 100
Within the GLBP group, a single gateway is elected as the AVG, and another gateway is elected as
the standby virtual gateway. All other remaining gateways in the group are placed in a Listen state.
If the AVG fails, then the standby virtual gateway will assume responsibility for the virtual IP address. At the same time, an election is held and a new standby virtual gateway is then elected from
the gateways currently in the Listen state.
In the event that an AVF fails, one of the secondary virtual forwarders in the Listen state assumes
responsibility for the virtual MAC address. However, because the new AVF is already a forwarder
using another virtual MAC address, GLBP needs to ensure that the old forwarder MAC address
ceases being used and hosts are migrated away from this address. This is achieved using the following two timers:
1. The redirect timer
2. The timeout timer
The redirect time is the interval during which the AVG continues to redirect hosts to the old virtual
forwarder MAC address. When this timer expires, the AVG stops using the old virtual forwarder
MAC address in ARP replies, although the virtual forwarder will continue to forward packets that
were sent to the old virtual forwarder MAC address.
When the timeout timer expires, the virtual forwarder is removed from all gateways in the GLBP
group. Any clients still using the old MAC address in their ARP caches must refresh the entry to
obtain the new virtual MAC address. GLBP uses Hello messages to communicate the current state
of these two timers.
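As a minimal sketch of how these two timers might be tuned, the interface-level glbp timers redirect command takes the redirect interval followed by the forwarder timeout, both in seconds. The interface, group number, and values below are assumptions chosen to match the defaults shown in the earlier output (600 and 14400 seconds); adjust them to suit your own environment.

```
interface FastEthernet0/0
 ! Keep redirecting hosts to a failed AVF's virtual MAC for 600 seconds,
 ! then retire that virtual MAC from the group after 14400 seconds
 glbp 1 timers redirect 600 14400
```

The configured values appear on the "Redirect time" line of the show glbp output.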
By default, the GLBP gateway preemptive scheme is disabled. A backup virtual gateway can become
the AVG only if the current AVG fails, regardless of the priorities assigned to the virtual gateways.
This default behavior is the same as that which is employed by HSRP. GLBP uses a weighting scheme
to determine the forwarding capacity of each gateway that is in the GLBP group. The weighting assigned to a gateway in the GLBP group can be used to determine whether it will forward packets and,
if so, the proportion of hosts residing on the LAN for which it will forward packets.
By default, each gateway is assigned a weight of 100. Administrators can additionally configure the
gateways to make dynamic weighting adjustments by configuring object tracking, such as for interfaces
and IP prefixes, in conjunction with GLBP. If a tracked interface fails, then the weighting is dynamically decreased by the specified value, allowing gateways with higher weighting values to be used to
forward more traffic than those with lower weighting values.
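The following is a hedged sketch of weighting combined with object tracking. The tracked uplink (FastEthernet0/1), the track object number, the thresholds, and the decrement value are all illustrative assumptions and are not taken from the topology used in this chapter.

```
! Track the line protocol of an uplink interface (hypothetical uplink)
track 10 interface FastEthernet0/1 line-protocol
!
interface FastEthernet0/0
 ! Start at weight 100; give up the forwarder role below 70 and
 ! take it back once the weight recovers to the upper threshold of 90
 glbp 1 weighting 100 lower 70 upper 90
 ! Subtract 40 from the weight while track object 10 is down
 glbp 1 weighting track 10 decrement 40
 ! Distribute hosts in proportion to each gateway's weight
 glbp 1 load-balancing weighted
```

With these assumed values, losing the uplink drops the weight from 100 to 60, which is below the lower threshold of 70, so the gateway stops forwarding until the tracked interface recovers.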
Troubleshooting Gateway Load Balancing Protocol
GLBP troubleshooting follows the same logic as the other FHRPs described in previous sections.
In order to troubleshoot effectively, you need to understand intimately how the protocol is configured and operates. Real-time troubleshooting for GLBP can be performed by issuing the debug
glbp command. However, as is the case with any debug command, keep in mind that this action
increases processor utilization on the gateway, which may degrade performance.
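A cautious way to use the debug command is sketched below: check the summarized state first, enable only the debug categories you need, and turn debugging off as soon as you have captured the events of interest. The choice of the events and errors keywords here is an assumption about what is most useful in a typical case.

```
! Summarize AVG and AVF roles before enabling any debugging
R3#show glbp brief
! Limit debugging to state changes and error conditions
R3#debug glbp events
R3#debug glbp errors
! Disable all debugging once the problem has been captured
R3#undebug all
```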
TROUBLESHOOTING SWITCH SUPERVISOR REDUNDANCY
No explicit means or methods to troubleshoot Switch Supervisor Redundancy actually exist. Instead, the troubleshooting process entails understanding the redundancy method implemented,
and how it is supposed to function (i.e., what the norm is). From there, you can identify any anomalies with the switchover process. For the most part, if a supervisor redundancy mode is not functioning the way it should be, you should seek the assistance of the TAC. The following section
describes the supported supervisor redundancy methods.
Understanding Switch Supervisor Redundancy
Cisco Catalyst 4500 and 6500 Series switches support two supervisor modules or engines within
the switch chassis to allow for high availability. When the switch boots up, the first supervisor that
boots up is referred to as the primary or active supervisor engine and the second supervisor module
is referred to as the standby or redundant supervisor engine. When the primary or active supervisor is up, it controls all switch functions, including management operations, data plane operations,
and control plane operations. In the event that the active supervisor engine fails or is removed, the
standby supervisor assumes this responsibility. The standby supervisor engine assumes primary
supervisor engine status when one of the following events occurs:
• The primary supervisor engine fails or crashes
• The primary supervisor engine is rebooted
• The administrator forces a manual failover from active to standby
• The primary supervisor engine is physically removed
• Clock synchronization between the supervisor engines fails (SSO)
Cisco IOS software supports the following three modes for redundant supervisor engine implementations:
1. Route Processor Redundancy
2. Route Processor Redundancy Plus
3. Stateful Switchover
With Route Processor Redundancy (RPR), when the switch boots up, the RPR process runs between the two supervisor engines, and the first supervisor to complete the boot process becomes
the active supervisor engine and the second supervisor becomes the standby supervisor engine.
When using RPR, the standby supervisor engine is only partially booted and initialized, meaning
that not all switch subsystems become operational. For example, on the standby, the Multilayer
Switch Feature Card (MSFC) and the Policy Feature Card (PFC) are not active.
Clock synchronization occurs between primary and backup every 60 seconds, and the startup configuration and configuration registers are synchronized between supervisors. When the active supervisor engine fails, the standby supervisor engine becomes operational and the following occurs
within the switch:
• All switching modules are reloaded and powered up again
• Remaining subsystems on the MSFC are brought up
• ACLs are reprogrammed into supervisor engine hardware
Because the standby supervisor engine is not fully initialized, any failover from the active to the
standby supervisor engine results in a disruption of network traffic, as the standby supervisor engine goes through the steps listed above and assumes the active supervisor role. This entire process
generally takes two to four minutes to accomplish.
Route Processor Redundancy Plus (RPR+) improves on RPR and provides failover generally within
30 to 60 seconds. When RPR+ mode is used, the redundant supervisor engine is fully initialized
and configured but is not fully operational. When the redundant supervisor engine first initializes,
the startup configuration file is copied from the active supervisor engine to the redundant supervisor engine, which overrides any existing startup configuration file on the redundant supervisor
engine, allowing the supervisor engines to become synchronized.
When switch configuration changes occur during normal operation, redundancy performs an incremental synchronization from the active supervisor engine to the redundant supervisor engine.
RPR+ synchronizes user-entered CLI commands incrementally line-by-line from the active supervisor engine to the redundant supervisor engine.
Even though the redundant supervisor engine is fully initialized, it interacts only with the active
supervisor engine to receive incremental changes to the configuration files. The console on the
redundant supervisor engine is locked and you cannot enter CLI commands on the redundant supervisor engine. When the active supervisor engine fails, the redundant supervisor engine finishes
initializing without reloading other switch modules, and the following events occur on the switch:
• Traffic is disrupted until the redundant supervisor engine completes the takeover
• The switch maintains any static routes across the switchover
• The switch does not maintain any dynamic routing protocol information
• The switch clears the FIB tables on switchover
• The switch clears the CAM tables on switchover
• State information, such as active TCP sessions, is not maintained on switchover
When implementing RPR+, it is important to ensure that the supervisor modules are similar (i.e.,
the same model, memory, and version) and that they are running the same Cisco IOS
software version. If any of these are different, the switch will revert to RPR mode instead of RPR+.
Stateful Switchover (SSO) is the preferred redundancy mode for supervisor engines. Similar to
RPR and RPR+, SSO establishes one of the supervisor engines as active while the other supervisor
engine is designated standby. Unlike RPR and RPR+, however, with SSO, the redundant supervisor
engine is fully booted and initialized and then SSO synchronizes the two supervisors.
With SSO, supervisor engines must be synchronized so that the redundant supervisor engine is
always ready to assume control in the event that the active supervisor engine fails. Configuration
information and data structures are synchronized between the supervisor engines at startup and
whenever changes to the active supervisor engine configuration occur.
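As an illustration, SSO is selected from redundancy configuration mode, and the negotiated mode can then be checked from the exec prompt. This is a minimal sketch that assumes both supervisors are SSO-capable and run the same software image; the Cat-6500-1 hostname is only a placeholder.

```
Cat-6500-1(config)#redundancy
Cat-6500-1(config-red)#mode sso
Cat-6500-1(config-red)#end
! Compare the configured and operational redundancy modes
Cat-6500-1#show redundancy states
```

If the operational mode reported by show redundancy states differs from the configured mode, the supervisors have not synchronized in the mode you intended.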
Unlike RPR and RPR+ redundancy, SSO maintains state information between the redundant supervisor engines. This includes forwarding information in the FIB, as well as adjacency entries,
which ensures that Layer 2 traffic is not interrupted and the switch can still forward Layer 3 traffic
after a switchover from the active to the redundant supervisor engine. When using SSO, the following events cause a switchover:
• A hardware failure on the active supervisor engine
• Clock synchronization failure between supervisor engines
• A manual switchover
During SSO switchover, all system control and routing protocol execution is transferred from the
active supervisor to the standby supervisor engine within zero to three seconds. Non-Stop Forwarding (NSF) works in conjunction with SSO to minimize the amount of time a network is unavailable
to its users following a switchover, while continuing to forward IP packets. NSF is used
primarily to ensure the continued forwarding of IP packets following a supervisor engine switchover and is supported by BGP, OSPF, EIGRP, IS-IS, and CEF.
Non-Stop Forwarding allows routing protocols to detect a switchover and take the necessary action to continue forwarding network traffic. NSF allows routing protocols to recover route information from the NSF-capable peer devices instead of waiting for the FIB to be rebuilt before the
switch can actually begin forwarding traffic. This allows for high availability and resiliency during
supervisor engine switchover. When NSF is implemented, routing protocols depend on CEF to
continue forwarding packets during switchover while they build the Routing Information Base
(RIB) tables. After the routing protocols have converged, CEF updates the FIB table and removes
stale route entries. CEF also updates the switch modules with the new FIB information. Cisco NSF
is configured on a per-routing protocol basis.
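Because NSF is enabled per routing protocol, the configuration is repeated under each routing process. The sketch below assumes that SSO is already the operating redundancy mode; the process IDs and autonomous system numbers are placeholders.

```
router ospf 1
 ! Enable Cisco NSF for OSPF
 nsf
!
router eigrp 100
 nsf
!
router bgp 65001
 ! BGP NSF relies on the graceful restart capability
 bgp graceful-restart
!
router isis
 ! Cisco NSF for IS-IS (the IETF variant is "nsf ietf")
 nsf cisco
```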
TROUBLESHOOTING SWITCH PERFORMANCE ISSUES
In the final section of this chapter, we will discuss some reasons for switch performance degradation. Additionally, we will also discuss solutions that can be implemented to safeguard or mitigate
against these contributing factors. One of the most telling signs of performance issues is high CPU
utilization. However, a common misconception is that high CPU utilization indicates the depletion of resources on a device and the threat of a crash. While this may be true for software-based
routers, high CPU utilization is almost never a symptom of a capacity issue on hardware-based
forwarding switches, such as the Catalyst 4500 and 6500 Series switches.
Cisco software-based routers use software in order to process and route packets. Therefore, the
CPU utilization on a router tends to increase as it performs more packet processing and routing.
On such platforms, the show processes cpu command can provide a fairly accurate indication
of the traffic processing load on the router. However, this is not always true for hardware-based
forwarding switches. Before we delve into detail on some of the reasons that contribute to or can
cause Catalyst switch performance issues, the following section briefly describes the architecture
of the Catalyst 6500 Supervisor Engine 720.
Catalyst 6500 Series Switch Supervisor Module Components
The Supervisor engine is the ‘brains’ of the Catalyst 6500 Series switches. Although going into
detail on all components of the Supervisor 720 module is beyond the scope of the TSHOOT certification exam, a basic understanding of the Supervisor module is required in order to understand
the terminology used in MLS. The Supervisor 720 module is comprised of the following three
integrated core components:
1. The Multilayer Switch Feature Card 3
2. The Policy Feature Card 3
3. The switch or switching fabric
The Multilayer Switch Feature Card 3 (MSFC 3) is a standard daughter card on the Supervisor 720
engine. The MSFC 3 runs all software processes and supports both the Switch Processor (SP) and
Route Processor (RP). The RP supports Layer 3 features and functions, such as routing protocols,
address resolution, ICMP, the management of virtual interfaces, and IOS configuration, among
many other things. The SP supports Layer 2 features and functionality, such as the Spanning Tree
Protocol, VLAN Trunking Protocol, and Cisco Discovery Protocol.
The MSFC 3 builds the CEF FIB table in software and then downloads this table to the hardware
ASICs on the Policy Feature Card 3 (PFC 3) and distributed forwarding engine (if present) that
make the forwarding decisions for IP Unicast and Multicast traffic.
The Policy Feature Card 3 (PFC3) is equipped with a high-performance ASIC complex that supports a wide range of hardware-based features. The PFC makes forwarding decisions in hardware
and supports routing and bridging, Quality of Service (QoS), and IP Multicast packet replication,
and processes security policies such as Access Control Lists (ACLs).
The PFC 3 requires the RP to populate the route cache or optimized route table structure used by
the Layer 3 switching ASIC. If no route processor is present, the PFC can perform only Layer 3
and Layer 4 QoS classification and ACL filtering but will not be able to perform Layer 3 switching.
Finally, the switch fabric is the connection between multiple ports or slots within a switch. It is used
for data transport.
As was previously stated, in hardware-based platforms such as the Catalyst 6500 Series switches,
the switch makes forwarding decisions in hardware. Therefore, when the switches make the forwarding or switching decision for most frames that pass through the switch, the process does not
involve the supervisor engine CPU. However, there are reasons why some traffic must be processed in software instead of hardware. This concept is referred to as punting because the packet
is punted from hardware to software for processing. These reasons include, but are not limited to,
the following events or functions:
• Packets destined to the switch, such as a Telnet session that is destined for the switch
• Packets requiring special processing, such as packets with IP options or expired TTL
• ACL-based features, such as ACL logging
• Hardware resource exhaustion, such as when the CAM or TCAM is full
• Multicast traffic, such as IGMP packets
• Other features, such as NBAR and DHCP snooping
• IP version 6 packet processing
When packets are processed in software instead of hardware, the CPU utilization of the RP is
increased due to the additional load. This increased CPU utilization can result in significant performance and forwarding reduction for the switch.
In addition to these features, high CPU utilization can also be caused by other events, including Broadcast storms, Spanning Tree loops, debugging, SNMP polling, and SPAN sessions. It is
important to understand that high CPU utilization on the switch does not necessarily reflect the
hardware forwarding performance of the switch. For example, a small consistent stream of packets
with IP options may result in high CPU utilization, even when no other traffic is present.
In order to troubleshoot high CPU utilization issues, you need some sort of reference point. For
this reason, it is considered good practice to baseline and monitor the supervisor engine’s CPU
utilization and make a note of all of the processes that generate the highest CPU utilization in a
stable network environment with a ‘normal’ traffic load. This provides a solid reference point that
you can then use to determine ‘normal’ from ‘abnormal’ utilization.
In Cisco IOS software, you can use the show processes cpu [history] command to view
processor utilization statistics averaged over the last five seconds, one minute, and five minutes. Appending the history keyword graphs processor utilization over the last one minute,
one hour, and 72 hours. Using this and the baseline information, you can determine whether the
CPU is consistently high or whether there are spikes of high utilization. If the CPU is consistently
high, identify the process(es) causing this and troubleshoot them as needed. In the event of spikes,
you may need to perform additional activities, such as a SPAN of the CPU, to determine what is
causing the high utilization and spikes.
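A typical first pass at the command line might look like the following sketch. The sorted keyword and the output filter are standard Cisco IOS options, but their availability depends on the platform and software release, so treat this as an assumption to verify on your own switch.

```
! Rank processes by CPU use and hide the idle ones
Cat-6500-1#show processes cpu sorted | exclude 0.00
! Graph utilization for the last 60 seconds, 60 minutes, and 72 hours
Cat-6500-1#show processes cpu history
```

Comparing this output against the baseline makes it easier to tell a sustained load from a short spike.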
If the CPU utilization is high due to the punt of traffic to the RP, determine what that traffic is
and why the traffic is punted. On distributed forwarding Catalyst switches, you can use the show
interfaces command to determine whether packets ingressing a certain interface are being
switched in hardware or software by viewing the Layer 2 and Layer 3 counters as illustrated below:
Cat-6500-1#show interfaces GigabitEthernet1/1
GigabitEthernet1/1 is up, line protocol is up (connected)
Hardware is C6k 1000Mb 802.3, address is 000a.42d1.7580 (bia 000a.42d1.7580)
Internet address is 100.100.100.2/24
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Half-duplex, 100Mb/s
input flow-control is off, output flow-control is off
Clock mode is auto
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of “show interface” counters never
Input queue: 5/75/1/24075 (size/max/drops/flushes); Total output drops: 2
Queueing strategy: fifo
Output queue: 0/40 (size/max)
30 second input rate 7609000 bits/sec, 14859 packets/sec
30 second output rate 0 bits/sec, 0 packets/sec
L2 Switched: ucast: 0 pkt, 184954624 bytes - mcast: 1 pkt, 500 bytes
L3 in Switched: ucast: 2889916 pkt, 0 bytes - mcast: 0 pkt, 0 bytes mcast
L3 out Switched: ucast: 0 pkt, 0 bytes mcast: 0 pkt, 0 bytes
2982871 packets input, 190904816 bytes, 0 no buffer
Received 9 broadcasts, 0 runts, 0 giants, 0 throttles
1 input errors, 1 CRC, 0 frame, 28 overrun, 0 ignored
0 input packets with dribble condition detected
1256 packets output, 124317 bytes, 0 underruns
2 output errors, 1 collisions, 2 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier
0 output buffer failures, 0 output buffers swapped out
In the output of the show interfaces command, the L2 traffic is the amount of traffic that is
switched in hardware, while the L3 traffic shows the amount of received traffic that is punted to
the CPU. From the output above, there is a great deal of traffic being punted to the CPU, which
can result in high CPU utilization. If a great deal of traffic is being process-switched and is punted
to the CPU, the show processes cpu command will reflect high utilization statistics for the IP
Input process. You can then use additional Cisco IOS software tools and utilities, such as SPAN,
to determine where this traffic is coming from (i.e., the source), as well as what kind of traffic it is.
Protecting the Route Processor
As stated earlier in this chapter, when using CEF, the majority of packets are forwarded by the PFC,
referencing the entries contained in the FIB, which is populated by the MSFC. However, there are
certain exception packets, such as packets with IP options, which must be punted to the Route
Processor (MSFC) for further processing. While the PFC can forward up to 30 million packets
per second (pps), the MSFC is typically capable of forwarding only up to 500,000 pps. This significant difference in the forwarding capabilities means that it is possible for the
MSFC to be oversubscribed or overutilized if the PFC punts a large number of packets to it. This
may result in the following:
• Routing protocols may get out of sync with the rest of the network. This may result in network
flaps and major network-wide transitions.
• The console on the switch may lock up. This results in the switch becoming unreachable and
unmanageable, leaving administrators no avenues to troubleshoot.
• Other RP-based processes may cease operation altogether. This may result in the switch running with unpredictable results, or crashing.
To prevent such situations, IOS software allows administrators to configure MLS rate limiting using the mls rate-limit global configuration command. Rate limiters throttle the packets per
second (pps) rate of certain packets that are punted to the MSFC by the PFC, which effectively
ensures that the MSFC is never overwhelmed by the much faster PFC, thereby allowing the switch
to continue normal operations. Because the rate limiting functionality is performed in hardware,
MLS rate limiters are typically referred to as hardware rate limiters (HWRLs) in various texts. MLS
rate-limiter configuration is beyond the scope of the TSHOOT certification exam.
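Although the configuration is not examined, a brief sketch helps to make the idea concrete. The example below assumes a Catalyst 6500 with a PFC 3 and limits two of the punt reasons named earlier: packets with IP options and packets with an expired TTL. The rate of 100 pps and the burst of 10 packets are arbitrary illustrative values, not recommendations.

```
! Limit packets with IP options punted to the MSFC to 100 pps, burst of 10
mls rate-limit unicast ip options 100 10
! Limit packets with an expired TTL to 100 pps, burst of 10
mls rate-limit all ttl-failure 100 10
```

The rate limiters currently in effect can be reviewed with the show mls rate-limit command.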
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter.
Catalyst Switch VLAN Interfaces Overview
• In a switched network, VLANs separate devices into different broadcast domains
• VLANs are also used to separate devices into different subnets
• Multilayer switches support the configuration of Switched Virtual Interfaces (SVIs)
• SVIs represent VLANs and allow the switch to serve as the default gateway for the VLAN
• An SVI is not automatically created when a VLAN is created
• By default, however, an SVI for VLAN 1 is automatically created by the software
• A Switched Virtual Interface is a very resilient interface
• In order for an SVI to be placed into the up/up state, the following conditions must be met:
1. The VLAN exists and is active in the VLAN database of the switch
2. The VLAN interface is not administratively down
3. At least one Layer 2 port (access or trunk) exists and has a link up in this VLAN
4. At least one Layer 2 port (access or trunk) in the VLAN is in the STP forwarding state
Catalyst Switch MLS Overview
• Multilayer Switching (MLS) combines Layer 2, Layer 3, and Layer 4 switching technologies
• MLS allows switches to forward packets at wire speed using hardware
• Cisco supports MLS for both Unicast and Multicast traffic flows
• In MLS switching, an MLS cache is maintained for the Layer 3-switched flows
• The MLS cache maintains flow information for all active flows
• The MLS cache includes entries for traffic statistics
• After the MLS cache is created, packets for an existing flow can be Layer 3-switched
• MLS integrates both data plane and control plane functions
• The control plane is where routing and other control information is stored and exchanged
• The control plane is responsible for updating the routing table
• The data plane is responsible for the actual forwarding of data
• The data plane is typically populated using information derived from the control plane
• MLS is enabled by configuring Cisco Express Forwarding (CEF) on the switch
• CEF uses a FIB to make IP destination prefix-based switching decisions
• The FIB is conceptually similar to a routing table, or Routing Information Base (RIB)
• The adjacency table is created to contain all connected next hops
• An adjacent node is a node that is one hop away, i.e., directly connected
Troubleshooting Multilayer Switching
• MLS troubleshooting requires troubleshooting at both the control and data planes
• The following steps should be taken when troubleshooting Unicast MLS issues:
1. Verify that IP routing information for the address is correct
2. Verify that the next hop has a valid MAC address
3. Verify that the FIB next hop is the same as the RIB next hop
4. Verify the CEF adjacency table rewrite information
5. Verify FIB and adjacency table population in TCAM
Understanding and Troubleshooting HSRP
• Hot Standby Router Protocol is a Cisco-proprietary First Hop Redundancy Protocol
• Cisco IOS software supports two versions of HSRP: version 1 and version 2
• By default, when HSRP is enabled in Cisco IOS software, HSRP version 1 is enabled
• HSRPv1 supports group numbers 0 to 255; HSRPv2 supports group numbers 0 to 4095
• HSRPv1 uses Multicast group address 224.0.0.2 and UDP port 1985
• HSRPv2 uses Multicast group address 224.0.0.102 and UDP port 1985
• The version 2 packet format uses a Type/Length/Value (TLV) format
• HSRPv1 does not support millisecond timer values; HSRPv2 supports millisecond timers
• HSRPv2 includes a 6-byte Identifier field
• HSRPv1 uses virtual MAC addresses in the range 0000.0c07.acxx
• HSRPv2 uses virtual MAC addresses in the range 0000.0C9F.F000 to 0000.0C9F.FFFF
• The majority of HSRP issues are due to router and switch misconfigurations
• Common HSRP problem scenarios include the following:
1. Gateway Logging Continuous HSRP State Changes
2. HSRP Gateways Not Reflecting the Correct State
3. HSRP Does Not Detect Peer Router
4. HSRP Causes MAC Violation on a Secure Switch Port
Understanding and Troubleshooting VRRP
• VRRP operates in a similar manner to HSRP; however, VRRP is an open standard
• VRRP is defined in RFC 2338
• VRRP sends advertisements to Multicast address 224.0.0.18, using IP protocol number 112
• At the Data Link layer, VRRP uses virtual MAC addresses in the range 00-00-5e-00-01-xx
• Both HSRP and VRRP use a default priority value of 100
• By default, VRRP version 2 is enabled when VRRP is configured on a gateway
• Configurable VRRP priorities range from 1-254; for HSRP the range is 1-255
• VRRP priority 255 is automatically configured when an interface IP is used for a group
• When the IP Address Owner is up, it responds to all packets that are sent to the IP address
Understanding and Troubleshooting GLBP
• Gateway Load Balancing Protocol is a Cisco-proprietary FHRP, like HSRP
• GLBP allows multiple gateways to actively forward packets using a single GLBP group
• GLBP gateways communicate through Hello messages that are sent every 3 seconds
• GLBP sends updates to the Multicast address 224.0.0.102, using UDP port 3222
• GLBP group members elect one gateway to be the AVG for that group
• The AVG is the gateway that has the highest priority value
• The other gateways in the GLBP group provide backup for the AVG
• The AVG answers all ARP requests for the virtual router address
• The AVG assigns a virtual MAC address to each member of the GLBP group
• Each gateway is responsible for forwarding packets that are sent to its virtual MAC address
• These gateways are referred to as active virtual forwarders (AVFs)
• A GLBP group allows up to four virtual MAC addresses per group
• By default, the GLBP gateway preemptive scheme is disabled
• A backup virtual gateway can become the AVG only if the current AVG fails
• By default, each gateway is assigned a weight of 100
Troubleshooting Switch Supervisor Redundancy
• Catalyst 4500 and 6500 series switches support two supervisor engines for high availability
• The standby supervisor engine assumes the primary supervisor role if any of the following happens:
1. The primary supervisor engine fails or crashes
2. The primary supervisor engine is rebooted
3. The administrator forces a manual failover from active to standby
4. The primary supervisor engine is physically removed
5. Clock synchronization between the supervisor engines fails
• Cisco IOS software supports the following three modes for redundant supervisor implementations:
1. Route Processor Redundancy (RPR)
2. Route Processor Redundancy Plus (RPR+)
3. Stateful Switchover (SSO)
• When using RPR, the standby supervisor engine is only partially booted and initialized
• RPR switchover generally takes between 2 and 4 minutes
• RPR+ improves on RPR and provides failover generally within 30 to 60 seconds
• With RPR+, the standby is initialized but not fully operational
• RPR+ synchronizes user-entered CLI commands incrementally line-by-line
• After RPR+ switchover, the following events occur:
1. Traffic is disrupted until the redundant supervisor engine completes the takeover
2. The switch maintains any static routes across the switchover
3. The switch does not maintain any dynamic routing protocol information
4. The switch clears the FIB tables on switchover
5. The switch clears the CAM tables on switchover
6. State information, such as active TCP sessions, is not maintained on switchover
• SSO is the preferred redundancy mode for supervisor engines
• With SSO, the redundant supervisor is fully booted and initialized
• With SSO, supervisor engines must be synchronized
• With SSO, configuration information and data structures are synchronized
• SSO maintains state information between the redundant supervisor engines
• When using SSO, the following events cause a switchover:
1. A hardware failure on the active supervisor engine
2. Clock synchronization failure between supervisor engines
3. A manual switchover
Troubleshooting Switch Performance Issues
• One of the most telling signs of performance issues on devices is high CPU utilization
• Cisco software-based routers use software in order to process and route packets
• High CPU utilization typically indicates capacity issues on software-based routers
• Catalyst 4500 and 6500 series switches are hardware-based platforms
• High CPU utilization does not indicate capacity issues on hardware-based platforms
• The Supervisor 720 module is comprised of the following three integrated core components:
1. The Multilayer Switch Feature Card 3
2. The Policy Feature Card 3
3. The Switch or Switching Fabric
• Even in hardware-based platforms, some packets must be punted and processed in software
• The reasons packets may be punted include the following:
1. Packets destined to the switch, such as a Telnet session that is destined for the switch
2. Packets requiring special processing, such as packets with IP options or expired TTL
3. ACL-based features, such as ACL logging
4. Hardware resource exhaustion, such as when the CAM or TCAM is full
5. Multicast traffic, such as IGMP packets
6. Other features, such as NBAR and DHCP Snooping
7. IP version 6 packet processing
CHAPTER 5
Troubleshooting EIGRP
Enhanced Interior Gateway Routing Protocol is a Cisco-proprietary advanced Distance Vector
routing protocol. As a CCNP network engineer, it is important to understand how to support
EIGRP, as it is a very commonly implemented routing protocol. As with the previous technologies
described thus far in this guide, in order to troubleshoot and support networks running EIGRP, you
must have a solid understanding of the inner workings of the protocol itself. The TSHOOT certification exam objective that is covered in this chapter is as follows:
• Troubleshoot EIGRP
While it is not possible to delve into all potential EIGRP problem scenarios, this chapter will discuss
some of the most common problem scenarios when EIGRP is implemented as the Interior Gateway
Protocol (IGP) of choice. This chapter is divided into the following sections:
• Enhanced Interior Gateway Routing Protocol Overview
• Troubleshooting Neighbor Relationships
• Troubleshooting Route Installation
• Troubleshooting Route Advertisement
• Troubleshooting Stub Routing Issues
• Troubleshooting SIA Issues
• Troubleshooting Route Redistribution Issues
• Debugging EIGRP Routing Issues
ENHANCED INTERIOR GATEWAY ROUTING PROTOCOL OVERVIEW
EIGRP is an advanced Distance Vector routing protocol that incorporates traditional Distance Vector features, such as split horizon, and traditional Link State features, such as incremental updates.
EIGRP runs directly over IP using protocol number 88 and is a Cisco-proprietary routing protocol.
The following sections describe some core characteristics and components that are integral to the
operation of EIGRP.
It is important to have a solid understanding of EIGRP in order to troubleshoot any routing problems effectively. Keep in mind that the information provided is simply a summary and recap of
the material available in the ROUTE study guide. Please refer to the ROUTE guide for additional
information on specific topics and technologies if needed.
Packets
EIGRP uses several different types of packets to exchange routing and control information between
EIGRP neighbors. EIGRP uses the following types of packets:
• Hello packets
• Acknowledgment packets
• Update packets
• Query packets
• Reply packets
EIGRP sends Hello packets once it has been enabled on a router for a particular network. These
messages are used to identify neighbors and, once neighbors are identified, serve as a keepalive mechanism between them. EIGRP Hello packets are sent to the link-local Multicast group address
224.0.0.10. Hello packets do not require an Acknowledgment packet to be sent confirming that they were received. Because they require no explicit acknowledgment, Hello packets
are classified as unreliable EIGRP packets. EIGRP Hello packets have an OPCode of 5.
An EIGRP Acknowledgment (ACK) packet is simply an EIGRP Hello packet that contains no data.
Acknowledgment packets are used by EIGRP to confirm reliable delivery of EIGRP packets. ACKs
are always sent to a Unicast address, which is the source address of the sender of the reliable packet,
and not to the EIGRP Multicast group address. In addition, Acknowledgment packets will always
contain a non-zero acknowledgment number. The ACK uses the same OPCode as the Hello packet
(OPCode 5) because it is essentially a Hello packet that contains no data.
EIGRP Update packets are used to convey reachability of destinations. In other words, Update
packets contain EIGRP routing updates. When a new neighbor is discovered, Update packets are
sent via Unicast so the neighbor can build up its EIGRP Topology Table. In other cases, such as a
link cost change, updates are sent via Multicast. It is important to know that Update packets are
always transmitted reliably and always require explicit acknowledgement. Update packets are assigned an OPCode of 1.
EIGRP Query packets are normally sent via Multicast and are used to request routing information reliably. EIGRP
Query packets are sent to neighbors when a route is no longer available and the router needs to ask about
the status of the route for fast convergence. If the router that sends out a Query does not receive a
response from one or more of its neighbors, it resends the Query as a Unicast packet to the non-responsive
neighbor(s). If no response is received after sixteen attempts, then the neighbor relationship is reset. EIGRP Query packets are assigned an OPCode of 3.
EIGRP Reply packets are sent in response to Query packets. The Reply packets are used to respond
reliably to a Query packet. Reply packets are Unicast to the originator of the Query. The EIGRP
Reply packets are assigned an OPCode of 4.
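The packet types described above can be collected into a small lookup table for revision. The following Python sketch is only a study aid: the class name, field names, and helper function are hypothetical, and the values are limited to what this section states about each packet type.

```python
from typing import NamedTuple


class EigrpPacket(NamedTuple):
    opcode: int      # OPCode as listed in this section
    reliable: bool   # True if the packet requires an explicit acknowledgment
    delivery: str    # how this section says the packet is normally sent


# Hypothetical study table; every value comes from the descriptions above.
EIGRP_PACKETS = {
    "Update": EigrpPacket(1, True, "unicast to a new neighbor, otherwise multicast"),
    "Query": EigrpPacket(3, True, "multicast, retried as unicast"),
    "Reply": EigrpPacket(4, True, "unicast to the originator of the Query"),
    "Hello": EigrpPacket(5, False, "multicast to 224.0.0.10"),
    "Acknowledgment": EigrpPacket(5, False, "unicast to the sender of the reliable packet"),
}


def packets_with_opcode(opcode: int) -> list[str]:
    """Return the names of all packet types that use the given OPCode, sorted."""
    return sorted(name for name, pkt in EIGRP_PACKETS.items() if pkt.opcode == opcode)


if __name__ == "__main__":
    for name, pkt in EIGRP_PACKETS.items():
        kind = "reliable" if pkt.reliable else "unreliable"
        print(f"{name:15} OPCode {pkt.opcode}  {kind:10}  {pkt.delivery}")
```

Calling packets_with_opcode(5) returns both Acknowledgment and Hello, which reflects the fact that an ACK is a Hello packet without data.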
NOTE: EIGRP can also use another packet type called the Request packet. This is used in route
server applications. EIGRP Request packets can be sent via either Multicast or Unicast, but they
are always transmitted unreliably. In other words, they do not require an explicit acknowledgment. Route server applications are beyond the scope of the TSHOOT certification exam.
Neighbor Discovery and Maintenance
EIGRP supports both dynamic and static (manually configured) neighbor discovery. Dynamic
neighbor discovery is performed by sending Hello packets to the destination Multicast group address 224.0.0.10. Unlike the dynamic EIGRP neighbor discovery process, static EIGRP neighbor
relationships require manual neighbor configuration on the router. When static neighbors are configured, the local router uses the Unicast neighbor address to send packets to these routers to establish a neighbor relationship.
Hello and Hold Timers
EIGRP uses different Hello and Hold timers for different types of media. The Hello timer
determines the interval at which EIGRP Hello packets are sent. The Hold timer determines the
time that will elapse before a router considers an EIGRP neighbor as down. By default, the Hold
time is three times the Hello interval.
EIGRP sends Hello packets every five seconds on Broadcast, point-to-point serial, point-to-point
subinterfaces, and multipoint circuits with bandwidth greater than T1. The default Hold time on these links is 15 seconds. EIGRP
sends Hello packets every 60 seconds on other link types, namely low-bandwidth multipoint WAN links
of T1 speed or less. The default Hold time for neighbor relationships across these links is also three
times the Hello interval and therefore defaults to 180 seconds.
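The default timer relationship described above can be expressed as a short calculation. The Python sketch below is illustrative only: the function name and its single parameter are hypothetical, and it models just the two default cases given in the text.

```python
def default_eigrp_timers(low_speed_multipoint: bool) -> tuple[int, int]:
    """Return the default (hello, hold) timers in seconds.

    low_speed_multipoint is True for the low-bandwidth multipoint WAN
    links described in the text, and False for every other link type.
    """
    hello = 60 if low_speed_multipoint else 5
    hold = 3 * hello  # by default, the Hold time is three times the Hello interval
    return hello, hold


if __name__ == "__main__":
    print(default_eigrp_timers(False))  # high-speed links
    print(default_eigrp_timers(True))   # low-speed multipoint links
```

Keep in mind that the three-to-one ratio is only a default. If the Hello interval is changed manually on an interface, the Hold time should be checked as well.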
The Neighbor Table
The EIGRP Neighbor Table is used by routers running EIGRP to maintain state information about
EIGRP neighbors. When new neighbors are discovered, the address and interface of each
neighbor are recorded. This is applicable to both dynamically discovered neighbors and statically
defined neighbors.
Reliable Transport Protocol
EIGRP uses its own transport protocol, the Reliable Transport Protocol (RTP), to ensure the reliable delivery of Update, Query, and Reply packets. The use of sequence numbers also ensures that the EIGRP
packets are received in the correct order.
Metric Calculation
EIGRP uses a composite metric, which includes different variables that are referred to as the K values. The K values are constants that are used to distribute weight to different path aspects, which
may be included in the composite EIGRP metric. The default values for the K values are K1 = K3 =
1 and K2 = K4 = K5 = 0. In other words, K1 and K3 are set to a default value of 1, while K2, K4, and
K5 are set to a default value of 0.
The complete EIGRP composite metric is calculated using the following mathematical formula:
256 × [K1 × bandwidth + (K2 × bandwidth) ⁄ (256 − load) + K3 × delay] × [K5 ⁄ (reliability + K4)]
In this formula, bandwidth is 10⁷ divided by the least bandwidth on the path (in Kbps), and delay is the sum of all delays on the path (in tens of microseconds). The final term, [K5 ⁄ (reliability + K4)], is applied only when K5 is non-zero.
Given that only K1 and K3 have non-zero values by default, the default EIGRP metric
calculation reduces to the following mathematical formula:
[(10⁷ ⁄ least bandwidth on path) + (sum of all delays)] × 256
This essentially means that, by default, EIGRP uses the minimum bandwidth on the path to a
destination network and the total delay to compute routing metrics. However, Cisco IOS software
allows administrators to set other K values to non-zero values to incorporate other variables into
the composite metric.
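To make the metric calculation concrete, the following Python sketch works through the formula. It is a simplified model rather than the router's actual implementation: the function name and parameters are hypothetical, and it assumes that bandwidth is supplied in Kbps, that delay is supplied in microseconds and scaled to tens of microseconds, and that integer division is used throughout. The sample values are the bandwidth, delay, and composite metric shown for the 2.2.2.2/32 entry later in this chapter.

```python
DEFAULT_K_VALUES = (1, 0, 1, 0, 0)  # K1..K5 defaults from the text


def eigrp_metric(min_bandwidth_kbps: int, total_delay_usec: int,
                 load: int = 1, reliability: int = 255,
                 k_values: tuple = DEFAULT_K_VALUES) -> int:
    """Compute the composite metric (simplified sketch, integer math assumed)."""
    k1, k2, k3, k4, k5 = k_values
    bandwidth = 10**7 // min_bandwidth_kbps   # scaled inverse of the slowest link
    delay = total_delay_usec // 10            # delay in tens of microseconds
    metric = k1 * bandwidth + (k2 * bandwidth) // (256 - load) + k3 * delay
    if k5 != 0:
        # The reliability term only applies when K5 is non-zero.
        metric = metric * k5 // (reliability + k4)
    return metric * 256


if __name__ == "__main__":
    # Minimum bandwidth 100000 Kbit, total delay 5100 microseconds.
    print(eigrp_metric(100_000, 5100))  # 156160
```

With the default K values, only the slowest link and the cumulative delay influence the result, which is why a change in load or reliability does not change the metric.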
Diffusing Update Algorithm
The Diffusing Update Algorithm (DUAL) is at the crux of the EIGRP routing protocol. DUAL looks
at all routes received from neighbor routers, compares them, and then selects the lowest (best)
metric, loop-free path to the destination network. The best route, which is the route with the lowest
metric or Feasible Distance (FD), is then referred to as the Successor route. The Feasible Distance
includes both the metric of a network as advertised by the connected neighbor and the cost of
reaching that particular neighbor.
The metric that is advertised by the neighbor router is referred to as the Reported Distance (RD) or
as the Advertised Distance (AD). This is that neighbor’s metric to the destination network. Therefore, the Feasible Distance includes the Reported Distance plus the cost of reaching that particular
neighbor. The next-hop router for the Successor route is referred to as the Successor. The Successor
route is placed into both the IP routing table and the EIGRP Topology Table.
Any other routes to the same destination network that have a lower Reported Distance than the
Feasible Distance of the Successor path are guaranteed to be loop-free and are referred to as Feasible Successor routes. These routes are not placed into the IP routing table; however, they are still
placed into the EIGRP Topology Table, along with the Successor routes.
In order for a route to become a Feasible Successor route, it must meet the Feasibility Condition
(FC), which is met only when the Reported Distance to the destination network is less than the
Feasible Distance. In the event that the Reported Distance is equal to or greater than the Feasible Distance,
the route is not selected as a Feasible Successor. This is used by EIGRP to prevent the possibility of
loops.
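The Feasibility Condition can be illustrated with a short Python sketch. The function name, the neighbor labels (N1 to N3), and the metric values are all hypothetical and are chosen only to show each outcome; they are not taken from any figure in this guide.

```python
def classify_paths(paths: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Classify each path to one destination.

    paths maps a neighbor name to (reported_distance, total_metric),
    where total_metric is the metric through that neighbor.
    """
    successor = min(paths, key=lambda name: paths[name][1])
    feasible_distance = paths[successor][1]
    result = {}
    for name, (reported_distance, _metric) in paths.items():
        if name == successor:
            result[name] = "Successor"
        elif reported_distance < feasible_distance:
            # Feasibility Condition: RD must be strictly less than the FD.
            result[name] = "Feasible Successor"
        else:
            result[name] = "Not feasible"
    return result


if __name__ == "__main__":
    # Hypothetical values: neighbor -> (Reported Distance, metric via neighbor)
    example = {"N1": (5, 15), "N2": (10, 30), "N3": (15, 25)}
    for neighbor, role in classify_paths(example).items():
        print(neighbor, role)
```

In this example N3 offers a lower metric than N2, yet it is not a Feasible Successor because its Reported Distance equals the Feasible Distance instead of being less than it.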
The Topology Table
The EIGRP Topology Table is populated by the EIGRP Protocol-Dependent Modules (PDMs) and acted upon by the DUAL Finite State
Machine. All known destination networks and subnets that are advertised by neighboring EIGRP
routers are stored in the EIGRP Topology Table. This includes Successor routes, Feasible Successor
routes, and even routes that have not met the Feasibility Condition.
The Topology Table gives each EIGRP router a record of every path that its neighbors have advertised, which
allows for rapid convergence in EIGRP networks. Each individual entry in the Topology Table contains the destination network and the neighbor (or neighbors) that have advertised the destination
network. Both the Feasible Distance and the Reported Distance are stored in the Topology Table.
The EIGRP Topology Table contains the information needed to build a set of distances and vectors
to each reachable network, including the following:
• The lowest bandwidth on the path to the destination network
• The total or cumulative delay to the destination network
• The reliability of the path to the destination network
• The loading of the path to the destination network
• The minimum Maximum Transmission Unit (MTU) to the destination network
• The Feasible Distance to the destination network
• The Reported Distance by the neighbor router to the destination network
• The route source (only external routes) of the destination network
Stub Routing
Stub routing is an EIGRP feature designed primarily to conserve local router resources, such as
memory and CPU, and to improve network stability. The stub routing feature is most commonly
used in hub-and-spoke networks. This feature is configured only on the spoke routers. When configured on the spoke router, the router announces its stub router status using a new Type/Length/Value (TLV) in the EIGRP Hello messages. When the hub router receives the Hello packet from the
spoke router, one of two things happens:
1. If the hub router is running a newer version of software, upon receiving the Hello packet with
the new TLV, the router will not query the stub router about the status of any prefixes. This is
the default mode of operation in current Cisco IOS software versions.
2. If the hub router is running a version of software earlier than 12.0(7)T, upon receiving the Hello
packet with the new TLV, the router will ignore this field because it does not understand it.
The router will send Query packets to the stub router if it needs information about a route or
routes. However, the stub router will respond with a message of inaccessible to any queries
received from the hub router. This method allows for backward compatibility with older versions of software while retaining stub routing functionality.
When the stub routing feature is enabled on the spoke router, the stub router will advertise only
specified routes to the hub router. The router will not advertise routes received from other EIGRP
neighbors to the hub router. Cisco IOS software allows administrators to select the type of routes
that the stub router should advertise to the hub router; however, by default, the stub router will advertise connected and summary routes only. The EIGRP stub routing feature provides the following
four advantages when implemented in hub-and-spoke networks:
1. It prevents sub-optimal routing from occurring within hub-and-spoke EIGRP networks
2. It prevents stub routers with low-speed links from being used as transit routers
3. It eliminates EIGRP Query storms, allowing the EIGRP network to converge faster
4. It reduces the required amount of configuration commands on the stub routers
EIGRP Route Summarization
Route summarization reduces the amount of information that routers must process, which then
allows for faster convergence within the network. Summarization also restricts the size of the area
that is affected by network changes by hiding detailed topology information from certain areas
within the network. Finally, summarization is used to define a Query boundary for EIGRP, which
supports the following two types of route summarization:
1. Automatic route summarization
2. Manual route summarization
By default, automatic route summarization is in effect when EIGRP is enabled on the router. This is
implemented using the auto-summary command. This command allows EIGRP to perform automatic route summarization at Classful boundaries.
Unlike EIGRP automatic summarization, EIGRP manual route summarization is configured and
implemented at the interface level using the ip summary-address eigrp [ASN] [network]
[mask] [distance] [leak-map <name>] interface configuration command. By default, an
EIGRP summary address is assigned an administrative distance value of 5. This default assignment can be changed by specifying the desired administrative distance value with the
[distance] argument.
By default, when manual route summarization is configured, EIGRP will not advertise the more
specific route entries that fall within the summarized network entry. The leak-map <name> keyword can be configured to allow EIGRP route leaking, wherein EIGRP advertises selected specific
route entries in conjunction with the summary address. Those entries that are not
specified in the leak map are still suppressed.
EIGRP Unequal Cost Load Sharing
In addition to equal cost load-balancing capabilities, EIGRP is also able to perform unequal cost
load sharing. This unique ability allows EIGRP to use unequal cost paths to send outgoing packets
to the destination network based on weighted traffic share values. Unequal cost load sharing is enabled using the variance <multiplier> router configuration command.
The <multiplier> argument represents an integer between 1 and 128. A multiplier of 1, which is
the default, implies that no unequal cost load sharing is being performed. This default setting is also
illustrated in the output of the show ip protocols command. If any other value is used, EIGRP
will load-share across the successor route, as well as any other route with a route metric less than x times
that of the successor metric, where x is the configured multiplier. Routes that do not meet the Feasibility Condition are always excluded from this calculation.
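The way the variance multiplier interacts with the Feasibility Condition can be sketched in Python. This is a simplified model under stated assumptions: the function name, neighbor labels, and metric values are hypothetical, and the sketch assumes that a path is used only if it meets the Feasibility Condition and its metric is less than the multiplier times the successor metric.

```python
def load_sharing_paths(paths: dict[str, tuple[int, int]], variance: int = 1) -> list[str]:
    """Return the neighbors used for load sharing toward one destination.

    paths maps a neighbor name to (reported_distance, total_metric).
    """
    feasible_distance = min(metric for _rd, metric in paths.values())
    chosen = []
    for name, (reported_distance, metric) in paths.items():
        if metric == feasible_distance:
            chosen.append(name)  # the successor (or an equal cost path)
        elif reported_distance < feasible_distance and metric < variance * feasible_distance:
            chosen.append(name)  # feasible successor within the variance range
    return chosen


if __name__ == "__main__":
    # Hypothetical values: neighbor -> (Reported Distance, metric via neighbor)
    example = {"N1": (10, 20), "N2": (15, 30), "N3": (25, 35), "N4": (5, 45)}
    print(load_sharing_paths(example, variance=1))  # ['N1']
    print(load_sharing_paths(example, variance=2))  # ['N1', 'N2']
```

In this example N3 has a metric inside the variance range (35 is less than 40) but is still excluded because its Reported Distance of 25 is not less than the Feasible Distance of 20. N4 meets the Feasibility Condition but its metric is outside the range.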
TROUBLESHOOTING NEIGHBOR RELATIONSHIPS
It is important to understand that simply enabling EIGRP between two or more routers does not
guarantee that a neighbor relationship will be established. In addition to parameters that must match between neighbors, other factors can also prevent an EIGRP neighbor relationship from being established. The
EIGRP neighbor relationship may not establish due to any of the following:
• The neighbor routers are not on a common subnet
• Mismatched primary and secondary subnets
• Mismatched K values
• Mismatched AS number
• Access Control Lists are filtering EIGRP packets
• Physical layer issues
• Data Link layer issues
• Mismatched authentication parameters
Neighbors that are not on a common subnet are one of the most common problems experienced when attempting
to establish EIGRP neighbor relationships. When EIGRP cannot establish a neighbor relationship
because the neighbor is not on a common subnet, the following error message will be printed on the console or will
be logged by the router or switch:
*Mar 2 22:12:46.589 CST: IP-EIGRP(Default-IP-Routing-Table:1): Neighbor 150.1.1.2 not on common subnet for FastEthernet0/0
*Mar 2 22:12:50.977 CST: IP-EIGRP(Default-IP-Routing-Table:1): Neighbor 150.1.1.2 not on common subnet for FastEthernet0/0
The most common reason for the neighbor routers being on an uncommon subnet is a misconfiguration issue. It may be that the router interfaces have been accidentally configured on two different
subnets. However, if the neighbors are connected via a VLAN, it is possible that Multicast packets
could be leaking between VLANs, resulting in this error. The first troubleshooting step, however,
simply would be to verify the interface configuration on the devices. Following this, additional
troubleshooting steps, such as VLAN troubleshooting (if applicable) could be undertaken to isolate
and resolve the issue.
Another common reason for this error message is using secondary addresses when attempting to
establish EIGRP neighbor relationships. Again, the simplest way to troubleshoot such issues is to
verify the router or switch configurations. For example, assume the error message above was being
printed on the console of the local router. The first troubleshooting step would be to validate the IP
addresses configured on the interface as follows:
R1#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 140 bytes
!
interface FastEthernet0/0
 ip address 150.2.2.1 255.255.255.0
 duplex auto
 speed auto
end
Next, check the configuration on the device with the IP address 150.1.1.2 as follows:
R2#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 140 bytes
!
interface FastEthernet0/0
 ip address 150.2.2.2 255.255.255.0 secondary
 ip address 150.1.1.2 255.255.255.0
 duplex auto
 speed auto
end
From the output above, we can see that the primary subnet on R1 is the secondary subnet on
R2. EIGRP sources its packets from the primary address of the interface and will not establish neighbor relationships using a secondary address. The resolution for this issue simply would be to correct the IP addressing configuration under the FastEthernet0/0 interface of R2 as follows:
R2#config terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#interface FastEthernet0/0
R2(config-if)#ip address 150.2.2.2 255.255.255.0
R2(config-if)#ip address 150.1.1.2 255.255.255.0 secondary
R2(config-if)#end
*Oct 20 03:10:27.185 CST: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 150.2.2.1 (FastEthernet0/0) is up: new adjacency
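The common subnet check behind this error message can be approximated with the Python standard library. The sketch below is a simplification and not the router's actual logic: the function name is hypothetical, and it assumes that the source address of a received Hello is simply compared against the subnet of the primary address on the receiving interface.

```python
import ipaddress


def on_common_subnet(local_primary: str, neighbor_source: str) -> bool:
    """Return True if neighbor_source falls inside the local primary subnet.

    local_primary is written as address/prefix-length, for example "150.2.2.1/24".
    """
    local_network = ipaddress.ip_interface(local_primary).network
    return ipaddress.ip_address(neighbor_source) in local_network


if __name__ == "__main__":
    # Before the fix: R2 sources its Hellos from its primary address 150.1.1.2.
    print(on_common_subnet("150.2.2.1/24", "150.1.1.2"))  # False
    # After the fix: R2 sources its Hellos from 150.2.2.2.
    print(on_common_subnet("150.2.2.1/24", "150.2.2.2"))  # True
```

The first call models what R1 sees in the example above, which is why the error message names 150.1.1.2 as the neighbor that is not on a common subnet.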
EIGRP K values are constants that are used to distribute weight to different path aspects, which
may be included in the composite EIGRP metric. The default values for the K values are K1 = K3
= 1 and K2 = K4 = K5 = 0. If changed on one router or switch, then these values must be adjusted
for all other routers or switches within the autonomous system. The default EIGRP K values can be
viewed using the show ip protocols command as illustrated below:
R1#show ip protocols
Routing Protocol is "eigrp 150"
  Outgoing update filter list for all interfaces is not set
  Incoming update filter list for all interfaces is 1
  Default networks flagged in outgoing updates
  Default networks accepted from incoming updates
  EIGRP metric weight K1=1, K2=0, K3=1, K4=0, K5=0
  EIGRP maximum hopcount 100
  EIGRP maximum metric variance 1
  Redistributing: eigrp 150, ospf 1
  EIGRP NSF-aware route hold timer is 240s
  Automatic network summarization is not in effect
  Maximum path: 4
  Routing for Networks:
    10.1.0.0/24
    172.16.1.0/30
  Routing Information Sources:
    Gateway         Distance      Last Update
    (this router)         90      15:59:19
    172.16.0.2            90      12:51:56
    172.16.1.2            90      00:27:17
  Distance: internal 90 external 170
When K values are changed on a router, all neighbor relationships for the local router will be reset. If
the values are not consistent on all routers following the change, the following error message will be
printed on the console, and the EIGRP neighbor relationship(s) will not be established:
*Oct 20 03:19:14.140 CST: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 150.2.2.1 (FastEthernet0/0) is down: Interface Goodbye received
*Oct 20 03:19:18.732 CST: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 150.2.2.1 (FastEthernet0/0) is down: K-value mismatch
NOTE: While EIGRP K values can be adjusted using the metric weights router configuration command, this is
not recommended without assistance from seasoned network engineers or the Technical Assistance Center (TAC).
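When comparing the K values reported by show ip protocols on two routers, the comparison itself is simple, as the following Python sketch shows. The function name and the sample neighbor values are hypothetical; only the default values come from the text.

```python
DEFAULT_K_VALUES = (1, 0, 1, 0, 0)  # K1=1, K2=0, K3=1, K4=0, K5=0


def k_value_mismatches(local: tuple, neighbor: tuple) -> list[str]:
    """Return the names of the K values that differ between two routers."""
    return [
        f"K{index}"
        for index, (mine, theirs) in enumerate(zip(local, neighbor), start=1)
        if mine != theirs
    ]


if __name__ == "__main__":
    # Hypothetical neighbor on which K2 has been changed to 1.
    print(k_value_mismatches(DEFAULT_K_VALUES, (1, 1, 1, 0, 0)))  # ['K2']
```

Any non-empty result means that the two routers will not form a neighbor relationship until the values are made consistent.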
Unlike OSPF, which uses a locally significant process ID, EIGRP requires the same autonomous
system number (among other variables) when establishing neighbor relationships with other routers. Troubleshoot such issues by comparing configurations of devices and ensuring that the autonomous system number (among other variables) is consistent between routers that should establish neighbor relationships. A good indicator that neighbors are in a different autonomous system
would be a lack of bidirectional Hellos, even in the presence of basic IP connectivity between the
routers. This can be validated using the show ip eigrp traffic command, the output of which
is illustrated in the section that follows.
ACLs and other filters are also common causes for routers failing to establish EIGRP neighbor
relationships. Check router configurations and those of intermediate devices to ensure that EIGRP
or Multicast packets are not filtered. A very useful troubleshooting command to use is the show
ip eigrp traffic command. This command provides statistics on all EIGRP packets. Assume,
for example, that you have verified basic connectivity and configurations between two devices, but
the EIGRP neighbor relationship is still not up. In that case, you could use this command to check
to see whether the routers are exchanging Hello packets, before enabling debugging on the local
device, as illustrated below:
R2#show ip eigrp traffic
IP-EIGRP Traffic Statistics for AS 2
  Hellos sent/received: 144/0
  Updates sent/received: 0/0
  Queries sent/received: 0/0
  Replies sent/received: 0/0
  Acks sent/received: 0/0
  SIA-Queries sent/received: 0/0
  SIA-Replies sent/received: 0/0
  Hello Process ID: 149
  PDM Process ID: 120
  IP Socket queue: 0/2000/0/0 (current/max/highest/drops)
  Eigrp input queue: 0/2000/0/0 (current/max/highest/drops)
In the output above, notice that the local router has not received any Hello packets, although it has
sent out 144 Hellos. Assuming that you have verified IP connectivity between the two devices, as
well as the configuration, you could also check ACL configurations on the local router, as well as
intermediate devices (if applicable), to ensure that EIGRP or Multicast traffic is not being filtered.
For example, you might find an ACL that is configured to deny Class D (Multicast) traffic, while
allowing all other traffic, such as the following ACL:
R2#show ip access-lists
Extended IP access list 100
10 deny ip 224.0.0.0 15.255.255.255 any
20 deny ip any 224.0.0.0 15.255.255.255 (47 matches)
30 permit ip any any (27 matches)
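To see why this ACL blocks EIGRP Hellos, it helps to work through the wildcard mask. The Python sketch below is a minimal model of ACL address matching, in which a bit set to 1 in the wildcard mask means that the corresponding address bit is ignored. The function name is hypothetical.

```python
import ipaddress


def wildcard_match(address: str, base: str, wildcard: str) -> bool:
    """Return True if address matches base under the given ACL wildcard mask."""
    addr_bits = int(ipaddress.IPv4Address(address))
    base_bits = int(ipaddress.IPv4Address(base))
    care_bits = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    # Only the bits that are 0 in the wildcard mask are compared.
    return (addr_bits & care_bits) == (base_bits & care_bits)


if __name__ == "__main__":
    # The EIGRP Multicast group address is caught by entry 20 of the ACL.
    print(wildcard_match("224.0.0.10", "224.0.0.0", "15.255.255.255"))  # True
    # A Unicast neighbor address is not.
    print(wildcard_match("150.1.1.2", "224.0.0.0", "15.255.255.255"))   # False
```

The wildcard mask 15.255.255.255 fixes only the first four bits of the address, so the entry covers 224.0.0.0 through 239.255.255.255. Hellos sent to 224.0.0.10 therefore match the deny statement.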
Physical and Data Link layer issues, and ways in which these can affect routing protocols and other
traffic, have been described in detail in previous chapters. You can troubleshoot these issues using
the show interfaces, show interfaces counters, show vlan, and show spanning-tree
commands, among other commands described in those chapters. To avoid being redundant, we
will not restate the Physical and Data Link layer troubleshooting steps.
Finally, common authentication configuration mistakes include using different key IDs when configuring key chains, and specifying different or mismatched passwords. When authentication is enabled under an interface, the EIGRP neighbor relationships are reset and reinitialized. If previously
established neighbor relationships do not come up following authentication implementation, verify
the authentication configuration parameters by looking at the running configuration or using the
show key chain and show ip eigrp interfaces detail [name] commands on the router.
Following is a sample output of the information that is printed by the show key chain command:
R2#show key chain
Key-chain EIGRP-1:
    key 1 -- text "eigrp-1"
        accept lifetime (always valid) - (always valid) [valid now]
        send lifetime (always valid) - (always valid) [valid now]
Key-chain EIGRP-2:
    key 1 -- text "eigrp-2"
        accept lifetime (00:00:01 UTC Nov 1 2010) - (infinite)
        send lifetime (00:00:01 UTC Nov 1 2010) - (infinite)
Key-chain EIGRP-3:
    key 1 -- text "eigrp-3"
        accept lifetime (00:00:01 UTC Dec 1 2010) - (00:00:01 UTC Dec 31 2010)
        send lifetime (00:00:01 UTC Dec 1 2010) - (00:00:01 UTC Dec 31 2010)
Following is a sample output of the information that is printed by the show ip eigrp interfaces
detail [name] command:
R2#show ip eigrp interfaces detail Serial0/0
IP-EIGRP interfaces for process 1
                    Xmit Queue    Mean   Pacing Time   Multicast    Pending
Interface    Peers  Un/Reliable   SRTT   Un/Reliable   Flow Timer   Routes
Se0/0          0        0/0          0       0/1            0           0
  Hello interval is 5 sec
  Next xmit serial <none>
  Un/reliable mcasts: 0/0 Un/reliable ucasts: 0/0
  Mcast exceptions: 0 CR packets: 0 ACKs suppressed: 0
  Retransmissions sent: 0 Out-of-sequence rcvd: 0
  Authentication mode is md5, key-chain is "EIGRP-1"
  Use unicast
When troubleshooting in general, it is recommended that you use show commands in Cisco IOS
software instead of enabling debug commands. While debugging provides real-time information,
it is very processor intensive, and it could result in high CPU utilization of the device and, in some
cases, even crash the device. In addition to show commands, you should also pay attention to the
various error messages that are printed by the software, as these provide useful information that
can be used to troubleshoot and isolate the root cause of the problem.
TROUBLESHOOTING ROUTE INSTALLATION
There are instances where you might notice that EIGRP is not installing certain routes into the
routing table. For the most part, this is due to a misconfiguration rather than a protocol
failure. Some common reasons for route installation failure include the following:
• The same route is received via another protocol with a lower administrative distance
• EIGRP summarization
• Duplicate Router IDs are present within the EIGRP domain
• The routes do not meet the Feasibility Condition
In the ROUTE guide, we learned that the administrative distance concept is used to determine
how reliable the route source is. The lower the administrative distance, the more reliable the route
source is. If the same route is received from three different protocols, the route with the lowest administrative distance will be installed into the routing table. When using EIGRP, keep in mind that
EIGRP uses different administrative distance values for summary, internal, and external routes. If
you are running multiple routing protocols, it is important to ensure that you understand administrative distance values and how they impact routing table population. This is especially of concern
when you are redistributing routes between multiple routing protocols.
By default, EIGRP automatically summarizes at Classful boundaries and creates a summary route
pointing to the Null0 interface. Because the summary is installed with a default administrative distance value of 5, any other similar dynamically received routes will not be installed into the routing
table. Consider the topology illustrated in Figure 5-1 below, for example:
[Figure: R1 (Fa0/0: 10.1.1.0/24, Se0/0: 150.1.1.1/30) connected to R2 (Fa0/0: 10.2.2.0/24, Se0/0: 150.1.1.2/30) in EIGRP ASN 150]
Fig. 5-1. EIGRP Automatic Summarization
Referencing the diagram illustrated in Figure 5-1, the 150.1.1.0/30 subnet separates 10.1.1.0/24
and 10.2.2.0/24. When automatic summarization is enabled, both R1 and R2 will summarize the
10.1.1.0/24 and 10.2.2.0/24 subnets, respectively, to 10.0.0.0/8. This summary route will be installed
into the routing table with an administrative distance of 5 and a next-hop interface of Null0. This
lower administrative distance value will prevent either router from accepting or installing the
10.0.0.0/8 summary from the other router as illustrated in the following output:
R2#debug eigrp fsm
EIGRP FSM Events/Actions debugging is on
R2#
R2#
*Mar 13 03:24:31.983: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 150.1.1.1 (FastEthernet0/0) is up: new adjacency
*Mar 13 03:24:33.995: DUAL: dest(10.0.0.0/8) not active
*Mar 13 03:24:33.995: DUAL: rcvupdate: 10.0.0.0/8 via 150.1.1.1 metric 156160/128256
*Mar 13 03:24:33.995: DUAL: Find FS for dest 10.0.0.0/8. FD is 128256, RD is 128256
*Mar 13 03:24:33.995: DUAL:    0.0.0.0 metric 128256/0
*Mar 13 03:24:33.995: DUAL:    150.1.1.1 metric 156160/128256 found Dmin is 128256
*Mar 13 03:24:33.999: DUAL: RT installed 10.0.0.0/8 via 0.0.0.0
In the debug output above, the local router receives the 10.0.0.0/8 route from neighbor 150.1.1.1
with a route metric of 156160/128256. However, DUAL also has the same route locally, due to
summarization, and this route has a route metric of 128256/0. The local route is therefore installed
into the routing table instead because it has the better metric. The same would also be applicable
on R1, which would install its local 10.0.0.0/8 route into the RIB instead. The result is that neither
router would be able to ping the 10.x.x.x subnet of the other router. To resolve this issue, automatic
summarization should be disabled using the no auto-summary command on both of the routers,
allowing the specific route entries to be advertised instead.
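The reason both routers arrive at the same 10.0.0.0/8 summary can be shown by computing the Classful network for each subnet. The Python sketch below is illustrative only: the function name is hypothetical, and it applies the standard Class A, B, and C boundaries based on the first octet.

```python
import ipaddress


def classful_network(prefix: str) -> ipaddress.IPv4Network:
    """Return the Classful (Class A, B, or C) network that contains prefix."""
    network = ipaddress.ip_network(prefix)
    first_octet = int(network.network_address) >> 24
    if first_octet < 128:
        prefix_length = 8    # Class A
    elif first_octet < 192:
        prefix_length = 16   # Class B
    elif first_octet < 224:
        prefix_length = 24   # Class C
    else:
        raise ValueError("Class D and Class E addresses have no Classful network")
    return ipaddress.ip_network(f"{network.network_address}/{prefix_length}", strict=False)


if __name__ == "__main__":
    for subnet in ("10.1.1.0/24", "10.2.2.0/24"):
        print(subnet, "->", classful_network(subnet))
```

Because 10.1.1.0/24 and 10.2.2.0/24 both map to 10.0.0.0/8, each router creates an identical local summary and prefers it over the copy received from its neighbor.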
The primary use of the EIGRP Router ID (RID) is to prevent routing loops. The RID is used to identify the originating router for external routes. If an external route is received with the same RID
as the local router, the route will be discarded. However, duplicate RIDs do not affect any internal
EIGRP routes. This feature is designed to reduce the possibility of routing loops in networks where
route redistribution is being performed on more than one ASBR. The originating router ID can be
viewed in the output of the show ip eigrp topology command as illustrated below:
R1#show ip eigrp topology 2.2.2.2 255.255.255.255
IP-EIGRP (AS 1): Topology entry for 2.2.2.2/32
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 156160
Routing Descriptor Blocks:
150.1.1.2 (FastEthernet0/0), from 150.1.1.2, Send flag is 0x0
Composite metric is (156160/128256), Route is External
Vector metric:
Minimum bandwidth is 100000 Kbit
Total delay is 5100 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1500
Hop count is 1
External data:
Originating router is 2.2.2.2
AS number of route is 0
External protocol is Connected, external metric is 0
Administrator tag is 0 (0x00000000)
If you suspect a potential duplicate RID issue, you can check the events in the EIGRP event log to
see if any routes have been rejected because of a duplicate RID. The following illustrates a sample
output of the EIGRP event log, showing routes that have been rejected because they were received
from a router with the same RID as the local router:
R2#show ip eigrp events
Event information for AS 1:
[Truncated Output]
21   03:05:39.747 Ignored route, neighbor info: 10.0.0.1 Serial0/0
22   03:05:39.747 Ignored route, dup router: 150.1.1.254
23   03:05:06.659 Ignored route, metric: 192.168.2.0 284160
24   03:05:06.659 Ignored route, neighbor info: 10.0.0.1 Serial0/0
25   03:05:06.659 Ignored route, dup router: 150.1.1.254
26   03:04:33.311 Ignored route, metric: 192.168.1.0 284160
27   03:04:33.311 Ignored route, neighbor info: 10.0.0.1 Serial0/0
28   03:04:33.311 Ignored route, dup router: 150.1.1.254
...
[Truncated Output]
The resolution for the problem above would be to change the RID on neighbor router 10.0.0.1 or on
the local router, depending on which one of the two has been incorrectly configured.
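The duplicate RID check described above can be pictured with a short Python sketch. This is only an illustration of the rule stated in the text, not Cisco's implementation; the function name, the dictionary layout, and the internal prefix are hypothetical, while the RID and external prefix are taken from the sample event log.

```python
def accept_route(local_rid, route):
    """Return True if the route should be accepted by the local router."""
    # Only external routes carry an originating router ID that is compared.
    if route["external"] and route["originating_rid"] == local_rid:
        return False  # same RID as the local router: the route is discarded
    return True


local_rid = "150.1.1.254"
external = {"prefix": "192.168.1.0", "external": True,
            "originating_rid": "150.1.1.254"}
# Hypothetical internal route; duplicate RIDs do not affect internal routes.
internal = {"prefix": "10.0.0.0", "external": False, "originating_rid": None}

print(accept_route(local_rid, external))  # False
print(accept_route(local_rid, internal))  # True
```

An external route whose originating RID differs from the local RID is accepted as normal.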
Finally, it is important to remember that EIGRP will not install routes into the routing table if they
do not meet the Feasibility Condition. This is true even if the variance command has been configured on the local router. It is a common misconception that issuing the variance command
will allow EIGRP to load-share over any paths whose route metric is x times that of the successor
metric. Consider the topology illustrated in Figure 5-2 below, for example:
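[Figure: R1 reaches R5 and the 192.168.100.0/24 subnet through R2, R3, or R4. Link metrics: R1-R2 5, R2-R5 30, R1-R3 20, R3-R5 10, R1-R4 10, R4-R5 15.]
Fig. 5-2. Understanding the Feasibility Condition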
Figure 5-2 shows a basic network that includes metrics from R1 to the 192.168.100.0/24 subnet.
Referencing Figure 5-2, Table 5-1 below displays the Reported Distance and Feasible Distance values as seen on R1 for the 192.168.100.0/24 network.
Table 5-1. R1 Paths and Distances
Network Path     R1 Neighbor   Neighbor Metric (RD)   R1 Feasible Distance
R1 – R2 – R5     R2            30                     35
R1 – R3 – R5     R3            10                     30
R1 – R4 – R5     R4            15                     25
R1 has been configured to load-share across all paths and the variance 2 command is added to
the router configuration. This allows EIGRP to load-share across paths with up to twice the metric
of the Successor, which would include all three paths based on the default metric calculation. However, despite this configuration, only two paths will be installed and used.
First, R1 will select the path through R4 as the Successor route based on the Feasible Distance for
the route, which is 25. This route will be placed into the IP routing table as well as the EIGRP Topology Table. The metric for neighbor R3 to the 192.168.100.0/24 network, also referred to as the
Reported Distance or Advertised Distance, is 10. This is less than the Feasible Distance, and so this
route meets the Feasibility Condition and is placed into the EIGRP Topology Table.
The metric for neighbor R2 to the 192.168.100.0/24 network is 30. This value is higher than the
Feasible Distance of 25, so this route does not meet the Feasibility Condition and is not considered
a Feasible Successor. The route is still placed into the EIGRP Topology Table, but the path will
not be used for load sharing, even though its metric of 35 falls within the range specified
by the variance 2 router configuration command (2 x 25 = 50). In such situations,
consider using EIGRP offset lists to adjust the metrics so that all of the desired routes meet the Feasibility Condition.
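The reasoning above can be checked with a small Python sketch that applies the Feasibility Condition and the variance multiplier to the values in Table 5-1. It is a simplified model rather than the DUAL implementation: the function name is hypothetical, and treating the variance limit as a strict "less than" comparison is an assumption that does not affect this example.

```python
# neighbor -> (Reported Distance, R1's total metric through that neighbor),
# taken from Table 5-1.
paths = {"R2": (30, 35), "R3": (10, 30), "R4": (15, 25)}


def usable_paths(paths, variance):
    """Return (successor, sorted neighbors that R1 would use for load sharing)."""
    # The Successor is the neighbor offering the lowest total metric.
    successor = min(paths, key=lambda n: paths[n][1])
    fd = paths[successor][1]  # Feasible Distance
    used = [successor]
    for neighbor, (rd, metric) in paths.items():
        if neighbor == successor:
            continue
        feasible = rd < fd                        # Feasibility Condition
        within_variance = metric < variance * fd  # variance check (assumed strict)
        if feasible and within_variance:
            used.append(neighbor)
    return successor, sorted(used)


print(usable_paths(paths, variance=2))  # ('R4', ['R3', 'R4'])
```

The path through R2 is excluded because its Reported Distance of 30 is not lower than the Feasible Distance of 25, even though its metric of 35 is well inside the variance limit of 50.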
TROUBLESHOOTING ROUTE ADVERTISEMENT
There are times when it may seem that EIGRP is either not advertising the networks that it has
been configured to advertise or is advertising networks that it has not been configured to advertise.
For the most part, such issues are typically due to router and switch misconfigurations. There are
several reasons why EIGRP might not advertise a network that it has been configured to advertise.
Some of these reasons include the following:
• Distribute lists
• Split horizon
• Summarization
Incorrectly configured distribute lists are one reason why EIGRP might not advertise a network
that it has been configured to advertise. When configuring distribute lists, ensure that all networks
that should be advertised are permitted by the referenced IP ACL or IP Prefix List.
Another common issue pertaining to network advertisement when using EIGRP is the default behavior of split horizon. Split horizon is a Distance Vector protocol feature that mandates that routing information cannot be sent back out of the same interface through which it was received. This
prevents the re-advertising of information back to the source from which it was learned, effectively
preventing routing loops. This concept is illustrated in Figure 5-3 below:
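[Figure: hub router HQ connects through interface Se0/0 to a Frame Relay cloud (DLCIs 102, 103, 201, and 301) that serves spoke routers S1 (10.1.1.0/24) and S2 (10.2.2.0/24) in EIGRP AS 150. Other subnet labels: 192.168.1.0/26 and 172.16.1.0/29. S1 advertises 10.1.1.0/24. Annotation at HQ: "10.1.1.0/24 received via Se0/0. Do NOT advertise back out of the same interface."]
Fig. 5-3. EIGRP Split Horizon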
The topology in Figure 5-3 illustrates a classic hub and spoke network, with router HQ as the hub
router and routers S1 and S2 as the two spoke routers. On the Frame Relay WAN, each spoke router
has a single DLCI provisioned between itself and the HQ router in a partial-mesh topology. By default, EIGRP split horizon is enabled for WAN interfaces connected to packet-switched networks,
such as Frame Relay. This means that the HQ router will not advertise routing information learned
on Serial0/0 out of the same interface.
The effect of this default behavior is that the HQ router will not advertise the 10.1.1.0/24 prefix
received from S1 to S2 because the route is received via the Serial0/0 interface and the split horizon
feature prevents the router from advertising information learned on that interface back out of the
same interface. The same is also applicable for the 10.2.2.0/24 prefix the HQ router receives from
S2. The recommended solution for this problem would be to disable the split horizon feature on the
WAN interface using the no ip split-horizon eigrp [asn] interface configuration command
on the HQ router.
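The effect of split horizon on the HQ router can be modeled in a few lines of Python. This is a conceptual sketch only; the function name and data layout are hypothetical, and the prefixes and interface name come from Figure 5-3.

```python
def advertise(routes, out_interface, split_horizon=True):
    """Return the prefixes that would be advertised out of out_interface.

    routes maps each prefix to the interface on which it was learned.
    """
    advertised = []
    for prefix, learned_on in routes.items():
        if split_horizon and learned_on == out_interface:
            continue  # do not send a route back out of the interface it arrived on
        advertised.append(prefix)
    return sorted(advertised)


# Both spoke prefixes are learned on the same hub interface.
hq_routes = {"10.1.1.0/24": "Serial0/0", "10.2.2.0/24": "Serial0/0"}

print(advertise(hq_routes, "Serial0/0"))                       # []
print(advertise(hq_routes, "Serial0/0", split_horizon=False))  # ['10.1.1.0/24', '10.2.2.0/24']
```

With split horizon enabled, neither spoke learns the other spoke's prefix; disabling it on the hub interface allows both prefixes to be advertised back out of Serial0/0.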
By default, automatic summarization at the Classful boundary is enabled for EIGRP. This can be validated using the show ip protocols command. In addition to automatic summarization, EIGRP
also supports manual summarization at the interface level. Regardless of the method implemented,
summarization prevents the more specific route entries that are encompassed by the summary
from being advertised to neighbor routers. If route summarization is configured incorrectly, it may
appear that EIGRP is not advertising certain networks. For example, consider the basic network
topology that is illustrated in Figure 5-4 below:
[Figure: R1 (10.1.0.0/24) connects to R3 (10.3.0.0/24) over the 172.16.0.0/30 link and to R2 (10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24) over the 172.16.1.0/30 link. All routers are in EIGRP AS 150.]
Fig. 5-4. EIGRP Summarization
Referencing Figure 5-4, all routers reside in EIGRP autonomous system 150. R2 is advertising the
10.1.1.0/24, 10.1.2.0/24, and 10.1.3.0/24 subnets to R1 via EIGRP. R1, which also has an interface
assigned to the 10.1.0.0/24 subnet, should in turn advertise these subnets to R3. The EIGRP configuration on router R2 has been implemented as follows:
R2(config)#router eigrp 150
R2(config-router)#network 10.1.1.0 0.0.0.255
R2(config-router)#network 10.1.2.0 0.0.0.255
R2(config-router)#network 10.1.3.0 0.0.0.255
R2(config-router)#network 172.16.1.0 0.0.0.3
R2(config-router)#no auto-summary
R2(config-router)#exit
The EIGRP configuration on R1 has been implemented as follows:
R1(config)#router eigrp 150
R1(config-router)#network 10.1.0.0 0.0.0.255
R1(config-router)#network 172.16.0.0 0.0.0.3
R1(config-router)#network 172.16.1.0 0.0.0.3
R1(config-router)#exit
Finally, the EIGRP configuration on R3 has been implemented as follows:
R3(config)#router eigrp 150
R3(config-router)#network 172.16.0.0 0.0.0.3
R3(config-router)#no auto-summary
R3(config-router)#exit
After this configuration, the routing table on R2 displays the following entries:
R2#show ip route eigrp
     172.16.0.0/30 is subnetted, 2 subnets
D       172.16.0.0 [90/2172416] via 172.16.1.1, 00:02:38, FastEthernet0/0
     10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
D       10.0.0.0/8 [90/156160] via 172.16.1.1, 00:00:36, FastEthernet0/0
The routing table on R1 displays the following entries:
R1#show ip route eigrp
     172.16.0.0/16 is variably subnetted, 3 subnets, 2 masks
D       172.16.0.0/16 is a summary, 00:01:01, Null0
     10.0.0.0/8 is variably subnetted, 6 subnets, 2 masks
D       10.1.3.0/24 [90/156160] via 172.16.1.2, 00:21:01, FastEthernet0/0
D       10.3.0.0/24 [90/2297856] via 172.16.0.2, 00:00:39, Serial0/0
D       10.1.2.0/24 [90/156160] via 172.16.1.2, 00:21:01, FastEthernet0/0
D       10.1.1.0/24 [90/156160] via 172.16.1.2, 00:21:01, FastEthernet0/0
D       10.0.0.0/8 is a summary, 00:01:01, Null0
Finally, the routing table on R3 displays the following entries:
R3#show ip route eigrp
     172.16.0.0/30 is subnetted, 2 subnets
D       172.16.1.0 [90/2172416] via 172.16.0.1, 00:21:21, Serial0/0
     10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
D       10.0.0.0/8 [90/2297856] via 172.16.0.1, 00:01:15, Serial0/0
Because automatic summarization is still enabled on R1, it appears that EIGRP is not advertising the
specific subnets encompassed by the 10.0.0.0/8 summary. To allow the specific subnets to be advertised via EIGRP, automatic summarization should be disabled on R1 as illustrated below:
R1(config)#router eigrp 150
R1(config-router)#no auto-summary
R1(config-router)#exit
After this, the routing table on R3 would display the following route entries:
R3#show ip route eigrp
     172.16.0.0/30 is subnetted, 2 subnets
D       172.16.1.0 [90/2172416] via 172.16.0.1, 00:00:09, Serial0/0
     10.0.0.0/24 is subnetted, 5 subnets
D       10.1.3.0 [90/2300416] via 172.16.0.1, 00:00:09, Serial0/0
D       10.1.2.0 [90/2300416] via 172.16.0.1, 00:00:09, Serial0/0
D       10.1.1.0 [90/2300416] via 172.16.0.1, 00:00:09, Serial0/0
D       10.1.0.0 [90/2297856] via 172.16.0.1, 00:00:09, Serial0/0
The same would also be applicable to R2, which would now display the specific entries for the
10.1.0.0/24 and 10.3.0.0/24 subnets as follows:
R2#show ip route eigrp
     172.16.0.0/30 is subnetted, 2 subnets
D       172.16.0.0 [90/2172416] via 172.16.1.1, 00:00:10, FastEthernet0/0
     10.0.0.0/24 is subnetted, 5 subnets
D       10.3.0.0 [90/2300416] via 172.16.1.1, 00:00:10, FastEthernet0/0
D       10.1.0.0 [90/156160] via 172.16.1.1, 00:00:10, FastEthernet0/0
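To see why R1 replaced the 10.1.x.0/24 subnets with 10.0.0.0/8 before automatic summarization was disabled, the following Python sketch models summarization at the Classful boundary using the standard ipaddress module. It is a simplified illustration under the assumption that a subnet is summarized whenever it is advertised over a link belonging to a different major network; the function names are hypothetical.

```python
import ipaddress


def classful_network(prefix):
    """Return the Class A, B, or C major network that contains prefix."""
    net = ipaddress.ip_network(prefix)
    first_octet = int(net.network_address) >> 24
    if first_octet < 128:
        classful_len = 8    # Class A
    elif first_octet < 192:
        classful_len = 16   # Class B
    elif first_octet < 224:
        classful_len = 24   # Class C
    else:
        raise ValueError("not a Class A, B, or C address")
    return net.supernet(new_prefix=classful_len)


def advertised_prefix(subnet, link_subnet, auto_summary=True):
    """Return the prefix advertised for subnet over the link in link_subnet."""
    major = classful_network(subnet)
    # Summarize only when crossing into a different major network.
    if auto_summary and classful_network(link_subnet) != major:
        return str(major)
    return str(ipaddress.ip_network(subnet))


# R1 advertising a subnet learned from R2 to R3 over the 172.16.0.0/30 link.
print(advertised_prefix("10.1.1.0/24", "172.16.0.0/30"))                      # 10.0.0.0/8
print(advertised_prefix("10.1.1.0/24", "172.16.0.0/30", auto_summary=False))  # 10.1.1.0/24
```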
TROUBLESHOOTING STUB ROUTING ISSUES
Unless you have a software or hardware bug that actually prevents stub router functionality, the
most common problems with stub routing relate to prefix advertisement. As stated previously, a stub
router will advertise only connected and summary routes by default. This happens regardless of
whether any other routes (e.g., static routes) are included in the configured network statement on
the stub router. Consider the topology illustrated in Figure 5-5 below:
[Figure: R1 (10.1.0.0/24) connects to R3 (10.3.0.0/24) over the 172.16.0.0/30 link and to R2 over the 172.16.1.0/30 link. R2 has connected subnets 10.1.1.0/24, 10.1.2.0/24, and 10.1.3.0/24, and static routes to 192.168.1.0/24, 192.168.2.0/24, and 192.168.3.0/24 via 10.1.1.1.]
Fig. 5-5. EIGRP Stub Routing
Referencing Figure 5-5, R2 has some directly connected interfaces of 10.1.1.0/24, 10.1.2.0/24, and
10.1.3.0/24, as well as some static routes: 192.168.1.0/24, 192.168.2.0/24, and 192.168.3.0/24. The
router has been configured as an EIGRP stub router as follows:
R2(config)#router eigrp 150
R2(config-router)#network 10.1.0.0 0.0.255.255
R2(config-router)#network 192.168.0.0 0.0.255.255
R2(config-router)#eigrp stub
R2(config-router)#no auto-summary
R2(config-router)#exit
Following this configuration, R1 (the hub) displays the following entries in its routing table:
R1#show ip route eigrp
     10.0.0.0/24 is subnetted, 5 subnets
D       10.1.3.0 [90/156160] via 172.16.1.2, 00:09:51, FastEthernet0/0
D       10.3.0.0 [90/2297856] via 172.16.0.2, 00:43:49, Serial0/0
D       10.1.2.0 [90/156160] via 172.16.1.2, 00:09:51, FastEthernet0/0
D       10.1.1.0 [90/156160] via 172.16.1.2, 00:09:51, FastEthernet0/0
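The 192.168.1.0/24, 192.168.2.0/24, and 192.168.3.0/24 static routes are missing from the routing table on R1 because R2, as a stub router, advertises only connected and summary routes by default. To resolve this, the stub router must be reconfigured to advertise static routes as follows: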
R2(config)#router eigrp 150
R2(config-router)#eigrp stub connected static
R2(config-router)#exit
NOTE: If you omit the connected keyword, then only the static routes will be advertised.
After this configuration change, the routing table of R1 displays the following:
R1#show ip route eigrp
     10.0.0.0/24 is subnetted, 5 subnets
D       10.1.3.0 [90/156160] via 172.16.1.2, 00:00:07, FastEthernet0/0
D       10.3.0.0 [90/2297856] via 172.16.0.2, 00:47:04, Serial0/0
D       10.1.2.0 [90/156160] via 172.16.1.2, 00:00:07, FastEthernet0/0
D       10.1.1.0 [90/156160] via 172.16.1.2, 00:00:07, FastEthernet0/0
D    192.168.1.0/24 [90/28160] via 172.16.1.2, 00:00:07, FastEthernet0/0
D    192.168.2.0/24 [90/28160] via 172.16.1.2, 00:00:07, FastEthernet0/0
D    192.168.3.0/24 [90/28160] via 172.16.1.2, 00:00:07, FastEthernet0/0
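The stub behavior shown in this example can be summarized as a simple filter on route source. The Python sketch below is a conceptual model only; the function name and the way routes are labeled are hypothetical, and the prefixes are those of R2 in Figure 5-5.

```python
# Route types a stub router advertises when no keywords are given.
DEFAULT_STUB = {"connected", "summary"}


def stub_advertises(routes, stub_options=None):
    """Return the prefixes a stub router advertises for the given keywords.

    routes maps each prefix to its source ("connected", "static", ...).
    """
    allowed = DEFAULT_STUB if stub_options is None else set(stub_options)
    return sorted(p for p, source in routes.items() if source in allowed)


r2_routes = {
    "10.1.1.0/24": "connected",
    "10.1.2.0/24": "connected",
    "10.1.3.0/24": "connected",
    "192.168.1.0/24": "static",
    "192.168.2.0/24": "static",
    "192.168.3.0/24": "static",
}

print(stub_advertises(r2_routes))                           # connected subnets only
print(stub_advertises(r2_routes, {"connected", "static"}))  # all six prefixes
print(stub_advertises(r2_routes, {"static"}))               # static routes only
```

The last call mirrors the NOTE above: naming only the static keyword drops the connected subnets from the advertisement.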
TROUBLESHOOTING SIA ISSUES
Within the EIGRP Topology Table, entries may be marked either as Passive (P) or as Active (A).
When a route is in the Passive state, it means that EIGRP has completed actively computing the
metric for the route and traffic can be forwarded to the destination network using the Successor.
This is the preferred state for all routes in the Topology Table.
EIGRP routes go into the Active state when the Successor has been lost and the router must send out
Query packets to find a replacement path. Usually, a Feasible Successor is present and EIGRP
simply promotes it to the Successor. This way, the router converges without involving other routers in
the network, and the route remains Passive. This process is referred to as a local computation.
However, if the Successor has been lost or removed, and there is no Feasible Successor, the router
will begin diffused computation. In diffused computation, EIGRP will send a Query packet out to
all neighbors and out of all interfaces, except for the interface to the Successor. When an EIGRP
neighbor receives a Query packet for a route and if that neighbor’s EIGRP Topology Table does not
contain an entry for the route, then the neighbor immediately replies to the Query packet with an
unreachable message, stating that there is no path for this route through this neighbor.
If the EIGRP Topology Table on the neighbor lists the router sending the Query packet as the Successor for that route, and a Feasible Successor exists, then the Feasible Successor is installed and the
router replies to the neighbor Query packet advising that it has a route to the lost destination network.
However, if the EIGRP Topology Table lists the router sending the Query packet as the Successor
for this route and there is no Feasible Successor, then the router queries all of its EIGRP neighbors,
except those reached through the same interface as its former Successor. The router will not reply to the
Query packet until it has received a Reply to every Query packet that it originated for this route.
Finally, if the Query packet was received from a neighbor that is not the Successor for this destination, then the router replies with its own Successor information. If the neighboring routers do not
have the lost route information, Query packets are sent from those neighboring routers to their
neighboring routers until the Query boundary is reached. The Query boundary is either the end of
the network, the distribute list boundary, or the summarization boundary.
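The Query handling rules described in the preceding paragraphs can be condensed into one decision function. The Python sketch below is a simplification of DUAL for illustration; the function name, the dictionary keys, and the returned strings are hypothetical.

```python
def handle_query(entry, query_from):
    """Decide how a router reacts to a Query for one route.

    entry is None when the route is not in the Topology Table; otherwise it is
    a dict with "successor" and "feasible_successors" (best candidate first).
    """
    if entry is None:
        return "reply: unreachable"
    if query_from != entry["successor"]:
        return "reply: own successor information"
    if entry["feasible_successors"]:
        # Local computation: promote the Feasible Successor, then reply.
        entry["successor"] = entry["feasible_successors"].pop(0)
        return "reply: reachable via new successor"
    # Diffused computation: the Reply is withheld until all neighbors answer.
    return "query own neighbors, reply once all replies are received"


entry = {"successor": "R1", "feasible_successors": ["R3"]}
print(handle_query(None, "R1"))   # reply: unreachable
print(handle_query(entry, "R1"))  # reply: reachable via new successor
print(entry["successor"])         # R3
```

The last branch is the one that can lead to a Stuck-in-Active condition, because the router's own Reply depends on every downstream neighbor answering first.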
Once the Query packet has been sent, the EIGRP router must wait for all replies to be received
before it calculates the Successor. If any neighbor has not replied within three minutes, the route is
said to be Stuck-in-Active (SIA). When a route is SIA, the neighbor relationship of the router that
did not respond to the Query packet will be reset. In such cases, you will see a message similar to
the following logged by the router or switch:
%DUAL-5-NBRCHANGE: IP-EIGRP 150: Neighbor 150.1.1.1(Serial0/0) is down: stuck in active
%DUAL-3-SIA: Route 172.16.100.0/24 stuck-in-active state in IP-EIGRP 150. Cleaning up
There are several reasons why the EIGRP neighbor router(s) may not respond to the Query. Some
of these reasons include the following:
• The neighbor router’s CPU is overloaded and it cannot respond in time
• The neighbor router itself has no information about the lost route
• The neighbor cannot allocate memory to process the Query packet or build the Reply packet
• Low bandwidth links are congested and packets are being delayed
• Query packets are not received (e.g., because of a bad circuit or a unidirectional link)
Troubleshooting SIA issues is a difficult task, especially in larger networks. Whenever you troubleshoot SIA errors, you should answer the following two questions, listed in order of urgency, to
identify the possible causes of the SIA:
1. Why is the route active?
2. Why is the route stuck?
The first question is a difficult one to answer because you have a window of about three minutes
to determine why the route is active (i.e., why the router did not receive a response to the query
that it sent out and which router(s) did not respond to the query). Fortunately, Cisco IOS software
provides a powerful tool in the show ip eigrp topology active command. This command
shows routes that are currently active, how long those routes have been active, and which EIGRP
neighbors have and have not responded to queries.
For the second question, after you have identified the router or switch that did not respond to the
query, you can then access that device and determine why it did not respond. Keep in mind that it
may be that it was also waiting for another device. Repeat this process until you determine the root
cause (i.e., the device that is not responding to queries) and troubleshoot that device to determine
why, keeping in mind the common causes listed in the previous section.
To prevent SIA issues due to delayed responses from other EIGRP neighbors, the local router can
be configured to wait for longer than the default of three minutes to receive responses back to its
Query packets using the timers active-time command in router configuration mode. However,
keep in mind that this duration is sufficient for most networks; therefore, it is important to identify
and address what is causing SIA issues versus masking the problem by increasing the EIGRP routing wait time.
TROUBLESHOOTING ROUTE REDISTRIBUTION ISSUES
In this section of the chapter, we will look at how to troubleshoot basic route redistribution
issues when using EIGRP. In order to troubleshoot redistribution issues effectively, you must have
a solid understanding of how redistribution into EIGRP works. By default, when external routing
information is redistributed into EIGRP, the external routes are assigned a default metric of infinity.
The only three exceptions to this rule are as follows:
1. When redistributing between two EIGRP autonomous systems
2. When redistributing static routes into EIGRP
3. When redistributing connected interfaces (subnets) into EIGRP
EIGRP preserves all metrics when redistributing between the two EIGRP autonomous systems.
When connected subnets (interfaces) and static routes are redistributed into EIGRP, the redistributed routes are assigned a default external metric value of 0. When redistributing any other external route source, such as from OSPF, for example, a metric must be specified manually for those external routes. In Cisco IOS software, this can be performed using one of three methods as follows:
1. Specifying the seed (default) EIGRP metric for redistributed routes
2. Indirectly specifying the EIGRP metric during redistribution using a route map
3. Directly specifying the EIGRP metric during route redistribution
The seed metric is the metric value that will be assigned to all the redistributed routes. In other words,
this is the initial metric that will be assigned to the external routing information when redistribution
into EIGRP is configured. The seed metric is configured using the default-metric router configuration command. Within a route-map, the set metric command can be used to specify the redistribution metric that will be used for the matched subnet(s). Finally, you can also specify the route metric during redistribution using the redistribute <protocol> metric configuration command.
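The three ways of supplying a metric can be pictured as a lookup that falls back from the most specific setting to the least specific one. The Python sketch below is illustrative only: the function name is hypothetical, the precedence order (route map, then the redistribute command, then default-metric) is an assumption not stated in this chapter, and the metric tuple uses made-up values in the order bandwidth, delay, reliability, load, MTU.

```python
# Sources that do not need a manually specified metric, per the text above.
EXEMPT_SOURCES = {"eigrp", "static", "connected"}


def seed_metric(source, route_map=None, redistribute=None, default=None):
    """Return the metric for a redistributed route.

    Returns the first configured metric found, "automatic" for exempt sources,
    or None when nothing is configured (the route gets a metric of infinity).
    """
    # Assumed precedence: route map, redistribute keyword, default-metric.
    for configured in (route_map, redistribute, default):
        if configured is not None:
            return configured
    if source in EXEMPT_SOURCES:
        return "automatic"
    return None


hypothetical = (1544, 2000, 255, 1, 1500)  # illustrative values only

print(seed_metric("ospf"))                        # None
print(seed_metric("ospf", default=hypothetical))  # (1544, 2000, 255, 1, 1500)
print(seed_metric("static"))                      # automatic
```

A result of None corresponds to the troubleshooting case where redistribution is configured but no external routes appear because no metric was specified.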
Following redistribution, all EIGRP external routes are assigned an administrative distance of 170
and are printed as D EX routes in the routing table. If you have configured redistribution into
EIGRP and do not see any external routes, check the following while troubleshooting:
• Verify that the routes are not incorrectly filtered
• Verify that the metric has been specified
When redistributing, ensure that filters that are used during redistribution (e.g., route maps) are
configured in the correct manner and that all networks that should be redistributed are permitted in the configurations. Because redistribution configurations can quickly become very complex,
especially when redistributing at multiple points of the network, it is very important to check and
then double-check your configuration prior to and following implementation. Finally, do not forget
that EIGRP requires a metric to be specified when redistributing any routes other than EIGRP,
static and connected into EIGRP.
DEBUGGING EIGRP ROUTING ISSUES
While primary emphasis has been placed on the use of show commands in the previous sections,
this final section describes some of the debugging commands that can also be used to troubleshoot
EIGRP. Keep in mind, however, that debugging is very processor intensive and should be used only
as a last resort (i.e., after all show commands and other troubleshooting methods and tools have
been applied or attempted).
The debug ip routing [acl|static] command is a powerful troubleshooting tool. Although this command is not EIGRP-specific, it provides
useful and detailed information on routing table events. Following is a sample of the information
that is printed by this command:
R1#debug ip routing
IP routing debugging is on
R1#
*Mar 3 23:03:35.673: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet0/0, changed state to down
*Mar 3 23:03:35.673: RT: is_up: FastEthernet0/0 0 state: 4 sub state: 1 line: 0
has_route: True
*Mar 3 23:03:35.677: RT: interface FastEthernet0/0 removed from routing table
*Mar 3 23:03:35.677: RT: del 172.16.1.0/30 via 0.0.0.0, connected metric [0/0]
*Mar 3 23:03:35.677: RT: delete subnet route to 172.16.1.0/30
*Mar 3 23:03:35.677: RT: NET-RED 172.16.1.0/30
*Mar 3 23:03:35.677: RT: Pruning routes for FastEthernet0/0 (3)
*Mar 3 23:03:35.689: RT: delete route to 10.1.3.0 via 172.16.1.2,
FastEthernet0/0
*Mar 3 23:03:35.689: RT: no routes to 10.1.3.0, flushing
*Mar 3 23:03:35.689: RT: NET-RED 10.1.3.0/24
*Mar 3 23:03:35.689: RT: delete route to 10.1.2.0 via 172.16.1.2,
FastEthernet0/0
*Mar 3 23:03:35.689: RT: no routes to 10.1.2.0, flushing
*Mar 3 23:03:35.689: RT: NET-RED 10.1.2.0/24
*Mar 3 23:03:35.689: RT: delete route to 10.1.1.0 via 172.16.1.2,
FastEthernet0/0
*Mar 3 23:03:35.689: RT: no routes to 10.1.1.0, flushing
*Mar 3 23:03:35.693: RT: NET-RED 10.1.1.0/24
*Mar 3 23:03:35.693: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 150: Neighbor 172.16.1.2
(FastEthernet0/0) is down: interface down
*Mar 3 23:03:39.599: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 150: Neighbor 172.16.1.2
(FastEthernet0/0) is up: new adjacency
*Mar 3 23:03:40.601: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet0/0, changed state to up
*Mar 3 23:03:40.601: RT: is_up: FastEthernet0/0 1 state: 4 sub state: 1 line: 1
has_route: False
*Mar 3 23:03:40.605: RT: SET_LAST_RDB for 172.16.1.0/30
NEW rdb: is directly connected
*Mar 3 23:03:40.605: RT: add 172.16.1.0/30 via 0.0.0.0, connected metric [0/0]
*Mar 3 23:03:40.605: RT: NET-RED 172.16.1.0/30
*Mar 3 23:03:40.605: RT: interface FastEthernet0/0 added to routing table
*Mar 3 23:03:49.119: RT: SET_LAST_RDB for 10.1.1.0/24
NEW rdb: via 172.16.1.2
*Mar 3 23:03:49.119: RT: add 10.1.1.0/24 via 172.16.1.2, eigrp metric
[90/156160]
You can use this command in conjunction with an ACL to view information about only the route or
routes referenced in the ACL. The same command can also be used for troubleshooting static route events on the local device. As a side note, if you are
running EIGRP, consider using the show ip eigrp events command instead, as it provides a history of EIGRP internal events and can be used to troubleshoot SIA issues, as well as route flaps and
other events. Following is a sample of the information that is printed by this command:
R1#show ip eigrp events
Event information for AS 150:
1    23:03:49.135 Ignored route, metric: 192.168.3.0 28160
2    23:03:49.135 Ignored route, metric: 192.168.2.0 28160
3    23:03:49.135 Ignored route, metric: 192.168.1.0 28160
4    23:03:49.131 Rcv EOT update src/seq: 172.16.1.2 85
5    23:03:49.127 Change queue emptied, entries: 3
6    23:03:49.127 Ignored route, metric: 192.168.3.0 28160
7    23:03:49.127 Ignored route, metric: 192.168.2.0 28160
8    23:03:49.127 Ignored route, metric: 192.168.1.0 28160
9    23:03:49.127 Metric set: 10.1.3.0/24 156160
10   23:03:49.127 Update reason, delay: new if 4294967295
11   23:03:49.127 Update sent, RD: 10.1.3.0/24 4294967295
12   23:03:49.127 Update reason, delay: metric chg 4294967295
13   23:03:49.127 Update sent, RD: 10.1.3.0/24 4294967295
14   23:03:49.123 Route install: 10.1.3.0/24 172.16.1.2
15   23:03:49.123 Find FS: 10.1.3.0/24 4294967295
16   23:03:49.123 Rcv update met/succmet: 156160 128256
17   23:03:49.123 Rcv update dest/nh: 10.1.3.0/24 172.16.1.2
18   23:03:49.123 Metric set: 10.1.3.0/24 4294967295
19   23:03:49.123 Metric set: 10.1.2.0/24 156160
20   23:03:49.123 Update reason, delay: new if 4294967295
21   23:03:49.123 Update sent, RD: 10.1.2.0/24 4294967295
22   23:03:49.123 Update reason, delay: metric chg 4294967295
...
[Truncated Output]
In addition to the debug ip routing command, two additional EIGRP-specific debugging commands are also available in Cisco IOS software. The debug eigrp command can be used to provide
real-time information on the DUAL Finite State Machine, EIGRP neighbor relationships, Non-Stop
Forwarding events, packets, and transmission events. The options that are available with this command are illustrated below:
R1#debug eigrp ?
  fsm        EIGRP Dual Finite State Machine events/actions
  neighbors  EIGRP neighbors
  nsf        EIGRP Non-Stop Forwarding events/actions
  packets    EIGRP packets
  transmit   EIGRP transmission events
In addition to the debug eigrp command, the debug ip eigrp command prints detailed information on EIGRP route events, such as how EIGRP processes incoming updates. The additional
keywords that can be used in conjunction with this command are illustrated below:
R1#debug ip eigrp ?
  <1-65535>      Autonomous System
  neighbor       IP-EIGRP neighbor debugging
  notifications  IP-EIGRP event notifications
  summary        IP-EIGRP summary route processing
  vrf            Select a VPN Routing/Forwarding instance
  <cr>
In conclusion, following is a sample output of the debug ip eigrp command:
R1#debug ip eigrp
IP-EIGRP Route Events debugging is on
R1#
*Mar 3 23:49:47.028: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 150: Neighbor 172.16.1.2
(FastEthernet0/0) is up: new adjacency
*Mar 3 23:49:47.044: IP-EIGRP(Default-IP-Routing-Table:150): 10.1.0.0/24 - do
advertise out FastEthernet0/0
*Mar 3 23:49:47.044: IP-EIGRP(Default-IP-Routing-Table:150): Int 10.1.0.0/24
metric 128256 - 256 128000
*Mar 3 23:49:48.030: %LINEPROTO-5-UPDOWN: Line protocol on Interface
FastEthernet0/0, changed state to up
*Mar 3 23:49:56.179: IP-EIGRP(Default-IP-Routing-Table:150): Processing
incoming UPDATE packet
*Mar 3 23:49:56.544: IP-EIGRP(Default-IP-Routing-Table:150): Processing
incoming UPDATE packet
*Mar 3 23:49:56.544: IP-EIGRP(Default-IP-Routing-Table:150): Int 10.1.1.0/24 M
156160 - 25600 130560 SM 128256 - 256 128000
*Mar 3 23:49:56.544: IP-EIGRP(Default-IP-Routing-Table:150): route installed
for 10.1.1.0 ()
*Mar 3 23:49:56.544: IP-EIGRP(Default-IP-Routing-Table:150): Int 10.1.2.0/24 M
156160 - 25600 130560 SM 128256 - 256 128000
*Mar 3 23:49:56.548: IP-EIGRP(Default-IP-Routing-Table:150): route installed
for 10.1.2.0 ()
*Mar 3 23:49:56.548: IP-EIGRP(Default-IP-Routing-Table:150): Int 10.1.3.0/24 M
156160 - 25600 130560 SM 128256 - 256 128000
...
[Truncated Output]
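The three numbers after "M" in the debug output above appear to be the composite metric followed by its bandwidth and delay components. The Python sketch below reproduces them with the classic EIGRP formula for the default K values (K1 = K3 = 1). The function name is hypothetical, and the bandwidth and delay inputs are taken from the show ip eigrp topology output earlier in this chapter for a route with the same composite metric; that they also apply to 10.1.1.0/24 is an assumption.

```python
def eigrp_metric(min_bandwidth_kbps, total_delay_usec):
    """Return (bandwidth term, delay term, composite metric) for default K values."""
    # Bandwidth term: 10^7 divided by the slowest link in Kbit, scaled by 256.
    bandwidth_term = 256 * (10_000_000 // min_bandwidth_kbps)
    # Delay term: cumulative delay in tens of microseconds, scaled by 256.
    delay_term = 256 * (total_delay_usec // 10)
    return bandwidth_term, delay_term, bandwidth_term + delay_term


# Minimum bandwidth 100000 Kbit, total delay 5100 microseconds.
print(eigrp_metric(100_000, 5100))  # (25600, 130560, 156160)
```

Breaking a metric down this way can help when comparing the Feasible Distance and Reported Distance values seen in the topology table against the interface bandwidth and delay settings along the path.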
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter.
Enhanced Interior Gateway Routing Protocol Overview
• EIGRP is a Cisco-proprietary advanced Distance Vector routing protocol
• EIGRP runs directly over IP using protocol number 88
• EIGRP uses several different types of packets, which include the following:
1. Hello Packets
2. Acknowledgement Packets
3. Update Packets
4. Query Packets
5. Reply Packets
• EIGRP supports both dynamic and static (manually configured) neighbor discovery
• Dynamically discovered neighbors use Multicast for communication
• Statically defined neighbors use Unicast for communication
• EIGRP uses different Hello and Hold timers for different types of media
• Hello timers determine the interval at which EIGRP Hello packets are sent
• The Hold timer determines the time before a router considers a neighbor down
• By default, the Hold time is three (3) times the Hello interval
• The EIGRP Neighbor Table is used to maintain state information about EIGRP neighbors
• RTP is used to ensure that Update, Query and Reply packets are sent reliably
• EIGRP uses a composite metric, which includes different variables, called K values
• The default values for the K values are K1 = K3 = 1 and K2 = K4 = K5 = 0
• The Diffusing Update Algorithm is at the crux of the EIGRP routing protocol
• The best route, which is the route with the lowest Feasible Distance, is the Successor route
• Feasible successor routes must first meet the Feasibility Condition
• The Topology Table allows all EIGRP routers to have a consistent view of the entire network
• The stub routing feature is most commonly used in hub-and-spoke networks
• EIGRP supports the following two types of route summarization:
1. Automatic route summarization
2. Manual route summarization
Troubleshooting Neighbor Relationships
• The EIGRP neighbor relationship may not establish due to any of the following:
1. The neighbor routers are not on a common subnet
2. Mismatched primary and secondary subnets
3. Mismatched K Values
4. Mismatched AS Number
5. Access Control Lists are filtering EIGRP packets
6. Physical Layer Issues
7. Data Link Layer Issues
8. Mismatched Authentication Parameters
Troubleshooting Route Installation
• Some common reasons for route installation failure include the following:
1. The same route is received via another protocol with a lower AD
2. EIGRP summarization
3. Duplicate Router IDs are present within the EIGRP domain
4. The routes do not meet the Feasibility Condition
Troubleshooting Route Advertisement
• There are several reasons why EIGRP might not advertise a network, including the following:
1. Distribute Lists
2. Split Horizon
3. Summarization
Troubleshooting Stub Routing Issues
• Stub routing issues are typically due to basic router misconfigurations
• Ensure that you understand what needs to be advertised
• By default, stub routers advertise only connected and summary routes
Troubleshooting SIA Issues
• Within the EIGRP Topology Table, entries may be marked as either Passive or Active
• Routes go Active when the successor is lost and no feasible successor exists; queries are then sent to find an alternate path
• If a response to a query is not received within three minutes, the route becomes Stuck-in-Active (SIA)
• A neighbor may fail to reply to a query for several reasons, including the following:
1. The neighbor router's CPU is overloaded and it cannot respond in time
2. The neighbor router itself has no information about the lost route
3. The neighbor cannot allocate memory to process the query or build the reply packet
4. Low bandwidth links are congested and packets are being delayed
5. Queries are not received, which may be due to a bad circuit or a unidirectional link
• Whenever you troubleshoot SIA errors, answer the following two questions:
1. Why is the route active?
2. Why is the route stuck?
• Use the show ip eigrp topology active command to troubleshoot SIA issues
• Use the timers active-time command to increase route waiting time
Troubleshooting Route Redistribution Issues
• By default, external routes redistributed into EIGRP are assigned a default metric of infinity
• The only three exceptions to this rule are as follows:
1. When redistributing between two EIGRP autonomous systems
2. When redistributing static routes into EIGRP
3. When redistributing connected interfaces (subnets) into EIGRP
• If routes are not being redistributed into EIGRP, check the following while troubleshooting:
1. Verify that the routes are not incorrectly filtered
2. Verify that the metric has been specified
CHAPTER 6
Troubleshooting OSPF
Open Shortest Path First (OSPF) is an open-standard Link State routing protocol. Link State routing
protocols advertise the state of their links. When a Link State router begins operating on a network
link, information associated with that logical network is added to its local Link State Database
(LSDB). The local router then sends Hello messages on its operational links to determine whether
other Link State routers are operating on the interfaces as well. OSPF runs directly over Internet
Protocol using IP protocol number 89. The TSHOOT certification exam objective that is covered
in this chapter is as follows:
• Troubleshoot OSPF
While it is not possible to delve into all potential OSPF problem scenarios, this chapter discusses
some of the most common problem scenarios when OSPF is implemented as the IGP of choice.
Prior to delving into specific OSPF troubleshooting scenarios, this chapter begins with an overview
of the OSPF routing protocol, providing a fundamental understanding of the routing protocol that
will facilitate troubleshooting OSPF networks. This chapter is divided into the following sections:
• Open Shortest Path First Protocol Overview
• Troubleshooting Neighbor Relationships
• Troubleshooting Route Advertisement
• Troubleshooting Route Redistribution Issues
• Troubleshooting Route Summarization
• Debugging OSPF Routing Issues
OPEN SHORTEST PATH FIRST PROTOCOL OVERVIEW
The following sections provide a brief recap of the OSPF protocol specification and operation. It
should be noted that detailed information on OSPF will not be included in the following sections.
Instead, emphasis is placed only on the core concepts that you must understand in order to troubleshoot and support OSPF networks effectively. Additional details and information can be found in
the current ROUTE study guide, which is available online.
OSPF Fundamentals
When OSPF is enabled in Cisco IOS software, operational data, configured parameters, and statistics are stored in the four separate data structures illustrated in Figure 6-1 below:
[Figure: Interface Table, Neighbor Table, Routing Information Base, Link State Database]
Fig. 6-1. OSPF Data Structures
The Interface Table provides a list of all interfaces that have been enabled for OSPF. You can view
the state of the interfaces in the Interface Table by using the show ip ospf interface command.
This command also includes additional information, such as the network type, Hello and Dead
timers, adjacent neighbors, and any configured authentication parameters, assuming that OSPF
interface authentication has been configured.
The Neighbor Table tracks all active OSPF neighbors. The contents of this data structure can be
viewed using the show ip ospf neighbor command. This command allows you to view information about all neighbors for all processes or to view detailed information on individual neighbors,
which includes, but is not limited to, neighbor uptime and area designation.
The Link State Database (LSDB) contains information about the network topology. The LSDB is
a collection of Link State Advertisements (LSAs) for all routers and networks. Each router in an
OSPF area maintains an identical database for that area, which ensures that all routers in the area have
a consistent view of the topology. By default, each LSA is refreshed by its originating router
every 30 minutes; however, LSA flooding also occurs whenever there is a change in the OSPF topology,
ensuring that the databases remain synchronized.
The contents of the LSDB can be viewed using the show ip ospf database command. This command, when used without any additional keywords, prints information on all LSAs in the LSDB.
However, Cisco IOS software allows administrators to view detailed information on a per-LSA
basis using additional keywords, such as external, to view Type 5 LSAs, for example. Additionally,
you can also use the show ip ospf database database-summary command to view how many
of each type of LSA for each area there are in the database, and the total number of each. Following
is a sample output of this command:
R1#show ip ospf database database-summary
            OSPF Router with ID (10.1.0.1) (Process ID 1)

Area 0 database summary
  LSA Type      Count    Delete   Maxage
  Router        2        0        0
  Network       0        0        0
  Summary Net   2        0        0
  Summary ASBR  0        0        0
  Type-7 Ext    0        0        0
    Prefixes redistributed in Type-7  0
  Opaque Link   0        0        0
  Opaque Area   0        0        0
  Subtotal      4        0        0

Process 1 database summary
  LSA Type      Count    Delete   Maxage
  Router        2        0        0
  Network       0        0        0
  Summary Net   2        0        0
  Summary ASBR  0        0        0
  Type-7 Ext    0        0        0
  Opaque Link   0        0        0
  Opaque Area   0        0        0
  Type-5 Ext    1        0        0
      Prefixes redistributed in Type-5  1
  Opaque AS     0        0        0
  Total         5        0        0
Finally, the Routing Information Base (RIB) contains the results derived from the Shortest Path
First calculation, the contents of which can be verified or viewed using the show ip route [ospf]
[process ID] command. For example, to view all routes learned by OSPF process ID 1, you would
enter the following command on the router:
R1#show ip route ospf 1
     10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
O IA    10.3.0.3/32 [110/65] via 172.16.0.2, 00:51:12, Serial0/0
     150.1.0.0/24 is subnetted, 1 subnets
O IA    150.1.1.0 [110/74] via 172.16.0.2, 00:51:12, Serial0/0
Multi-Area OSPF Fundamentals
OSPF is a hierarchical routing protocol that logically divides the network into sub-domains that are
referred to as areas. This logical segmentation is used to limit the scope of Link State Advertisement
(LSA) flooding throughout the OSPF domain. LSAs are special types of packets sent by routers
running OSPF. Different types of LSAs are used within an area and between areas. By restricting
the propagation of certain types of LSAs between areas, the OSPF hierarchical implementation effectively reduces the amount of routing protocol traffic within the OSPF network.
In a multi-area OSPF network, one area must be designated as the backbone area. The backbone
is the logical center of the OSPF network. All other non-backbone areas must be physically connected
to the backbone. However, because it is not always possible or feasible to have a physical
connection between a non-backbone area and the backbone, the OSPF standard allows the use of
virtual connections to the backbone, called virtual links.
Routers within each area store detailed topology information for the area in which they reside.
Within each area, one or more routers, referred to as Area Border Routers (ABRs), facilitate inter-area routing by advertising summarized routing information between the different areas. This
functionality allows for the following within the OSPF network:
• Reduces the scope of LSA flooding throughout the OSPF domain
• Hides detailed topology information between areas
• Allows for end-to-end connectivity within the OSPF domain
• Creates logical boundaries within the OSPF domain
The backbone receives summarized routing information from ABRs. Routing information is then
disseminated to all other non-backbone areas within the OSPF network. When a change to the
topology occurs, this information is disseminated throughout the entire OSPF domain, allowing
all routers in all areas to have a consistent view of the network. ABRs maintain Link State Database
(LSDB) information for all the areas to which they are connected by running the SPF algorithm for
each area to which they belong and generating Type 3 LSAs for these areas.
All routers within each area have detailed topology information pertaining to that specific area.
The internal routers exchange intra-area routing information and use the Type 3 LSA information
advertised by the ABRs to build a view of the topology outside the local area.
OSPF Network Types
OSPF uses the following network types for different media:
• Non-Broadcast
• Point-to-point
• Broadcast
• Point-to-multipoint
Non-Broadcast networks are network types that do not natively support Broadcast or Multicast
traffic. The most common example of a non-Broadcast network type is Frame Relay. Non-Broadcast
network types require additional configuration to allow for both Broadcast and Multicast support.
On such networks, OSPF elects a Designated Router (DR) and/or a Backup Designated Router
(BDR). These two routers are described later in this chapter.
OSPF-enabled routers send Hello packets every 30 seconds on non-Broadcast network types. If a
Hello packet is not received in four times the Hello interval, or 120 seconds, the neighbor router is
considered ‘dead.’
A point-to-point connection is simply a connection between only two endpoints. Examples of
point-to-point connections include physical WAN interfaces using HDLC and PPP encapsulation,
and Frame Relay and ATM point-to-point subinterfaces. No DR or BDR is elected on OSPF point-to-point network types. By default, OSPF sends Hello packets every 10 seconds on point-to-point
network types.
Broadcast network types are those that natively support Broadcast and Multicast traffic, the most
common example being Ethernet. As is the case with non-Broadcast networks, OSPF also elects a
DR and/or a BDR on Broadcast networks. By default, OSPF sends Hello packets every 10 seconds
on these network types, and a neighbor is declared ‘dead’ if no Hello packets are received within
four times the Hello interval, which is 40 seconds.
The point-to-multipoint network type is a non-default OSPF network type. In other words, this
network type must be configured manually using the ip ospf network point-to-multipoint
[non-broadcast] interface configuration command. By default, the command configures a Broadcast
point-to-multipoint network type. This default network type allows OSPF to use Multicast
packets to discover neighbor routers dynamically. In addition, there is no DR/BDR election held on
Broadcast point-to-multipoint network types.
The [non-broadcast] keyword configures the point-to-multipoint network type as a non-Broadcast point-to-multipoint network. This requires static OSPF neighbor configuration, as OSPF will
not use Multicast to discover neighbor routers dynamically. This network type does not require the
election of a DR and/or a BDR router for the designated segment. The primary use of this network
type is to allow neighbor costs to be assigned to neighbors instead of using the interface-assigned
cost for routes received from all neighbors.
The point-to-multipoint network type is typically used in partial-mesh hub-and-spoke Non-Broadcast Multiple Access (NBMA) networks. However, it should also be noted that this network type
could also be specified for other network types, such as Broadcast Multi-Access networks (e.g., Ethernet). By default, OSPF sends Hello packets every 30 seconds on point-to-multipoint networks.
The default dead interval is four times the Hello interval, or 120 seconds.
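The default timers described in this section can be collected into a small lookup. The Python sketch below is only a study aid; the dictionary and function names are hypothetical, and the values assume that no timers have been changed with interface-level configuration.

```python
# Default Hello intervals (seconds) per OSPF network type, as described above
DEFAULT_HELLO = {
    "broadcast": 10,
    "point-to-point": 10,
    "non-broadcast": 30,
    "point-to-multipoint": 30,
}

def default_timers(network_type):
    """Return (hello, dead); the default Dead interval is 4 x Hello."""
    hello = DEFAULT_HELLO[network_type]
    return hello, 4 * hello

for net_type in DEFAULT_HELLO:
    print(net_type, default_timers(net_type))
```

Because the defaults differ between network types, changing the network type on only one side of a link is a common way to end up with mismatched Hello and Dead timers.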
OSPF Designated and Backup Designated Routers
OSPF elects a Designated Router (DR) and/or a Backup Designated Router (BDR) on Broadcast
and non-Broadcast network types. It is important to understand that the BDR is not a mandatory
component on these network types. In fact, OSPF will work just as well if only a DR is elected and
there is no BDR; however, there will be no redundancy if the DR fails, and the OSPF routers will
need to go through the election process again to elect a new DR.
On the segment, each individual non-DR/BDR router establishes an adjacency with the DR and, if
one has also been elected, the BDR, but not with any other non-DR/BDR routers on the segment.
The DR and BDR routers are fully adjacent with each other and all other routers on the segment.
Non-DR/BDR routers never complete the database exchange and never reach the Full adjacency
state with any other non-DR/BDR router on the segment.
The non-DR/BDR routers send messages and updates to the AllDRouters Multicast group address 224.0.0.6. Only the DR/BDR routers listen to Multicast messages sent to this group address.
The DR then advertises messages to the AllSPFRouters Multicast group address 224.0.0.5. This allows all other OSPF routers on the segment to receive the updates. In order for a router to be the
Designated Router, or the Backup Designated Router, for the segment, the router must be elected.
This election is based on the following:
• The highest router priority value
• The highest router ID (used as the tiebreaker when priority values are equal)
By default, all routers have a priority value of 1. This value can be adjusted using the ip ospf
priority <0-255> interface configuration command. The higher the priority, the greater the likelihood
the router will be elected DR for the segment. The router with the second highest priority
will then be elected BDR. If a priority value of 0 is configured, the router will not participate in the
DR/BDR election process.
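The election rules above can be sketched in Python. This is a deliberate simplification that assumes all routers on the segment come up at the same time and that no DR or BDR exists yet; the function name and the router values are hypothetical.

```python
import ipaddress

def elect_dr_bdr(routers):
    """routers: list of (router_id, priority) tuples for one segment.

    Simplified sketch: rank by highest priority, then highest router ID.
    Routers with priority 0 are excluded from the election.
    """
    eligible = [r for r in routers if r[1] > 0]
    ranked = sorted(
        eligible,
        key=lambda r: (r[1], ipaddress.IPv4Address(r[0])),
        reverse=True,
    )
    dr = ranked[0][0] if ranked else None
    bdr = ranked[1][0] if len(ranked) > 1 else None
    return dr, bdr

segment = [("10.1.0.1", 1), ("10.2.0.2", 1), ("10.3.0.3", 0), ("10.0.0.9", 5)]
print(elect_dr_bdr(segment))
```

The router with priority 5 wins despite having the lowest RID, and the router with priority 0 is never considered. The sketch does not model the fact that an existing DR or BDR keeps its role when a better candidate joins later.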
When determining the OSPF router ID (RID), Cisco IOS selects the highest IP address of configured Loopback interfaces. If no Loopback interfaces are configured, the software uses the highest
IP address of all configured physical interfaces as the OSPF RID. Cisco IOS software also allows
administrators to specify the RID manually using the router-id [address] router configuration
command.
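The RID selection order can also be expressed as a short Python sketch. It assumes that a manually configured router ID takes precedence over interface addresses, which the paragraph above implies but does not state outright; the function name and addresses are hypothetical.

```python
import ipaddress

def select_router_id(loopbacks, physical, configured=None):
    """Sketch of RID selection: manual RID, else highest Loopback address,
    else highest physical interface address.

    Raises ValueError if no addresses are supplied at all.
    """
    if configured:
        return configured
    pool = loopbacks or physical
    return str(max(ipaddress.IPv4Address(addr) for addr in pool))

print(select_router_id(["10.1.0.1", "10.0.0.200"], ["172.16.0.1"]))
print(select_router_id([], ["172.16.0.1", "192.168.1.1"]))
```

Note that a Loopback address wins even when a physical interface has a numerically higher address, which is why the first call does not return 172.16.0.1.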
It is important to remember that with OSPF, once the DR and the BDR have been elected, they will
remain as DR/BDR routers until a new election is held. This will never change, even if a router with
a higher RID or priority is introduced to the segment. In order for the DR/BDR to be changed, the
current DR/BDR routers must fail or must be removed, or the administrator can reset the routers’
OSPF processes manually using the clear ip ospf process command.
Establishing Adjacencies
Routers running OSPF transition through several states before establishing an adjacency. The routers exchange different types of packets during these states. This exchange of messages allows all
routers that establish an adjacency to have a consistent view of the network. Additional changes
to the current network are simply sent out as incremental updates. These different states are the
Down, Attempt, Init, 2-Way, Exstart, Exchange, Loading, and Full states.
• The Down state is the starting state for all OSPF routers. However, the local router may also
show a neighbor in this state when no Hello packets have been received within the specified
router dead interval for that interface.
• The Attempt state is valid only for OSPF neighbors on NBMA networks. This state means
that a Hello has been sent but no information has been received from the statically configured
neighbor; however, some effort is being made to establish an adjacency with this neighbor.
• The Init state is reached when an OSPF router receives a Hello packet from a neighbor but
the local RID is not listed in the received Neighbor field. If OSPF Hello parameters, such as
timer values, do not match, OSPF routers will never progress beyond this state.
• The 2-Way state indicates bidirectional communication with the OSPF neighbor(s). This
means that the local router has received a Hello packet with its own RID in the Neighbor
field and Hello packet parameters are identical on the two routers. On multi-access networks,
the DR and BDR routers are elected during this phase.
• The Exstart state is used for the initialization of the database synchronization process. It is at this
stage that the local router and its neighbor establish which router is in charge of the database
synchronization process. The Master and Slave are elected in this state, and the first sequence
number for Database Descriptor (DBD) packet exchange is decided by the Master in this stage.
• The Exchange state is where routers describe the contents of their databases using DBD packets. Each DBD packet sequence is explicitly acknowledged, and only one outstanding DBD
packet is allowed at a time. During this phase, Link State Request (LSR) packets are also sent
to request a new instance of the LSA. The M (More) bit indicates whether more DBD packets are still to follow. When both routers have exchanged their complete databases, they will
both set the M bit to 0.
• In the Loading state, OSPF routers build an LSR packet and a Link State Retransmission
list. LSR packets are sent to request the more recent instance of an LSA that has not been
received during the Exchange process. Updates that are sent during this phase are placed on
the Link State Retransmission list until the local router receives an acknowledgement. If the
local router also receives an LSR packet during this phase, it will respond with a Link State
Update packet that contains the requested information.
• The Full state indicates that the OSPF neighbors have exchanged their entire databases and
both agree (i.e., have the same view of the network). Both neighboring routers in this state
add the adjacency to their local database and advertise the relationship in a Link State Update
packet. At this point, the routing tables are calculated, or recalculated if the adjacency was
reset.
In order for an OSPF adjacency to be established successfully, certain parameters on both routers
must match. These parameters include the following:
• The interface MTU values
• The Hello and Dead timers
• The area ID
• The authentication type and password
• The stub area flag
• IP subnet and subnet mask
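The parameter list above lends itself to a simple side-by-side comparison. The Python sketch below is a study aid only; the field names and dictionary layout are hypothetical, and in practice you would gather these values from commands such as show ip ospf interface on both routers.

```python
import ipaddress

# Hypothetical field names for the parameters listed above
FIELDS = ("mtu", "hello", "dead", "area", "auth_type", "auth_key", "stub")

def adjacency_mismatches(a, b):
    """Return the names of parameters that differ between two interfaces."""
    problems = [field for field in FIELDS if a[field] != b[field]]
    net_a = ipaddress.ip_interface(a["address"]).network
    net_b = ipaddress.ip_interface(b["address"]).network
    if net_a != net_b:               # compares both the subnet and the mask
        problems.append("subnet/mask")
    return problems

r1 = dict(mtu=1500, hello=10, dead=40, area=0, auth_type=None,
          auth_key=None, stub=False, address="10.1.1.1/24")
r2 = dict(r1, hello=30, address="10.1.1.2/30")
print(adjacency_mismatches(r1, r2))
```

An empty list means that none of the listed parameters explains the failed adjacency, and the problem is more likely to lie at Layer 1 or Layer 2 or with packet filtering.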
OSPF LSAs and the Link State Database (LSDB)
As stated in the previous section, OSPF uses several types of Link State Advertisements. Each LSA
begins with a standard 20-byte LSA header, which contains the following:
• Link State Age
• Options
• Link State Type
• Link State ID
• Advertising Router
• Link State Sequence Number
• Link State Checksum
• Length
NOTE: These fields are described in detail in the ROUTE study guide that is available online.
Please refer to that guide for additional information on each of these fields.
While OSPF supports 11 different types of Link State Advertisements, only LSA Types 1, 2, and 3,
which are used to calculate internal routes, and LSA Types 4, 5, and 7, which are used to calculate
external routes, are within the scope of the TSHOOT certification exam. These LSAs are described
in the sections that follow:
• Type 1 (Router) LSAs are generated by each router for each area it belongs to. The Type 1
LSA lists the originating router’s RID. Each individual router will generate a Type 1 LSA
for the area in which it resides. For each Type 1 LSA, an ABR will both generate and advertise a Type 3 LSA. In other words, there is a one-to-one correlation between a Type 1
and Type 3 LSA. Type 1 LSAs are flooded within a single area and all routers within the area
receive these LSAs from all other routers in the same area.
• OSPF uses the Network Link State Advertisement (Type 2 LSA) to advertise the routers on
the multi-access segment. This LSA is generated by the DR and is flooded only within the
area. Because the other non-DR/BDR routers do not establish adjacencies with each other,
the Network LSA allows those routers to know about the other routers on the multi-access
segment. As is the case with a Type 1 LSA, the ABR will also generate and advertise a Type 3
LSA for each Type 2 LSA.
• The Network Summary (Type 3) LSA is a summary of destinations outside of the local area, but within
the OSPF domain. In other words, this LSA advertises inter-area routing information. The
Network Summary LSA does not carry any topological information. Instead, the only information contained in the LSA is an IP prefix. Type 3 LSAs are generated by ABRs and are flooded to all
adjacent areas. Each Type 3 LSA matches a single Router or Network LSA on a one-for-one
basis. In other words, a Type 3 LSA exists for each individual Type 1 and Type 2 LSA. Type
3 LSAs are advertised from a non-backbone area to the OSPF backbone for intra-area routes
(i.e., for Type 1 and Type 2 LSAs), and they are advertised from the OSPF backbone to other
non-backbone areas for both intra-area (i.e., Area 0) Type 1 and Type 2 LSAs and inter-area
routes (i.e., for Type 3 LSAs) flooded into the backbone by other ABRs.
• The Type 4 LSA describes information about an Autonomous System Boundary Router
(ASBR). This LSA contains the same packet format as the Type 3 LSA and performs the same
basic functionality, with some notable differences. Like the Type 3 LSA, the Type 4 LSA is
generated by the ABR. For both LSAs, the advertising router field contains the RID of the
ABR that generated the Summary LSA. However, the Type 4 LSA is created by the ABR for
each ASBR reachable by a Router LSA. The ABR then injects the Type 4 LSA into the appropriate area. This LSA provides reachability information on the ASBR itself.
• The External Link State Advertisement (Type 5 LSA) is used to describe destinations that are external to
the autonomous system. In other words, Type 5 LSAs provide the network information necessary to reach the external networks. In addition to external routes, the default route for an
OSPF routing domain can also be injected as a Type 5 Link State Advertisement. The External LSA has a domain-flooding scope. This means that the ABR no longer stops the flooding
process but instead continues it into its respective areas. The only areas to which External
LSAs are not flooded are any stub-type areas. Before Type 5 LSAs are installed into the routing table, the router calculating the Type 5 LSA must have a Type 4 LSA for the ASBR, and
the router must know about the forwarding address contained in the Type 5 LSA.
• The Type 7 LSA is used for external routing information from the ASBR in a not-so-stubby
area (NSSA). The external routing information within the LSA is converted by the ABR into
a Type 5 LSA at the area boundary. The ABR then floods the Type 5 LSA into the OSPF domain, and other routers in the network are aware of the external networks. Type 7 LSAs have
an area flooding scope, so only routers in the NSSA receive the Type 7 LSA.
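The six LSA types described above can be condensed into a quick-reference structure. The Python sketch below is a memory aid, not an implementation of OSPF; the names Lsa, LSA_TYPES, and types_used_for are hypothetical, and the originator column reflects standard OSPF behavior (for example, that Type 5 LSAs are originated by the ASBR).

```python
from collections import namedtuple

Lsa = namedtuple("Lsa", "name originator used_for")

LSA_TYPES = {
    1: Lsa("Router", "every router", "internal"),
    2: Lsa("Network", "DR", "internal"),
    3: Lsa("Network Summary", "ABR", "internal"),
    4: Lsa("ASBR Summary", "ABR", "external"),
    5: Lsa("External", "ASBR", "external"),
    7: Lsa("NSSA External", "ASBR in an NSSA", "external"),
}

def types_used_for(kind):
    """Return the LSA types used to calculate 'internal' or 'external' routes."""
    return sorted(t for t, lsa in LSA_TYPES.items() if lsa.used_for == kind)

print(types_used_for("internal"))
print(types_used_for("external"))
```

When a route is missing, knowing which router should have originated the corresponding LSA tells you where to begin looking in the output of show ip ospf database.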
OSPF Areas
In addition to the backbone (Area 0) and other non-backbone areas, the OSPF specification also
defines several ‘special’ types of areas. The configuration of these areas is used primarily to reduce
the size of the LSDB on routers residing within those areas by preventing the injection of different
types of LSAs (primarily Type 5 LSAs) into certain areas, which include the following:
• Not-so-stubby areas
• Totally not-so-stubby areas
• Stub areas
• Totally stubby areas
Not-so-stubby areas (NSSAs) are a type of OSPF stub area that allows the injection of external routing
information by an ASBR using an NSSA External LSA (Type 7). The Type 7 LSA is used for external
routing information from the ASBR within the NSSA. The external routing information within the
LSA is converted by the ABR into a Type 5 LSA at the area boundary. The ABR then floods the Type 5
LSA into the OSPF domain, and other routers in the network are aware of the external networks. Type
7 LSAs have an area flooding scope, so only routers in the NSSA receive the Type 7 LSA.
Totally not-so-stubby areas (TNSSAs) are an extension of NSSAs. Like NSSAs, Type 5 LSAs are not
allowed into a TNSSA. However, unlike NSSAs, Summary LSAs are not allowed into a TNSSA. In
addition, when a TNSSA is configured, the ABR injects a default route into the area as a Type 3 LSA.
TNSSAs have the following characteristics:
• Type 7 LSAs are converted into Type 5 LSAs at the NSSA ABR
• They do not allow Network Summary LSAs
• They do not allow External LSAs
Stub areas are somewhat similar to NSSAs, with the major exception being that external routes
(Type 5 or Type 7 LSAs) are not allowed into stub areas. It is important to understand that stub
functionality in OSPF and EIGRP is not at all similar. In OSPF, the configuration of an area as a stub
area reduces the size of the routing table and the OSPF database for the routers within the stub area
by preventing External LSAs from being advertised into such areas without any further configuration. Stub areas have the following characteristics:
• The default route is injected into the stub area by the ABR as a Type 3 (Summary) LSA
• Type 3 LSAs from other areas are permitted into these areas
• External route LSAs (i.e., Type 4 and Type 5 LSAs) are not allowed
Totally stubby areas (TSAs) are an extension of stub areas. However, unlike stub areas, TSAs further reduce the size of the LSDB on routers in the TSA by restricting Type 3 LSAs, in addition to
the External LSAs. TSAs are typically configured on routers that have a single ingress and egress
point in to and out of the network, for example, in a traditional hub-and-spoke network. The area
routers forward all external traffic to the ABR. The ABR is also the exit point for all backbone and
inter-area traffic to the TSA. TSAs have the following characteristics:
• The default route is injected into the TSA as a Type 3 Network Summary LSA
• Type 3, 4, and 5 LSAs from other areas are not permitted into these areas
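The filtering behavior of the special area types can be summarized as a lookup of blocked LSA types. The Python sketch below is a simplified study aid with hypothetical names; it treats Type 4 LSAs as blocked wherever Type 5 LSAs are blocked, and it ignores the ABR-originated default route, which is the one summary that still enters a totally stubby area.

```python
# LSA types that are NOT allowed into each area type (simplified summary)
BLOCKED = {
    "normal": {7},                     # Type 7 LSAs exist only inside NSSAs
    "stub": {4, 5, 7},
    "totally stubby": {3, 4, 5, 7},    # default route from the ABR excepted
    "nssa": {4, 5},
    "totally nssa": {3, 4, 5},
}

def allowed(area_type, lsa_type):
    """Return True if the LSA type may be flooded into the given area type."""
    return lsa_type not in BLOCKED[area_type]

for area in BLOCKED:
    print(area, [t for t in (1, 2, 3, 4, 5, 7) if allowed(area, t)])
```

If an expected inter-area or external route is absent from a router's database, checking the area type against a table such as this one quickly shows whether the absence is by design.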
OSPF Virtual Links
A virtual link is a logical extension of the OSPF backbone. As we learned earlier in this chapter,
when implementing a multi-area OSPF network, one area must be designated as the backbone area
and all non-backbone areas must be connected to the backbone area. In most cases, a physical link
is used to connect the non-backbone area to the backbone area; however, this is not always possible
or feasible. In addition to being used to connect areas that have no physical connection to the OSPF
backbone, virtual links can also be used for redundancy, as well as for connecting a discontinuous
or partitioned backbone.
TROUBLESHOOTING NEIGHBOR RELATIONSHIPS
Routers running OSPF transition through several states before establishing an adjacency. These different states are the Down, Attempt, Init, 2-Way, Exstart, Exchange, Loading, and Full states. The
preferred state for an OSPF adjacency is the Full state. This state indicates that the neighbors have
exchanged their entire databases and both have the same view of the network. While the Full state
is the preferred adjacency state, it is possible that during the adjacency establishment process, the
neighbors get ‘stuck’ in one of the other states. For this reason, it is important to understand what to
look for in order to troubleshoot the issue. The following sections describe the following problems
pertaining to OSPF neighbor relationships:
• The Neighbor Table is empty
• The Neighbor is stuck in the ATTEMPT state
• The Neighbor is stuck in the INIT state
• The Neighbor is stuck in the 2WAY state
• The Neighbor is stuck in the EXSTART/EXCHANGE state
• The Neighbor is stuck in the LOADING state
The Neighbor Table is Empty
There are several reasons why the OSPF Neighbor Table may be empty (i.e., why the output of the
show ip ospf neighbor command might not yield any results). Common reasons are as follows:
• Basic OSPF misconfigurations
• Layer 1 and Layer 2 issues
• ACL filtering
• Interface misconfigurations
Basic OSPF misconfigurations span a broad number of things. These could include mismatched
timers, area IDs, authentication parameters, and stub configuration, for example. A plethora of
tools is available in Cisco IOS software to troubleshoot basic OSPF misconfigurations. For example,
you could use the show ip protocols command to determine information (e.g., about OSPF-enabled networks); the show ip ospf command to determine area configuration and the interfaces per area; and the show ip ospf interface brief command to determine which interfaces
reside in which area, and for which OSPF process IDs those interfaces have been enabled, assuming
that OSPF has been enabled for the interface.
Another common misconfiguration is specifying the interface as passive. If this is so, then the interface will not send out Hello packets, and a neighbor relationship will not be established using
that interface. You can verify which interfaces have been configured or specified as passive using
either the show ip protocols or the show ip ospf interface commands. Following is a sample
output of the latter command on a passive interface:
R1#show ip ospf interface Serial0/0
Serial0/0 is up, line protocol is up
Internet Address 172.16.0.1/30, Area 0
Process ID 1, Router ID 10.1.0.1, Network Type POINT_TO_POINT, Cost: 64
Transmit Delay is 1 sec, State POINT_TO_POINT
Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
oob-resync timeout 40
No Hellos (Passive interface)
Supports Link-Local Signaling (LLS)
Index 1/1, flood queue length 0
Next 0x0(0)/0x0(0)
Last flood scan length is 0, maximum is 0
Last flood scan time is 0 msec, maximum is 0 msec
Neighbor Count is 0, Adjacent neighbor count is 0
Suppress hello for 0 neighbor(s)
Finally, when enabling OSPF over NBMA technologies such as Frame Relay, remember that the
neighbors must be defined statically, as OSPF does not use Multicast transmission for neighbor discovery for the default non-Broadcast network type. This is a common reason for empty Neighbor
Tables when implementing OSPF.
Layer 1 and Layer 2 issues can also prevent the formation of OSPF neighbor relationships. Layer 1
and Layer 2 troubleshooting was described in detail in the previous section. Use commands such
as the show interfaces command to check for interface status (i.e., line protocol), as well as any
received errors on the interface. If the OSPF-enabled routers reside in a VLAN that spans multiple
switches, verify that there is end-to-end connectivity within the VLAN and that all ports or interfaces are in the correct Spanning Tree states, for example.
ACL filtering is another common cause for adjacencies failing to establish. It is important to be
familiar with the topology in order to troubleshoot such issues. For example, if the routers failing
to establish an adjacency are connected via different physical switches, it may be that the ACL
filtering is being implemented in the form of a VACL that has been configured on the switches for
security purposes. A useful troubleshooting tool that may indicate that OSPF packets are being
either blocked or discarded is the show ip ospf traffic command, which prints information on
received and sent OSPF packets, as illustrated in the output below:
R1#show ip ospf traffic Serial0/0
Interface Serial0/0
    OSPF packets received/sent
    Invalid  Hellos   DB-des   LS-req   LS-upd   LS-ack   Total
Rx: 0        0        0        0        0        0        0
Tx: 0        6        0        0        0        0        6
OSPF header errors
Length 0, Auth Type 0, Checksum 0, Version 0,
Bad Source 0, No Virtual Link 0, Area Mismatch 0,
No Sham Link 0, Self Originated 0, Duplicate ID 0,
Hello 0, MTU Mismatch 0, Nbr Ignored 0,
LLS 0, Unknown Neighbor 0, Authentication 0,
TTL Check Fail 0,
OSPF LSA errors
Type 0, Length 0, Data 0, Checksum 0,
In the output above, notice that the local router is sending OSPF Hello packets but is not receiving any. If the configuration on the routers is correct, check ACLs on the routers or intermediate
devices to ensure that OSPF packets are not being filtered or discarded.
Another common reason for an empty Neighbor Table is interface misconfigurations. Similar to
EIGRP, OSPF will not establish a neighbor relationship using secondary interface addresses. However, unlike EIGRP, OSPF will also not establish a neighbor relationship if interface subnet masks
are not consistent.
EIGRP-enabled routers will establish neighbor relationships even if the interface subnet masks are
different. For example, if two routers, one with an interface using the address 10.1.1.1/24 and another with an interface using the address 10.1.1.2/30 are configured in back-to-back EIGRP implementation, they will successfully establish a neighbor relationship. However, it should be noted that
such implementations could cause routing loops between the routers. Because such implementations fall outside of the range of the TSHOOT exam requirements, they will not be described in any
further detail in this chapter. In addition to mismatched subnet masks, EIGRP-enabled routers also
ignore Maximum Transmission Unit (MTU) configurations and establish neighbor relationships
even if the interface MTU values are different. Use the show ip interface and show interfaces commands to verify the IP address and mask configuration.
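The subnet mask comparison described above can be sketched in a few lines of Python. This is only an illustration, not how Cisco IOS implements the check: the function name ospf_masks_agree is hypothetical, and the sketch assumes a broadcast-type segment where the Hello network mask is compared.

```python
import ipaddress

def ospf_masks_agree(local: str, remote: str) -> bool:
    """Return True when both interface addresses describe the same subnet and mask.

    Hypothetical helper: models only the address/mask comparison, nothing else
    that OSPF checks in a Hello packet.
    """
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(remote)
    # IPv4Network equality compares both the network address and the mask.
    return a.network == b.network

# Addresses from the EIGRP example above: EIGRP would peer, OSPF would not.
print(ospf_masks_agree("10.1.1.1/24", "10.1.1.2/30"))
# Same subnet and same mask on both sides.
print(ospf_masks_agree("10.1.1.1/30", "10.1.1.2/30"))
```

The first call returns False because the two interfaces disagree on the mask, which is the condition under which OSPF refuses to form the neighbor relationship.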
The Neighbor is Stuck in the ATTEMPT State
The ATTEMPT state is valid only for OSPF neighbors on NBMA networks. This state means that a
Hello has been sent but no information has been received from the statically configured neighbor;
however, some effort is being made to establish an adjacency with this neighbor. Several possible
reasons for the adjacency to remain in this state include the following:
• Incorrect NBMA configuration
• Incorrect OSPF configuration
• ACL filtering
• Unidirectional connectivity
Static Frame Relay mappings are susceptible to misconfigurations. When configuring OSPF over
NBMA technologies, make sure that the DLCI or PVC mappings are correct. Use the appropriate
Cisco IOS commands to verify bidirectional communication. For example, you could use the show
frame-relay pvc command to perform such validations for Frame Relay networks as follows:
R1#show frame-relay pvc 100
PVC Statistics for interface Serial0/0 (Frame Relay DCE)
DLCI = 100, DLCI USAGE = LOCAL, PVC STATUS = ACTIVE, INTERFACE = Serial0/0
  input pkts 22            output pkts 18           in bytes 2050
  out bytes 1722           dropped pkts 0           in pkts dropped 0
  out pkts dropped 0                out bytes dropped 0
  in FECN pkts 0           in BECN pkts 0           out FECN pkts 0
  out BECN pkts 0          in DE pkts 0             out DE pkts 0
  out bcast pkts 1         out bcast bytes 34
5 minute input rate 0 bits/sec, 1 packets/sec
5 minute output rate 0 bits/sec, 1 packets/sec
pvc create time 00:02:49, last time pvc status changed 00:01:59
NOTE: An alternative would be to use the debug serial interface command as follows:
R1#debug serial interface
Serial network interface debugging is on
R1#
*Mar 5 10:23:38.854: Serial0/0(in): StEnq, myseq 26
*Mar 5 10:23:38.854: Serial0/0(out): Status, myseq 27, yourseen 27, DCE up
*Mar 5 10:23:48.855: Serial0/0(in): StEnq, myseq 27
*Mar 5 10:23:48.855: Serial0/0(out): Status, myseq 28, yourseen 28, DCE up
*Mar 5 10:23:58.855: Serial0/0(in): StEnq, myseq 28
*Mar 5 10:23:58.855: Serial0/0(out): Status, myseq 29, yourseen 29, DCE up
*Mar 5 10:24:08.855: Serial0/0(in): StEnq, myseq 29
*Mar 5 10:24:08.855: Serial0/0(out): Status, myseq 30, yourseen 30, DCE up
*Mar 5 10:24:18.856: Serial0/0(in): StEnq, myseq 30
*Mar 5 10:24:18.856: Serial0/0(out): Status, myseq 31, yourseen 31, DCE up
Incorrect OSPF configuration can also result in neighbors being stuck in the ATTEMPT state. This
includes typographical errors, such as typing 172.16.1.1 instead of 172.16.11.1, for example. Check
the configuration to ensure that neighbor statements are configured correctly using the correct IP
addresses (remember to use the interface address, not the router ID).
As always, check for ACL filtering and ensure that OSPF packets are being allowed between the
endpoints. Keeping in mind that OSPF uses Unicast packets and not Multicast on NBMA networks, check for ACL or other filtering configurations that may prevent host-to-host connectivity
between the endpoints. Finally, if you suspect a unidirectional connectivity issue over the NBMA
network, use the appropriate technology show and debug commands to identify and resolve this
issue. This may include Layer 1 and Layer 2 troubleshooting.
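As a rough aid for spotting mistyped neighbor statements, the following Python sketch checks whether a configured neighbor address actually lies on the local interface subnet. The helper name neighbor_is_on_link is hypothetical, the interface address 172.16.0.1/30 and neighbor 172.16.0.2 are taken from the sample outputs in this chapter, and 2.2.2.2 stands in for a router ID entered by mistake.

```python
import ipaddress

def neighbor_is_on_link(interface: str, neighbor: str) -> bool:
    """Return True if the neighbor address is a different host on the interface subnet."""
    iface = ipaddress.ip_interface(interface)
    nbr = ipaddress.ip_address(neighbor)
    return nbr in iface.network and nbr != iface.ip

local = "172.16.0.1/30"
print(neighbor_is_on_link(local, "172.16.0.2"))  # correct interface address
print(neighbor_is_on_link(local, "172.16.1.1"))  # typographical error
print(neighbor_is_on_link(local, "2.2.2.2"))     # router ID used by mistake (assumed value)
```

Only the first address passes the check; the other two would leave the neighbor in the ATTEMPT state because the Unicast Hellos are sent to an address that is not the neighbor's interface.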
Following is the output of the show ip ospf neighbor command showing a neighbor stuck in the
ATTEMPT/DROTHER state:
R1#show ip ospf neighbor
Neighbor ID     Pri   State             Dead Time   Address         Interface
N/A               0   ATTEMPT/DROTHER   -           172.16.0.2      Serial0/0
The Neighbor is Stuck in the INIT State
The INIT state is reached when an OSPF router receives a Hello packet from a neighbor but the local RID is not listed in the received Neighbor field. When OSPF routers are stuck in the INIT state,
it typically indicates a unidirectional communication issue; that is, the neighbor is not receiving the
Hellos sent by the local router and is therefore not including the RID of that originating or sending
router in its Hello packet(s). Some common causes for this scenario are as follows:
• ACL filtering on one side
• Layer 1 and Layer 2 issues
• NBMA misconfigurations
It is possible that an ACL configured on either one of the devices could prevent the OSPF Hello
packets from being received by the local router. The result of this would be that one router would
reflect an empty Neighbor Table while the other router would be stuck in the INIT state. Consider
the topology illustrated in Figure 6-2 below:
[Figure 6-2: R1 (Serial0/0, 150.0.0.1/30) connected back-to-back to R3 (Serial0/0, 150.0.0.2/30)]
Fig. 6-2. OSPF Neighbors Stuck in the INIT State
Figure 6-2 illustrates a basic network comprised of two routers. Assuming that all configuration parameters are correct, the routers should be able to establish an adjacency. However, we will assume
that the following ACL has been configured on R1 and applied inbound on Serial0/0:
R1#show ip access-lists NETWORK-SECURITY
Extended IP access list NETWORK-SECURITY
10 deny ip host 0.0.0.0 any log
20 deny ip 127.0.0.0 0.255.255.255 any log
30 deny ip 10.0.0.0 0.255.255.255 any log
40 deny ip 172.16.0.0 0.15.255.255 any log
50 deny ip 192.168.0.0 0.0.255.255 any log
60 deny ip any 224.0.0.0 15.255.255.255 log (16 matches)
70 permit ip any any
Based on this configuration, R1 does not receive any Hello packets from R3, because entry 60 denies
packets destined to the Multicast range (which includes the OSPF address 224.0.0.5). R1 will therefore
show an empty Neighbor Table, as it is unaware of that router. However, because there is no ACL applied
inbound on the Serial0/0 interface of R3, and OSPF is enabled on the 150.0.0.0/30 subnet, R3 will
receive the Hello packets from R1. The problem arises because R1 never receives R3’s Hello packets
and therefore never includes that router’s RID in the Hello packets it sends out. Given
this, R3 shows R1 stuck in the INIT state. These two states can be validated using the show ip ospf
neighbor command on both routers as follows:
R1#show ip ospf neighbor
R1#
R1#
The same command, if issued on R3, would display the following instead:
R3#show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1           0   INIT/  -        00:00:36    150.0.0.1       Serial0/0
R3#
This problem can be resolved by reconfiguring the ACL applied to R1’s Serial0/0 interface as follows:
R1(config)#ip access-list extended NETWORK-SECURITY
R1(config-ext-nacl)#no 60
R1(config-ext-nacl)#60 deny ip 224.0.0.0 15.255.255.255 any log
R1(config-ext-nacl)#
*Mar 5 11:19:57.629: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on Serial0/0 from
LOADING to FULL, Loading Done
The same problem can also be caused by Layer 1 and Layer 2 issues, as well as basic NBMA misconfigurations, such as forgetting to append the broadcast keyword to the end of the frame-relay
map statements. Follow the same basic process to troubleshoot the issue.
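To see why entry 60 of the NETWORK-SECURITY ACL drops Hellos while the reconfigured entry does not, the first-match logic can be sketched in Python. This is a simplified model under stated assumptions: only IP source and destination addresses are evaluated, the helper names (to_int, matches, first_match) are hypothetical, and the Hello is assumed to be sent from R3 (150.0.0.2) to the OSPF Multicast address 224.0.0.5.

```python
import ipaddress

def to_int(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))

def matches(addr: str, base: str, wildcard: str) -> bool:
    """Cisco-style wildcard match: bits set in the wildcard are ignored."""
    care = ~to_int(wildcard) & 0xFFFFFFFF
    return (to_int(addr) & care) == (to_int(base) & care)

ANY = ("0.0.0.0", "255.255.255.255")
MCAST = ("224.0.0.0", "15.255.255.255")

# Entries 10-50 of NETWORK-SECURITY as (sequence, action, source, destination).
COMMON = [
    (10, "deny", ("0.0.0.0", "0.0.0.0"), ANY),
    (20, "deny", ("127.0.0.0", "0.255.255.255"), ANY),
    (30, "deny", ("10.0.0.0", "0.255.255.255"), ANY),
    (40, "deny", ("172.16.0.0", "0.15.255.255"), ANY),
    (50, "deny", ("192.168.0.0", "0.0.255.255"), ANY),
]
# Original entry 60 denies traffic destined TO the Multicast range.
ORIGINAL = COMMON + [(60, "deny", ANY, MCAST), (70, "permit", ANY, ANY)]
# Reconfigured entry 60 denies traffic sourced FROM the Multicast range.
CORRECTED = COMMON + [(60, "deny", MCAST, ANY), (70, "permit", ANY, ANY)]

def first_match(acl, src, dst):
    """Return (sequence, action) of the first matching entry; implicit deny otherwise."""
    for seq, action, (s_base, s_wc), (d_base, d_wc) in acl:
        if matches(src, s_base, s_wc) and matches(dst, d_base, d_wc):
            return seq, action
    return None, "deny"

hello = ("150.0.0.2", "224.0.0.5")  # R3's Hello arriving on R1 Serial0/0
print(first_match(ORIGINAL, *hello))
print(first_match(CORRECTED, *hello))
```

With the original list the Hello is caught by entry 60; with the reconfigured list it falls through to the permit in entry 70, which is consistent with the adjacency reaching the FULL state after the change.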
The Neighbor is Stuck in the 2WAY State
The 2WAY state indicates bidirectional communication with the OSPF neighbor(s). This means
that the local router has received a Hello packet with its own RID in the Neighbor field and Hello
packet parameters are identical on the two routers. On multi-access networks, the DR and BDR
routers are elected during this phase. It should be noted that this state is perfectly acceptable for
non-DR/BDR routers, as they will never complete database exchange between themselves and thus
will never reach the FULL state. Consider the network illustrated in Figure 6-3 below:
[Figure 6-3: R1 (Lo0: 1.1.1.1/32, 192.168.1.1/24), R2 (Lo0: 2.2.2.2/32, 192.168.1.2/24), R3 (Lo0: 3.3.3.3/32, 192.168.1.3/24), and R4 (Lo0: 4.4.4.4/32, 192.168.1.4/24) attached to a shared multi-access segment]
Fig. 6-3. OSPF DR and BDR Fundamentals
Referencing Figure 6-3, each router on the segment establishes an adjacency with the DR and the
BDR, but not with each other. In other words, non-DR/BDR routers do not establish an adjacency
with each other. This prevents the routers on the segment from forming a full mesh of N(N-1)/2
adjacencies, which reduces excessive OSPF packet flooding on the segment.
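The saving can be made concrete with a small calculation. The sketch below compares the number of adjacencies in a full mesh, N(N-1)/2, with the number formed when a DR and BDR are present: each DROther forms two adjacencies, plus one between the DR and BDR. The function names are hypothetical and the model assumes a single segment with both a DR and a BDR elected.

```python
def full_mesh_adjacencies(n: int) -> int:
    """Adjacencies if every router paired with every other router."""
    return n * (n - 1) // 2

def dr_bdr_adjacencies(n: int) -> int:
    """Adjacencies with a DR and BDR: 2 per DROther plus 1 between DR and BDR."""
    return 0 if n < 2 else 2 * (n - 2) + 1

for routers in (4, 10, 50):
    print(routers, full_mesh_adjacencies(routers), dr_bdr_adjacencies(routers))
```

For the four routers in Figure 6-3 the difference is small (6 versus 5), but it grows quickly as routers are added to the segment.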
Assuming that all routers on the segment have the default priority, R4 is elected Designated Router
for the segment because it has the highest router ID. R3 is elected Backup Designated Router for
the segment because it has the second highest router ID. Because R2 and R1 are neither the DR nor
the BDR, they are referred to as DROther routers. This can be validated using the show ip ospf
neighbor command on all routers as follows:
R1#show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2           1   2WAY/DROTHER    00:00:38    192.168.1.2     Ethernet0/0
3.3.3.3           1   FULL/BDR        00:00:39    192.168.1.3     Ethernet0/0
4.4.4.4           1   FULL/DR         00:00:38    192.168.1.4     Ethernet0/0
R2#show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1           1   2WAY/DROTHER    00:00:32    192.168.1.1     FastEthernet0/0
3.3.3.3           1   FULL/BDR        00:00:33    192.168.1.3     FastEthernet0/0
4.4.4.4           1   FULL/DR         00:00:32    192.168.1.4     FastEthernet0/0
R3#show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1           1   FULL/DROTHER    00:00:36    192.168.1.1     FastEthernet0/0
2.2.2.2           1   FULL/DROTHER    00:00:36    192.168.1.2     FastEthernet0/0
4.4.4.4           1   FULL/DR         00:00:35    192.168.1.4     FastEthernet0/0
R4#show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1           1   FULL/DROTHER    00:00:39    192.168.1.1     FastEthernet0/0
2.2.2.2           1   FULL/DROTHER    00:00:39    192.168.1.2     FastEthernet0/0
3.3.3.3           1   FULL/BDR        00:00:30    192.168.1.3     FastEthernet0/0
Notice that the DROther routers remain in the 2WAY/DROTHER state. This is normal because
they exchange their databases only with the DR and BDR routers. Therefore, because there is no full
database exchange between the DROther routers, they will never reach the FULL state.
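The election outcome described above can be approximated with a short Python sketch. It is deliberately simplified and is not the full election procedure: it ignores the Wait timer and the fact that an existing DR is not preempted, and simply ranks eligible routers by priority and then by router ID. The function name elect is hypothetical.

```python
import ipaddress

def elect(routers: dict) -> tuple:
    """Return (DR, BDR) router IDs from a mapping of router ID -> priority.

    Simplified model: highest priority wins, highest router ID breaks ties,
    and routers with priority 0 are never eligible.
    """
    eligible = [
        (priority, int(ipaddress.IPv4Address(rid)), rid)
        for rid, priority in routers.items()
        if priority > 0
    ]
    ranked = [rid for _, _, rid in sorted(eligible, reverse=True)]
    dr = ranked[0] if ranked else None
    bdr = ranked[1] if len(ranked) > 1 else None
    return dr, bdr

# Figure 6-3: all four routers use the default priority of 1.
print(elect({"1.1.1.1": 1, "2.2.2.2": 1, "3.3.3.3": 1, "4.4.4.4": 1}))
# Two routers that both have priority 0: nobody can become DR or BDR.
print(elect({"1.1.1.1": 0, "3.3.3.3": 0}))
```

The second call returns no DR and no BDR, which corresponds to a pair of routers that remain in the 2WAY state with each other indefinitely.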
While the 2WAY state is acceptable in scenarios such as the above, it is not acceptable when only two routers reside on a multi-access segment. In such cases, the usual cause is that both routers have
been configured with a priority of 0 via the ip ospf priority command, so neither is eligible to become the DR or BDR. You can troubleshoot
this issue using the show ip ospf neighbor and show ip ospf interface commands. Following is a sample output of the show ip ospf neighbor command when a priority of 0 has been
configured for the neighbor:
R1#show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
3.3.3.3           0   2WAY/DROTHER    00:00:35    150.0.0.2       FastEthernet0/0
Following is the output of the show ip ospf interface command on the same router:
R1#show ip ospf interface FastEthernet0/0
FastEthernet0/0 is up, line protocol is up
Internet Address 150.0.0.1/24, Area 0
Process ID 1, Router ID 1.1.1.1, Network Type BROADCAST, Cost: 1
Transmit Delay is 1 sec, State DROTHER, Priority 0
No designated router on this network
No backup designated router on this network
Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
oob-resync timeout 40
Hello due in 00:00:08
Supports Link-Local Signaling (LLS)
Index 1/1, flood queue length 0
Next 0x0(0)/0x0(0)
Last flood scan length is 1, maximum is 1
Last flood scan time is 0 msec, maximum is 0 msec
Neighbor Count is 1, Adjacent neighbor count is 0
Suppress hello for 0 neighbor(s)
The Neighbor is Stuck in the EXSTART/EXCHANGE State
The Exstart state is used for the initialization of the database synchronization process. It is at this
stage that the local router and its neighbor establish which router is in charge of the database synchronization process. The Master and Slave are elected in this state and the first sequence number
for DBD exchange is decided by the Master in this stage.
The Exchange state is where routers describe the contents of their databases using DBD packets.
Each DBD sequence is explicitly acknowledged, and only one outstanding DBD packet is allowed
at a time. During this phase, LSR packets are also sent to request a newer instance of an LSA. The
M (More) bit in each DBD packet indicates that more DBD packets follow. When both routers have
exchanged their complete databases, they will both set the M bit to 0.
Several possible reasons for the neighbor to be stuck in the EXSTART/EXCHANGE state include
the following:
• Mismatched MTU values
• Duplicate RIDs
• Broken Unicast connectivity
While EIGRP ignores interface MTU values and allows routers with different interface MTUs to
establish a neighbor relationship, the same is not applicable with OSPF. Referencing the topology
illustrated in Figure 6-2, which depicts routers R1 and R3 connected via a back-to-back Serial connection, the following shows the neighbor state on R1 assuming an MTU mismatch:
R1#show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
3.3.3.3           0   EXCHANGE/  -    00:00:38    150.0.0.2       Serial0/0
The same command would show the following output on R3:
R3#show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1           0   EXSTART/  -     00:00:39    150.0.0.1       Serial0/0
The simplest way to troubleshoot this issue is to validate interface MTU configurations by using the
show running-config interface [name] privileged EXEC command. Alternatively, interface
MTUs are also printed in the output of the show interfaces and show ip interface commands. Following is a sample output of the information printed by the latter:
R1#show ip interface Serial0/0
Serial0/0 is up, line protocol is up
Internet address is 150.0.0.1/30
Broadcast address is 255.255.255.255
Address determined by setup command
MTU is 1504 bytes
Helper address is not set
Directed broadcast forwarding is disabled
Multicast reserved groups joined: 224.0.0.5
Outgoing access list is not set
Inbound access list is NETWORK-SECURITY
Proxy ARP is enabled
The EXSTART/EXCHANGE state can also be caused by duplicate RIDs or broken Unicast connectivity between the routers. As is the case with mismatched MTU values, the troubleshooting
process for duplicate RIDs should begin by simply looking at device configurations to ensure that
the same RID was not entered on two different devices. Unicast connectivity issues may be caused by
Layer 1 and Layer 2 issues, as well as by Layer 3 functions such as Access Control Lists. Eliminate
each layer as you troubleshoot to isolate the cause.
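The MTU check that stalls the adjacency can be modelled very simply. In the OSPF specification, a router rejects a Database Description packet whose advertised interface MTU is larger than the MTU of the receiving interface. The sketch below assumes R1 uses the 1504-byte MTU shown in the output above and that R3 uses 1500 bytes; the R3 value and the function name dbd_accepted are assumptions made for illustration.

```python
def dbd_accepted(receiver_mtu: int, advertised_mtu: int) -> bool:
    """Return True if a DBD packet advertising advertised_mtu is accepted.

    The packet is rejected when the sender's MTU exceeds what the receiving
    interface can accept without fragmentation.
    """
    return advertised_mtu <= receiver_mtu

r1_mtu = 1504  # from the show ip interface Serial0/0 output on R1
r3_mtu = 1500  # assumed value for R3

print(dbd_accepted(receiver_mtu=r3_mtu, advertised_mtu=r1_mtu))  # R3 evaluating R1's DBD
print(dbd_accepted(receiver_mtu=r1_mtu, advertised_mtu=r3_mtu))  # R1 evaluating R3's DBD
```

Under these assumptions only the router with the smaller MTU rejects packets, which is why the two sides can display different states (EXSTART on one, EXCHANGE on the other) for the same broken adjacency.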
The Neighbor is Stuck in the LOADING State
In the Loading state OSPF routers build an LSR packet and Link State Retransmission list. LSR
packets are sent to request the more recent instance of an LSA that has not been received during
the Exchange process. Update packets that are sent during this phase are placed on the Link State
Retransmission list until the local router receives an acknowledgement. If the local router also receives an LSR packet during this phase, it will respond with a Link State Update packet that contains the requested information. It is rare to see a neighbor stuck in the LOADING state. However,
a corrupted Link State Request packet may cause this issue.
If such an event occurs, troubleshoot the issue by looking at error counters on the applicable interfaces and use a component-swapping approach to attempt to isolate the problem. Typically, such
issues are caused by faulty hardware, which may then need to be replaced.
TROUBLESHOOTING ROUTE ADVERTISEMENT
As is the case with EIGRP, there may be times when you notice that OSPF is not advertising certain
routes. This is typically due to a misconfiguration rather than a protocol failure.
Some common reasons for this include the following:
• OSPF is not enabled on the interface(s)
• The interface(s) is/are down
• Interface addresses in a different area
• OSPF misconfigurations
A common reason why OSPF does not advertise routes is that the network is not advertised via
OSPF. In current Cisco IOS versions, networks can be advertised using the network router configuration command or the ip ospf [process-id] area [area-id] interface configuration command. Regardless of the method
used, the show ip protocols command can be used to view which networks OSPF is configured
to advertise as can be seen in the following output:
R2#show ip protocols
Routing Protocol is "ospf 1"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
Router ID 2.2.2.2
Number of areas in this router is 1. 1 normal 0 stub 0 nssa
Maximum path: 4
Routing for Networks:
10.2.2.0 0.0.0.128 Area 1
20.2.2.0 0.0.0.255 Area 1
Routing on Interfaces Configured Explicitly (Area 1):
Loopback0
Reference bandwidth unit is 100 mbps
Routing Information Sources:
Gateway         Distance      Last Update
1.1.1.1              110      00:00:17
Distance: (default is 110)
Additionally, keep in mind that you can also use the show ip ospf interface command to
find out for which interfaces OSPF has been enabled, among other things. In addition to network
configuration, if the interface is down, OSPF will not advertise the route. You can use the show ip
ospf interface brief command to determine the interface state as follows:
R1#show ip ospf interface brief
Interface    PID   Area            IP Address/Mask    Cost  State Nbrs F/C
Lo100        1     0               100.1.1.1/24       1     DOWN  0/0
Fa0/0        1     0               10.0.0.1/24        1     BDR   1/1
Referencing the output above, we can determine that Loopback100 is in a DOWN state. Taking a
closer look, we can see that the issue is because the interface has been administratively shut as illustrated in the following output:
R1#show ip ospf interface Loopback100
Loopback100 is administratively down, line protocol is down
Internet Address 100.1.1.1/24, Area 0
Process ID 1, Router ID 1.1.1.1, Network Type LOOPBACK, Cost: 1
Enabled by interface config, including secondary ip addresses
Loopback interface is treated as a stub Host
If we debugged IP routing events using the debug ip routing command and then issued the no
shutdown command under the Loopback100 interface, then we would see the following:
R1#debug ip routing
IP routing debugging is on
R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#interface Loopback100
R1(config-if)#no shutdown
R1(config-if)#end
R1#
*Mar 18 20:03:34.687: RT: is_up: Loopback100 1 state: 4 sub state: 1 line: 0
has_route: False
*Mar 18 20:03:34.687: RT: SET_LAST_RDB for 100.1.1.0/24
NEW rdb: is directly connected
*Mar 18 20:03:34.687: RT: add 100.1.1.0/24 via 0.0.0.0, connected metric [0/0]
*Mar 18 20:03:34.687: RT: NET-RED 100.1.1.0/24
*Mar 18 20:03:34.687: RT: interface Loopback100 added to routing table
...
[Truncated Output]
When multiple addresses are configured under an interface, all secondary addresses must be in
the same area as the primary address; otherwise, OSPF will not advertise these networks. As an
example, consider the network topology illustrated in Figure 6-4 below:
[Figure 6-4: R1 (FastEthernet0/0: 10.0.0.1/24, with secondary addresses 10.0.1.1/24 and 10.0.2.1/24) connected back-to-back to R2 (FastEthernet0/0: 10.0.0.2/24)]
Fig. 6-4. OSPF Secondary Subnet Advertisement
Referencing Figure 6-4, routers R1 and R2 are connected via a back-to-back connection. These two
routers share the 10.0.0.0/24 subnet. However, in addition, R1 has been configured with some additional (secondary) subnets under its FastEthernet0/0 interface so that the interface configuration
on R1 is printed as follows:
R1#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 183 bytes
!
interface FastEthernet0/0
ip address 10.0.1.1 255.255.255.0 secondary
ip address 10.0.2.1 255.255.255.0 secondary
ip address 10.0.0.1 255.255.255.0
duplex auto
speed auto
end
OSPF is enabled on both R1 and R2. The configuration implemented on R1 is as follows:
R1#show running-config | section ospf
router ospf 1
router-id 1.1.1.1
log-adjacency-changes
network 10.0.0.1 0.0.0.0 Area 0
network 10.0.1.1 0.0.0.0 Area 1
network 10.0.2.1 0.0.0.0 Area 1
The configuration implemented on R2 is as follows:
R2#show running-config | section ospf
router ospf 2
router-id 2.2.2.2
log-adjacency-changes
network 10.0.0.2 0.0.0.0 Area 0
By default, because the secondary subnets have been placed into a different OSPF area on R1, they
will not be advertised by the router. This can be seen on R2, which displays the following when the
show ip route command is issued:
R2#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
     10.0.0.0/24 is subnetted, 1 subnets
C       10.0.0.0 is directly connected, FastEthernet0/0
To resolve this issue, the secondary subnets must also be assigned to Area 0 as follows:
R1(config)#router ospf 1
R1(config-router)#network 10.0.1.1 0.0.0.0 Area 0
*Mar 18 20:20:37.491: %OSPF-6-AREACHG: 10.0.1.1/32 changed from Area 1 to Area 0
R1(config-router)#network 10.0.2.1 0.0.0.0 Area 0
*Mar 18 20:20:42.211: %OSPF-6-AREACHG: 10.0.2.1/32 changed from Area 1 to Area 0
R1(config-router)#end
After this configuration change, the networks are now advertised to router R2 as follows:
R2#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
     10.0.0.0/24 is subnetted, 3 subnets
O       10.0.2.0 [110/2] via 10.0.0.1, 00:01:08, FastEthernet0/0
C       10.0.0.0 is directly connected, FastEthernet0/0
O       10.0.1.0 [110/2] via 10.0.0.1, 00:01:08, FastEthernet0/0
In addition to the three other common causes described above, poor design, poor implementation, and
misconfigurations are further reasons OSPF may not advertise networks as expected. Common
design problems include a discontiguous or partitioned backbone and area type
misconfigurations, such as configuring areas as totally stubby. For this reason, it is important to have a solid understanding of how the protocol works and how it has been implemented
in your environment. This understanding will greatly simplify the troubleshooting process, as half
the battle is already won before you even start troubleshooting the problem.
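The secondary-address rule from the Figure 6-4 example earlier in this section lends itself to a quick configuration audit. The Python sketch below is a hypothetical helper (the name unadvertised_secondaries and the dictionary layout are assumptions) that lists the secondary addresses whose area differs from that of the primary address, using the values from the R1 configuration.

```python
def unadvertised_secondaries(primary_area: int, secondary_areas: dict) -> list:
    """Return secondary addresses placed in a different area than the primary address."""
    return [addr for addr, area in secondary_areas.items() if area != primary_area]

# R1 FastEthernet0/0: primary 10.0.0.1 is in Area 0.
before = {"10.0.1.1": 1, "10.0.2.1": 1}  # original network statements
after = {"10.0.1.1": 0, "10.0.2.1": 0}   # after moving the secondaries to Area 0

print(unadvertised_secondaries(0, before))
print(unadvertised_secondaries(0, after))
```

An empty list indicates that every secondary subnet shares the primary address's area and is therefore a candidate for advertisement.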
TROUBLESHOOTING ROUTE REDISTRIBUTION ISSUES
Route redistribution configuration can often be very complex, especially when redistributing at
multiple points in the network. While there are a plethora of things that can go wrong if redistribution is not implemented correctly (e.g., routing loops), this section will only delve into reasons why
OSPF may not advertise external or redistributed routes. These reasons include the following:
• Omitting the subnets keyword during redistribution
• Incorrectly filtering outbound advertisements
When redistributing routes into OSPF, by default only classful networks are redistributed, unless the
subnets keyword is included in the configuration. When the subnets keyword is omitted, Cisco
IOS software prints the following configuration warning message on the console:
R1(config)#router ospf 1
R1(config-router)#redistribute eigrp 1
% Only classful networks will be redistributed
R1(config-router)#
In addition to viewing the router configuration, you can also use the show ip protocols command to determine whether subnets will be included in the redistribution. If the subnets keyword
has been omitted, then the show ip protocols command displays the following:
R1#show ip protocols
Routing Protocol is "ospf 1"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
Router ID 1.1.1.1
It is an autonomous system boundary router
Redistributing External Routes from,
eigrp 1
Number of areas in this router is 1. 1 normal 0 stub 0 nssa
Maximum path: 4
Routing for Networks:
10.0.0.1 0.0.0.0 Area 0
10.0.1.1 0.0.0.0 Area 0
10.0.2.1 0.0.0.0 Area 0
Routing on Interfaces Configured Explicitly (Area 0):
Loopback100
Reference bandwidth unit is 100 mbps
Routing Information Sources:
Gateway         Distance      Last Update
2.2.2.2              110      00:00:03
Distance: (default is 110)
However, if the subnets keyword has been included in the redistribution configuration (e.g., the
statement redistribute eigrp 1 subnets was added to R1), then the show ip protocols
command would instead display the following:
R1#show ip protocols
Routing Protocol is "ospf 1"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
Router ID 1.1.1.1
It is an autonomous system boundary router
Redistributing External Routes from,
eigrp 1, includes subnets in redistribution
Number of areas in this router is 1. 1 normal 0 stub 0 nssa
Maximum path: 4
Routing for Networks:
10.0.0.1 0.0.0.0 Area 0
10.0.1.1 0.0.0.0 Area 0
10.0.2.1 0.0.0.0 Area 0
Routing on Interfaces Configured Explicitly (Area 0):
Loopback100
Reference bandwidth unit is 100 mbps
Routing Information Sources:
Gateway         Distance      Last Update
2.2.2.2              110      00:00:26
Distance: (default is 110)
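The effect of the subnets keyword can be illustrated with a small Python model. It is a sketch only: it treats a route as classful when its prefix length equals the default mask for its address class, and the EIGRP prefixes, along with the function names classful_prefixlen and redistributed, are hypothetical.

```python
import ipaddress

def classful_prefixlen(network: ipaddress.IPv4Network) -> int:
    """Default prefix length for the class A, B, or C range of a network."""
    first_octet = int(network.network_address) >> 24
    if first_octet < 128:
        return 8
    if first_octet < 192:
        return 16
    if first_octet < 224:
        return 24
    raise ValueError("not a class A, B, or C network")

def redistributed(prefixes, subnets: bool = False) -> list:
    """Return the prefixes that would be redistributed into OSPF in this model."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    if subnets:
        return [str(n) for n in nets]
    # Without the subnets keyword, only routes at the classful boundary pass.
    return [str(n) for n in nets if n.prefixlen == classful_prefixlen(n)]

eigrp_routes = ["10.1.1.0/24", "172.16.0.0/16", "192.168.1.0/24"]  # hypothetical routes
print(redistributed(eigrp_routes))
print(redistributed(eigrp_routes, subnets=True))
```

In this model the subnetted route 10.1.1.0/24 is dropped unless the subnets keyword is present, which matches the console warning shown earlier.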
When using distribute lists to filter OSPF routes, if an outbound distribute list is applied to an
ASBR, the ASBR will generate Type 5 LSAs only for prefixes that are explicitly permitted by the
distribute list. Only these prefixes will be advertised. To clarify this point further, a router running
OSPF has been configured with the following static routes:
R1#show ip route static
S    192.168.4.0/24 is directly connected, Null0
S    192.168.5.0/24 is directly connected, Null0
S    192.168.1.0/24 is directly connected, Null0
S    192.168.2.0/24 is directly connected, Null0
S    192.168.3.0/24 is directly connected, Null0
The same router currently has the following distribute list configured:
R1#show running-config | section ospf|access-list
router ospf 1
router-id 1.1.1.1
log-adjacency-changes
network 10.0.0.1 0.0.0.0 Area 0
network 10.0.1.1 0.0.0.0 Area 0
network 10.0.2.1 0.0.0.0 Area 0
distribute-list 1 out
access-list 1 permit 10.0.0.0 0.255.255.255
Next, the static routes are redistributed into OSPF as follows:
R1(config)#router ospf 1
R1(config-router)#redistribute static subnets
Because the distribute list references ACL 1, which permits only routes in the 10.0.0.0/8
major network, the ASBR will not generate Type 5 LSAs for the 192.168.x.0/24 static routes, as
seen in the following output of the show ip ospf database command on the local
router:
R1#show ip ospf database
OSPF Router with ID (1.1.1.1) (Process ID 1)
Router Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum Link Count
1.1.1.1         1.1.1.1         1140        0x8000000E 0x00FAD0 3
2.2.2.2         2.2.2.2         11          0x80000006 0x002ED7 1
Net Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum
10.0.0.2        2.2.2.2         938         0x80000003 0x003FD8
In order to allow the ASBR to generate and advertise Type 5 LSAs for the static subnets, the ACL
configuration must be modified in a manner similar to the following:
R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#access-list 1 permit 192.168.0.0 0.0.255.255
R1(config)#exit
R1#clear ip ospf redistribution
Following this configuration change, the Link State Database on router R1 displays the following:
R1#show ip ospf database
OSPF Router with ID (1.1.1.1) (Process ID 1)
Router Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum Link Count
1.1.1.1         1.1.1.1         1589        0x8000000E 0x00FAD0 3
2.2.2.2         2.2.2.2         460         0x80000006 0x002ED7 1
Net Link States (Area 0)
Link ID         ADV Router      Age         Seq#       Checksum
10.0.0.2        2.2.2.2         1387        0x80000003 0x003FD8
Type-5 AS External Link States
Link ID         ADV Router      Age         Seq#       Checksum Tag
192.168.1.0     1.1.1.1         18          0x80000002 0x000B26 0
192.168.2.0     1.1.1.1         18          0x80000002 0x00FF30 0
192.168.3.0     1.1.1.1         18          0x80000002 0x00F43A 0
192.168.4.0     1.1.1.1         18          0x80000002 0x00E944 0
192.168.5.0     1.1.1.1         17          0x80000002 0x00DE4E 0
TROUBLESHOOTING ROUTE SUMMARIZATION
Unlike EIGRP, OSPF does not automatically summarize at Classful boundaries. In addition, OSPF
summarization is different for internal and external routes, which often results in confusion and
misconfigurations. In Cisco IOS software, internal route summarization is configured by using
the area [area ID] range [<address> <mask> [advertise | not-advertise]] [cost
<cost>] router configuration command on the Area Border Router (ABR). Following the configuration
of the summary on the ABR, the following occurs:
• An intra-area route for the summary pointing to Null0 is installed into the routing table
• The specific Type 3 entries in the LSDB are replaced by the single Type 3 LSA
A common source of confusion with OSPF summarization is that the area…range command is
used on a non-ABR router instead of the ABR. When this happens, the command will be accepted;
however, the configured range will be flagged as passive and will not be advertised by the router.
This can be validated using the show ip ospf command as illustrated below:
R2#show ip ospf | begin Area
Area BACKBONE(0)
Number of interfaces in this area is 1
Area has no authentication
SPF algorithm last executed 00:00:59.544 ago
SPF algorithm executed 17 times
Area ranges are
Number of LSA 3. Checksum Sum 0x016581
Number of opaque link LSA 0. Checksum Sum 0x000000
Number of DCbitless LSA 0
Number of indication LSA 0
Number of DoNotAge LSA 0
Flood list length 0
Area 2
Number of interfaces in this area is 0
Area has no authentication
SPF algorithm last executed 00:00:59.548 ago
SPF algorithm executed 5 times
Area ranges are
11.0.0.0/16 Passive Advertise
Number of LSA 1. Checksum Sum 0x00032F
Number of opaque link LSA 0. Checksum Sum 0x000000
Number of DCbitless LSA 0
Number of indication LSA 0
Number of DoNotAge LSA 0
Flood list length 0
External route summarization is configured on the OSPF ASBR using the summary-address
[<address> <mask> | prefix] [not-advertise] [tag <tag>] [nssa-only] router configuration command. The <address> and <mask> arguments are used to specify the summary network address and its subnet mask. After this configuration has been implemented, the following is
performed on the ASBR:
• An intra-area route for the summary pointing to Null0 is installed into the routing table
• The specific Type 5 entries in the LSDB are replaced by the single Type 5 LSA
A common misconfiguration is using the summary-address command on an ABR or other non-ASBR router. Another common mistake is attempting to use the area…range command when
summarizing external routes. Neither approach will work: external routes can be summarized only with the summary-address command, and only on an ASBR.
If you issue this command on an ASBR and no valid addresses belong to the range specified, the
summary will be assigned a metric of infinity (16777215) by the router. For example, assume the
following configuration was implemented on an ASBR:
R1(config)#router ospf 1
R1(config-router)#summary-address 150.0.0.0 255.255.0.0
R1(config-router)#end
Assuming that there are no known networks in the 150.0.0.0/16 range, then the summary route
is assigned the infinity metric, which can be viewed using the show ip ospf summary-address
command. Following is the output of this command on the same router:
R1#show ip ospf summary-address
OSPF Process 1, Summary-address
150.0.0.0/255.255.0.0 Metric 16777215, Type 0, Tag 0
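The infinity metric shown above is the largest value that fits into the 24-bit LSA metric field. The following minimal Python sketch illustrates the check described in the text; the function name summary_has_components is hypothetical, and the sketch only tests whether any known network falls inside the summary range (it does not model how IOS selects the metric when component routes do exist):

```python
import ipaddress

LS_INFINITY = 2**24 - 1  # 16777215, the metric shown in the output above

def summary_has_components(summary, known_networks):
    """Hypothetical helper: True if any known network lies inside the summary."""
    summary_net = ipaddress.ip_network(summary)
    return any(ipaddress.ip_network(n).subnet_of(summary_net)
               for n in known_networks)

# The redistributed static subnets used earlier in this chapter
known = ["192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24",
         "192.168.4.0/24", "192.168.5.0/24"]

# Address/mask notation, as printed by show ip ospf summary-address
empty_summary = summary_has_components("150.0.0.0/255.255.0.0", known)
useful_summary = summary_has_components("192.168.0.0/255.255.248.0", known)
```

In this sketch, empty_summary is False, which corresponds to the case in which the router assigns the infinity metric, while the hypothetical 192.168.0.0/21 summary would cover all five subnets.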
DEBUGGING OSPF ROUTING ISSUES
In the final section of this chapter, we will look at some of the more commonly used OSPF debugging commands. OSPF debugging is enabled using the debug ip ospf command. This command
can be used in conjunction with the following additional keywords:
R1#debug ip ospf ?
  adj             OSPF adjacency events
  database-timer  OSPF database timer
  events          OSPF events
  flood           OSPF flooding
  hello           OSPF hello events
  lsa-generation  OSPF lsa generation
  mpls            OSPF MPLS
  nsf             OSPF non-stop forwarding events
  packet          OSPF packets
  retransmission  OSPF retransmission events
  spf             OSPF spf
  tree            OSPF database tree
The debug ip ospf adj command prints real-time information on adjacency events. This is a useful tool when troubleshooting OSPF neighbor adjacency problems. Following is a
sample of the information that is printed by this command. The example below illustrates how this
command can be used to determine that an MTU mismatch is preventing the neighbor adjacency
from reaching the FULL state:
R1#debug ip ospf adj
OSPF adjacency events debugging is on
R1#
*Mar 18 23:13:21.279: OSPF: DR/BDR election on FastEthernet0/0
*Mar 18 23:13:21.279: OSPF: Elect BDR 2.2.2.2
*Mar 18 23:13:21.279: OSPF: Elect DR 1.1.1.1
*Mar 18 23:13:21.279:        DR: 1.1.1.1 (Id)   BDR: 2.2.2.2 (Id)
*Mar 18 23:13:21.283: OSPF: Neighbor change Event on interface FastEthernet0/0
*Mar 18 23:13:21.283: OSPF: DR/BDR election on FastEthernet0/0
*Mar 18 23:13:21.283: OSPF: Elect BDR 2.2.2.2
*Mar 18 23:13:21.283: OSPF: Elect DR 1.1.1.1
*Mar 18 23:13:21.283:        DR: 1.1.1.1 (Id)   BDR: 2.2.2.2 (Id)
*Mar 18 23:13:21.283: OSPF: Rcv DBD from 2.2.2.2 on FastEthernet0/0 seq 0xA65
opt 0x52 flag 0x7 len 32 mtu 1480 state EXSTART
*Mar 18 23:13:21.283: OSPF: Nbr 2.2.2.2 has smaller interface MTU
*Mar 18 23:13:21.283: OSPF: NBR Negotiation Done. We are the SLAVE
*Mar 18 23:13:21.287: OSPF: Send DBD to 2.2.2.2 on FastEthernet0/0 seq 0xA65 opt
0x52 flag 0x2 len 192
*Mar 18 23:13:26.275: OSPF: Rcv DBD from 2.2.2.2 on FastEthernet0/0 seq 0xA65
opt 0x52 flag 0x7 len 32 mtu 1480 state EXCHANGE
*Mar 18 23:13:26.279: OSPF: Nbr 2.2.2.2 has smaller interface MTU
*Mar 18 23:13:26.279: OSPF: Send DBD to 2.2.2.2 on FastEthernet0/0 seq 0xA65 opt
0x52 flag 0x2 len 192
...
[Truncated Output]
From the output above, we can conclude that the MTU on the local router is larger than 1480 bytes
because the debug output shows that the neighbor has the smaller MTU value. The recommended
solution would be to adjust the smaller MTU value so that both neighbors have the same interface
MTU values. This will allow the adjacency to reach the FULL state.
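When a large debug capture has been saved to a file, the MTU advertised by the neighbor can be extracted programmatically. The following is a minimal Python sketch that assumes the wrapped debug line has been joined into a single string; the function names are hypothetical, and the local MTU of 1500 bytes used in the comparison is an assumption (the output above only tells us that it is larger than 1480):

```python
import re

# The received DBD message from the debug output, joined into one line
DBD_LINE = ("OSPF: Rcv DBD from 2.2.2.2 on FastEthernet0/0 seq 0xA65 "
            "opt 0x52 flag 0x7 len 32 mtu 1480 state EXSTART")

def neighbor_mtu(line):
    """Hypothetical helper: pull neighbor, interface, MTU and state from a DBD line."""
    m = re.search(r"Rcv DBD from (\S+) on (\S+) .* mtu (\d+) state (\w+)", line)
    if m is None:
        return None
    nbr, intf, mtu, state = m.groups()
    return {"neighbor": nbr, "interface": intf, "mtu": int(mtu), "state": state}

def mtu_mismatch(line, local_mtu):
    """True if the line is a received DBD whose MTU differs from local_mtu."""
    info = neighbor_mtu(line)
    return info is not None and info["mtu"] != local_mtu

info = neighbor_mtu(DBD_LINE)
mismatch = mtu_mismatch(DBD_LINE, 1500)  # 1500 is an assumed local MTU
```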
The debug ip ospf lsa-generation command prints information on OSPF LSAs. This command can be used to troubleshoot route advertisement when using OSPF. Following is a sample
output of the information that is printed by this command:
R1#debug ip ospf lsa-generation
OSPF summary lsa generation debugging is on
R1#
R1#
*Mar 18 23:25:59.447: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0
from FULL to DOWN, Neighbor Down: Interface down or detached
*Mar 18 23:25:59.511: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0
from LOADING to FULL, Loading Done
*Mar 18 23:26:00.491: OSPF: Start redist-scanning
*Mar 18 23:26:00.491: OSPF: Scan the RIB for both redistribution and translation
*Mar 18 23:26:00.499: OSPF: max-aged external LSA for summary 150.0.0.0
255.255.0.0, scope: Translation
*Mar 18 23:26:00.499: OSPF: End scanning, Elapsed time 8ms
*Mar 18 23:26:00.499: OSPF: Generate external LSA 192.168.4.0, mask
255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000001
*Mar 18 23:26:00.503: OSPF: Generate external LSA 192.168.5.0, mask
255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000001
*Mar 18 23:26:00.503: OSPF: Generate external LSA 192.168.1.0, mask
255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000001
*Mar 18 23:26:00.503: OSPF: Generate external LSA 192.168.2.0, mask
255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000001
*Mar 18 23:26:00.507: OSPF: Generate external LSA 192.168.3.0, mask
255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000001
*Mar 18 23:26:05.507: OSPF: Generate external LSA 192.168.4.0, mask
255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000006
*Mar 18 23:26:05.535: OSPF: Generate external LSA 192.168.5.0, mask
255.255.255.0, type 5, age 0, metric 20, tag 0, metric-type 2, seq 0x80000006
The debug ip ospf spf command provides real-time information about Shortest Path First algorithm events. This command can be used in conjunction with the following keywords:
R1#debug ip ospf spf ?
  external   OSPF spf external-route
  inter      OSPF spf inter-route
  intra      OSPF spf intra-route
  statistic  OSPF spf statistics
  <cr>
As is the case with all debug commands, consideration should be given to factors such as the size of
the network and the resource utilization on the router before debugging SPF events. Following is a
sample of the output from the debug ip ospf spf statistic command:
R1#debug ip ospf spf statistic
OSPF spf statistic debugging is on
R1#
R1#clear ip ospf process
Reset ALL OSPF processes? [no]: y
R1#
R1#
*Mar 18 23:37:27.795: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0
from FULL to DOWN, Neighbor Down: Interface down or detached
*Mar 18 23:37:27.859: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on FastEthernet0/0
from LOADING to FULL, Loading Done
*Mar 18 23:37:32.859: OSPF: Begin SPF at 28081.328ms, process time 608ms
*Mar 18 23:37:32.859:       spf_time 07:47:56.328, wait_interval 5000ms
*Mar 18 23:37:32.859: OSPF: End SPF at 28081.328ms, Total elapsed time 0ms
*Mar 18 23:37:32.859:       Schedule time 07:48:01.328, Next wait_interval
10000ms
*Mar 18 23:37:32.859:       Intra: 0ms, Inter: 0ms, External: 0ms
*Mar 18 23:37:32.859:       R: 2, N: 1, Stubs: 2
*Mar 18 23:37:32.859:       SN: 0, SA: 0, X5: 0, X7: 0
*Mar 18 23:37:32.863:       SPF suspends: 0 intra, 0 total
NOTE: Prior to enabling SPF debugs, consider using show commands, such as the show ip ospf statistics and show ip ospf commands, to troubleshoot first.
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter.
Open Shortest Path First Protocol Overview
• OSPF data structures store operational data, configured parameters and statistics
• There are four OSPF data structures as follows:
1. The Interface Table
2. The Neighbor Table
3. The Link State Database
4. The Routing Information Base
• The Interface Table provides a list of all interfaces that have been enabled for OSPF
• The Neighbor Table tracks all active OSPF neighbors
• The Link State Database (LSDB) contains information about the network topology
• The Routing Information Base (RIB) contains the results derived from the SPF calculation
• OSPF is a hierarchical routing protocol that logically divides the network into sub-domains
• This logical segmentation is used to limit the scope of LSA flooding
• In a multi-area OSPF network, one area must be designated as the backbone area
• The backbone is the logical center of the OSPF network
• All other non-backbone areas must be physically or logically connected to the backbone
• OSPF uses the following default network types for different media:
1. Non-Broadcast
2. Point-to-Point
3. Broadcast
4. Point-to-Multipoint
• OSPF neighbors go through several states before the adjacency becomes FULL
• Valid OSPF states are Down, Attempt, Init, 2-Way, Exstart, Exchange, Loading and Full
• The following parameters must match before an adjacency will go into the Full state:
1. The interface MTU values
2. The Hello and Dead Timers
3. The Area ID
4. The Authentication Type and Password
5. The Stub Area flag
6. IP Subnet and Subnet Mask
• Each LSA begins with a standard 20-byte LSA header that contains the following:
1. Link State Age
2. Options
3. Link State Type
4. Link State ID
5. Advertising Router
6. Link State Sequence Number
7. Link State Checksum
8. Length
• Type 1 LSAs are generated by each router for each area it belongs to
• Type 2 LSAs advertise the routers on the multi-access segment
• Type 3 LSAs are a summary of destinations outside of the local area
• Type 4 LSAs describe information about an Autonomous System Boundary Router
• Type 5 LSAs provide the network information necessary to reach the external networks
• Type 7 LSAs are used for external routing information from the ASBR in an NSSA
• OSPF supports special types of areas in addition to area 0 and normal areas
• These special types of areas are as follows:
1. Not-so-stubby Areas
2. Totally Not-so-stubby Areas
3. Stub Areas
4. Totally Stubby Areas
• A virtual link is a logical extension of the OSPF backbone
Troubleshooting Neighbor Relationships
• OSPF neighbor relationship issues fall into one of the following categories:
1. The Neighbor Table is Empty
2. The Neighbor is stuck in the ATTEMPT state
3. The Neighbor is stuck in the INIT state
4. The Neighbor is stuck in the 2WAY state
5. The Neighbor is stuck in the EXSTART/EXCHANGE state
6. The Neighbor is stuck in the LOADING state
• The OSPF Neighbor Table may be empty due to one of the following:
1. Basic OSPF Misconfigurations
2. Layer 1 and Layer 2 Issues
3. Access List Filtering
4. Interface Misconfigurations
• The OSPF adjacency may be stuck in the ATTEMPT state due to the following:
1. Incorrect NBMA Configuration
2. Incorrect OSPF Configuration
3. ACL Filtering
4. Unidirectional Connectivity
• The OSPF adjacency may be stuck in the INIT state due to the following:
1. ACL Filtering on One Side
2. Layer 1 and Layer 2 Issues
3. NBMA Misconfigurations
• The OSPF adjacency may be stuck in the 2WAY state due to the following:
1. Neighbors all have their priority values set to zero (0)
• The OSPF adjacency may be stuck in the EXSTART/EXCHANGE state due to the following:
1. Mismatched MTU Values
2. Duplicate RIDs
3. Broken Unicast Connectivity
• The OSPF adjacency may be stuck in the LOADING state due to the following:
1. Corrupted LSR packets, which could be due to hardware or software issues
CHAPTER 7
Troubleshooting BGP
Border Gateway Protocol is first and foremost a policy control tool. Unlike traditional routing
protocols that are used to exchange routing information within an autonomous system, BGP
is traditionally used to exchange routing information between routing domains or autonomous
systems. However, BGP can also be used to exchange routing information within a single routing
domain. The TSHOOT certification exam objective that is covered in this chapter is as follows:
• Troubleshoot eBGP
While it is not possible to delve into all potential BGP problem scenarios, this chapter discusses
some of the most common problem scenarios when using BGP. This chapter begins with an overview of BGP routing and then concludes with some common problem scenarios that pertain to
BGP. This chapter is divided into the following sections:
• Border Gateway Protocol Overview
• Troubleshooting Neighbor Relationships
• Troubleshooting Route Advertisement
• Troubleshooting Route Redistribution Issues
• Debugging BGP Routing Issues
BORDER GATEWAY PROTOCOL OVERVIEW
Border Gateway Protocol is a Path Vector protocol that is used primarily to exchange Network
Layer Reachability Information (NLRI) between routing domains or autonomous systems. In other
words, BGP is used as an inter-domain or inter-autonomous system protocol. NLRI is exchanged
between BGP routers, referred to as BGP speakers, using UPDATE messages. The NLRI is composed of a prefix and a length. The prefix refers to the network address for that subnet, and the
length specifies the number of network bits and is simply a network mask in CIDR notation. Some
NLRI examples include 10.0.0.0/8 and 150.1.1.0/24. The sections that follow describe core BGP
characteristics with which you should be intimately familiar. It is important to remember, however,
that the primary emphasis of this guide is troubleshooting. Therefore, core BGP principles will be
described only briefly in this chapter. Additional information on BGP can be found in the current
ROUTE study guide, which is available online.
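To make the prefix and length pairing concrete, the following minimal Python sketch encodes the two NLRI examples above in the way they are carried inside an UPDATE message, as defined in RFC 4271: a one-byte length followed by only as many prefix octets as are needed to hold that many bits. The function name encode_nlri is hypothetical, and the sketch covers IPv4 only:

```python
import ipaddress

def encode_nlri(prefix):
    """Hypothetical helper: encode an IPv4 prefix as <length, prefix> bytes."""
    net = ipaddress.ip_network(prefix)
    length = net.prefixlen
    n_octets = (length + 7) // 8  # octets needed to hold 'length' bits
    return bytes([length]) + net.network_address.packed[:n_octets]

nlri_a = encode_nlri("10.0.0.0/8")     # length 8, one prefix octet
nlri_b = encode_nlri("150.1.1.0/24")   # length 24, three prefix octets
```

A shorter prefix length therefore produces a shorter encoding, which is one reason a single UPDATE message can carry many prefixes.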
Cisco IOS Software Border Gateway Protocol Processes
The following four BGP processes run when BGP is enabled in Cisco IOS-based devices:
1. The BGP Open process
2. The BGP I/O process
3. The BGP Scanner process
4. The BGP Router process
The BGP Open process is used for peer establishment. This process runs at initialization, when
establishing a Transmission Control Protocol (TCP) connection with a BGP peer. The BGP I/O
process handles the reading, writing, and execution of BGP messages, such as the UPDATE and
KEEPALIVE messages. This process provides the interface between TCP and BGP: it reads messages from the TCP socket and places them into the BGP input queue (InQ) so that they can be processed
by the BGP Router process. The I/O process also moves messages from the
output queue (OutQ) to the TCP socket.
The BGP Scanner process periodically scans the BGP Routing Information Base (RIB) in order
to determine whether prefixes and attributes should be deleted and whether route maps or filter
caches should be flushed. Additionally, the BGP Scanner walks the BGP table and confirms reachability of the next-hops (i.e., it validates that next-hops are still valid). If the next-hop for a prefix is
not reachable, all BGP entries that use that next-hop are removed from the BGP RIB. By default,
the BGP Scanner runs every 60 seconds; however, this interval can be changed using CLI commands. Finally, the BGP Router process sends and receives routes, establishes peers, and interacts
with the RIB. This process is also used to calculate the BGP best path and receives commands
entered via the CLI.
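The next-hop validation performed by the BGP Scanner can be pictured with the following minimal Python sketch. It is a conceptual model of the behavior described above rather than the Cisco IOS implementation; the function name scan and the sample table contents are hypothetical:

```python
import ipaddress

def scan(bgp_table, rib_prefixes):
    """Hypothetical model: keep only BGP entries whose next hop is reachable."""
    rib = [ipaddress.ip_network(p) for p in rib_prefixes]

    def reachable(next_hop):
        addr = ipaddress.ip_address(next_hop)
        return any(addr in net for net in rib)

    return {prefix: nh for prefix, nh in bgp_table.items() if reachable(nh)}

# Illustrative data only: one next hop is covered by the routing table, one is not
bgp_table = {"150.2.0.0/24": "4.4.4.4", "150.9.0.0/24": "5.5.5.5"}
rib_prefixes = ["4.4.4.4/32", "10.0.0.0/24"]
after_scan = scan(bgp_table, rib_prefixes)
```

In this sketch, the entry that points to 5.5.5.5 is dropped because no route in the routing table covers that next hop.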
The BGP Router process is the main process responsible for initiating the other BGP processes.
The three major components of the BGP Router process are as follows:
1. The BGP Routing Information Base (RIB)
2. The IP RIB for BGP-learned prefixes
3. The IP switching component for BGP-learned prefixes
The BGP RIB contains network entries, path entries, path attributes, and additional information,
such as route map and BGP filter list cache entries. The BGP-learned prefixes are stored in the IP
RIB in two types of structures, which are Network Descriptor Blocks (NDBs) and Routing Descriptor Blocks (RDBs). An NDB is a single entry in the routing table that represents a network
prefix and contains information such as the network address, mask, and administrative distance.
The NDB is stored in the routing table with an RDB, which is used to store the actual next-hop
information. Finally, the IP switching component refers to structures such as the FIB, which is applicable when Cisco Express Forwarding (CEF) is enabled.
BGP Data Structures
In Cisco IOS software, BGP information is stored in one of two data structures or tables, which are
the Neighbor Table and the BGP Table. Both are described in the following section.
The BGP Neighbor Table contains a list of all the configured neighbors of the local BGP speaker.
This includes the IP address of the neighbor, the neighbor’s autonomous system, the adjacency
state (e.g., established, active, etc.), and other pertinent information, such as the number of prefixes
received from the neighbor. The Neighbor Table can be viewed using the show ip bgp neighbors
command. However, a summary of all neighbors is also included in the output of the show ip bgp
summary command. Following is a sample output of the show ip bgp neighbors command for a
specific neighbor:
R4#show ip bgp neighbors 3.3.3.3
BGP neighbor is 3.3.3.3, remote AS 3, external link
  BGP version 4, remote router ID 3.3.3.3
  BGP state = Established, up for 17:21:33
  Last read 00:00:33, last write 00:00:33, hold time is 180, keepalive interval
is 60 seconds
  Neighbor capabilities:
    Route refresh: advertised and received(old & new)
    Address family IPv4 Unicast: advertised and received
  Message statistics:
    InQ depth is 0
    OutQ depth is 0
                         Sent       Rcvd
    Opens:                  1          1
    Notifications:          0          0
    Updates:               43         43
    Keepalives:          1044       1044
    Route Refresh:          1          0
    Total:               1089       1088
  Default minimum time between advertisement runs is 30 seconds
 For address family: IPv4 Unicast
  BGP table version 125, neighbor version 125/0
  Output queue size : 0
  Index 1, Offset 0, Mask 0x2
  1 update-group member
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               3          0
    Prefixes Total:                64          0
    Implicit Withdraw:              7          0
    Explicit Withdraw:             54          0
    Used as bestpath:             n/a          0
    Used as multipath:            n/a          0
                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    AS_PATH loop:                       n/a         64
    Suppressed due to dampening:          6        n/a
    Total:                                6         64
  Number of NLRIs in the update sent: max 18, min 0
  Connections established 1; dropped 0
  Last reset never
  External BGP neighbor may be up to 2 hops away.
Connection state is ESTAB, I/O status: 1, unread input bytes: 0
Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 2
Local host: 4.4.4.4, Local port: 14912
Foreign host: 3.3.3.3, Foreign port: 179
Enqueued packets for retransmit: 0, input: 0  mis-ordered: 0 (0 bytes)
Event Timers (current time is 0x140B8DC0):
Timer          Starts    Wakeups            Next
Retrans          1082          0             0x0
TimeWait            0          0             0x0
AckHold          1079       1038             0x0
SendWnd             0          0             0x0
KeepAlive           0          0             0x0
GiveUp              0          0             0x0
PmtuAger            0          0             0x0
DeadWait            0          0             0x0
iss:  507514249  snduna:  507536397  sndnxt:  507536397     sndwnd:  16023
irs: 1009588969  rcvnxt: 1009610970  rcvwnd:      16023  delrcvwnd:    361
SRTT: 300 ms, RTTO: 303 ms, RTV: 3 ms, KRTT: 0 ms
minRTT: 12 ms, maxRTT: 300 ms, ACK hold: 200 ms
Flags: active open, nagle
IP Precedence value : 6
Datagrams (max data segment is 536 bytes):
Rcvd: 1121 (out of order: 0), with data: 1079, total data bytes: 22000
Sent: 2160 (retransmit: 0, fastretransmit: 0, partialack: 0, Second Congestion:
0), with data: 1081, total data bytes: 22147
The BGP Table or BGP RIB contains all routes injected into BGP on the local BGP speaker, as well
as those received from internal and external peers. The information stored in the BGP Table or RIB
includes NEXT_HOP information, AS_PATH information, and MED information, among other
parameters. Following is a sample output of the BGP RIB that can be viewed using the show ip
bgp command:
R3#show ip bgp
BGP table version is 136, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 150.2.0.0/24     4.4.4.4                  0             0 4 5 6 7 ?
*> 150.3.0.0/24     4.4.4.4                  0             0 4 5 6 7 ?
*> 150.4.0.0/24     4.4.4.4                  0             0 4 8 9 1 ?
*> 150.5.0.0/24     4.4.4.4                  0             0 4 8 9 1 ?
*> 150.9.0.0/24     4.4.4.4                  0             0 4 6 7 8 ?
*> 160.0.0.0/24     4.4.4.4                  0             0 4 6 7 8 ?
Border Gateway Protocol Messages
All Border Gateway Protocol messages share a common header that is 19 bytes long. The following four
BGP message types are available:
1. The OPEN message
2. The UPDATE message
3. The NOTIFICATION message
4. The KEEPALIVE message
The OPEN message, BGP message Type 1, is the first packet BGP sends to a peer after the TCP
connection has been established. It allows the two peers to negotiate the parameters of the peer
session. The different parameters include the BGP version, the hold time value for the session,
authentication data, refresh capabilities, and support for multiple NLRI. If the OPEN message is
acceptable, a KEEPALIVE message confirming the OPEN message is sent back. Once the OPEN
message is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION messages may be exchanged.
The OPEN message includes parameters such as autonomous system number, authentication information (MD5), if configured, the BGP router ID (RID), and the hold time value.
The BGP UPDATE message (Type 2) is used to send and withdraw BGP routing information or
NLRI. Additionally, the UPDATE message contains information previously advertised by the local
router that is no longer valid, as well as new information that is being advertised to the remote peer.
Each UPDATE message contains a single set of BGP attributes and all of the routes using those
attributes. The format of this message reduces the total number of packets that routers must send
between the BGP peers when exchanging NLRI.
The BGP NOTIFICATION message (Type 3) is sent when an error condition is detected. When a
BGP peer detects an error within the session, it sends a NOTIFICATION message to the remote
router and immediately closes both the BGP and TCP sessions. The minimum length of the NOTIFICATION message is 21 bytes, including the header. Within the NOTIFICATION message, the
1-byte Error Code field specifies the type of BGP error seen by the local router. Six error codes have
been defined as follows:
• Error Code 1: Message Header Error
• Error Code 2: OPEN Message Error
• Error Code 3: UPDATE Message Error
• Error Code 4: Hold Time Expired
• Error Code 5: BGP Finite State Machine Error
• Error Code 6: Cease
The KEEPALIVE message (Type 4) is used to determine whether a host or a link has failed. This
message type contains a single 19-byte header and no other data. By default, BGP peers exchange
KEEPALIVE messages every 60 seconds. In addition, by default, the hold time value used by BGP
is three times the value of the keepalive interval.
The advertisement of an UPDATE message within the keepalive period resets the timer to 0. In other
words, the KEEPALIVE message is sent only in the absence of other messages for a particular session.
If the local router does not receive a KEEPALIVE or UPDATE message within the hold time period, a
NOTIFICATION message of ‘Hold Time Expired’ is generated and the session is torn down.
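The message sizes quoted in this section follow directly from the layout of the common header. The following minimal Python sketch builds a KEEPALIVE message and a 'Hold Time Expired' NOTIFICATION message. It assumes the header layout defined in RFC 4271 (a 16-byte marker of all ones, a 2-byte length, and a 1-byte type), which is not spelled out in this guide; the function and variable names are hypothetical:

```python
import struct

MARKER = b"\xff" * 16  # 16-byte marker, all ones (per RFC 4271)
TYPE_OPEN, TYPE_UPDATE, TYPE_NOTIFICATION, TYPE_KEEPALIVE = 1, 2, 3, 4
HEADER_LEN = 19        # marker (16) + length (2) + type (1)

def bgp_message(msg_type, payload=b""):
    """Hypothetical helper: prepend the common BGP header to a payload."""
    length = HEADER_LEN + len(payload)
    return MARKER + struct.pack("!HB", length, msg_type) + payload

# KEEPALIVE: the header only, with no additional data
keepalive = bgp_message(TYPE_KEEPALIVE)

# NOTIFICATION: 1-byte Error Code (4 = Hold Time Expired) and 1-byte subcode
hold_time_expired = bgp_message(TYPE_NOTIFICATION, struct.pack("!BB", 4, 0))
```

The KEEPALIVE built here is 19 bytes long and the NOTIFICATION is 21 bytes long, matching the minimum lengths stated above.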
Establishing Border Gateway Protocol Adjacencies
Because BGP is unique in that it uses TCP as the underlying protocol, the process of establishing
a neighbor relationship is two-fold: the first phase is the establishment of the TCP session, and the
second phase is the establishment of the BGP peer session. RFC 1771 (since obsoleted by RFC 4271) includes a section on the BGP
Finite State Machine (FSM), which describes BGP operation state by state. The different states BGP will go through before a neighbor relationship is established are as follows:
• The Idle state
• The Connect state
• The Active state
• The OpenSent state
• The OpenConfirm state
• The Established state
The first three states pertain to the establishment of the underlying TCP connection between the
BGP speakers. The last three states pertain to the establishment of the actual BGP session. The
show ip bgp summary or the show ip bgp neighbors commands can be used to view some, but not
all, of these states when BGP is enabled on Cisco IOS software routers and switches.
The Idle state is the initial BGP state after BGP is enabled on a router, or when a router is reset. No
BGP resources are allocated to the peer in this state. Additionally, when in this state, no incoming
connections are allowed. Following is the output of the show ip bgp neighbors command immediately after BGP has been enabled and a neighbor has been defined.
R2#show ip bgp neighbors
BGP neighbor is 10.0.1.1, remote AS 1, external link
BGP version 4, remote router ID 0.0.0.0
BGP state = Idle
Last read 00:00:00, last write 00:00:00, hold time is 180, keepalive interval
is 60 seconds
...
[Truncated Output]
If using the show ip bgp summary command, the same state would be seen as follows:
R2#show ip bgp summary
BGP router identifier 2.2.2.2, local AS number 2
BGP table version is 1, main routing table version 1
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.1.1        4     1       0       0        0    0    0 never    Idle
In the Connect state, BGP waits for a TCP connection to be completed. If successful, the local router will send an OPEN message to the peer and the BGP state machine transitions to the OpenSent
state. However, if the TCP connection attempt fails, the local router resets the ConnectRetry timer
and transitions to the Active state. In addition, depending on the failure condition, the local router
could also revert back to the Idle state. Additionally, if the ConnectRetry timer reaches 0 while the
local router is in the Connect state, the timer is reset and another connection attempt is made. In
this case, the local router remains in the Connect state.
In the Active state, a TCP connection is initiated to establish a BGP neighbor relationship, also
referred to as a BGP peer relationship. In plain English, the BGP routing process tries to establish
a TCP session with the peer. If the session establishes successfully, an OPEN message is sent to the
peer, the hold time is set to a large value, and the local router transitions to the OpenSent state.
However, if the TCP session fails to establish, the local router initiates another session, sets the
ConnectRetry timer to 0, and transitions back to the Connect state. During this state, if the remote
peer attempts to establish a connection to the local router using an unexpected IP address for the
session, the local router will refuse the connection. The local router will remain in the Active state
and reset the ConnectRetry timer. If any other failures occur, the local router releases all BGP
resources associated with the connection and transitions back to the Idle state. Following is the
output of the show ip bgp neighbors command following the completion of the Connect state:
R2#show ip bgp neighbors
BGP neighbor is 10.0.1.1, remote AS 1, external link
BGP version 4, remote router ID 0.0.0.0
BGP state = Active
Last read 00:00:01, last write 00:00:01, hold time is 180, keepalive interval
is 60 seconds
...
[Truncated Output]
If using the show ip bgp summary command, the same state would be seen as follows:
R2#show ip bgp summary
BGP router identifier 2.2.2.2, local AS number 2
BGP table version is 1, main routing table version 1
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.1.1        4     1     215     215        0    0    0 00:00:01 Active
After sending the OPEN message to the peer, BGP then transitions to the OpenSent state. In this
state, the local router waits for a response to the sent OPEN message. When an OPEN message is
received, all fields in the message are checked. If an error is detected, the local router will send the
peer a NOTIFICATION message and transition back to the Idle state. However, if a successful (i.e.,
error-free) response is received, the BGP state moves to OpenConfirm and BGP sends a KEEPALIVE message and sets a keepalive timer. Additionally, the previously large hold time value set in the
Active state is replaced with the new negotiated hold time value as the BGP peers negotiate and
agree on parameters for the session. Finally, if a TCP disconnect is received while in this state, the
local router terminates the BGP session, resets the ConnectRetry timer, and transitions back to the
Active state.
In the OpenConfirm state, BGP waits for a KEEPALIVE or a NOTIFICATION message. If the
local router receives a KEEPALIVE message, it transitions to the Established state. However, if a
KEEPALIVE message is not received before the negotiated hold time value expires, the local router
will send a NOTIFICATION message to the peer with the error code ‘Hold Time Expired’ and
transition to the Idle state. Additionally, if the local router receives a NOTIFICATION message
from its peer, it also will transition immediately to the Idle state. In this state, if any other failure is
detected, the local router sends a NOTIFICATION message with Error Code 5, ‘BGP Finite State
Machine Error,’ and transitions back to the Idle state. BGP goes through the same steps again to try
to establish the connection.
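The state transitions described so far can be summarized as a small lookup table. The following Python sketch is a simplified model of the descriptions in this section and not a complete implementation of the FSM in the RFC. The event names are hypothetical, the Idle to Connect transition on a start event is an assumption taken from the RFC rather than from the text above, and any event that is not listed is treated as a failure that returns the session to the Idle state:

```python
# (current state, event) -> next state; event names are hypothetical
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "tcp_failed"): "Active",
    ("Connect", "connect_retry_expired"): "Connect",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "tcp_failed"): "Connect",
    ("OpenSent", "open_ok"): "OpenConfirm",
    ("OpenSent", "open_error"): "Idle",
    ("OpenSent", "tcp_disconnect"): "Active",
    ("OpenConfirm", "keepalive"): "Established",
    ("OpenConfirm", "hold_timer_expired"): "Idle",
    ("OpenConfirm", "notification"): "Idle",
}

def step(state, event):
    """Return the next state; unlisted events fall back to Idle."""
    return TRANSITIONS.get((state, event), "Idle")

def run(events, state="Idle"):
    """Apply a sequence of events, starting from the given state."""
    for event in events:
        state = step(state, event)
    return state

happy_path = run(["start", "tcp_established", "open_ok", "keepalive"])
bad_open = run(["start", "tcp_failed", "tcp_established", "open_error"])
```

Here happy_path ends in the Established state, while bad_open passes through Connect, Active, and OpenSent before an OPEN error returns it to Idle.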
The final state, the Established state, is reached when the initial KEEPALIVE message is received
while BGP is in the OpenConfirm state. This is the final state of a peer relationship and designates a
fully operational connection. Two BGP peers can exchange routing information only when the Established state is reached. In the Established state BGP can exchange UPDATE, NOTIFICATION,
and KEEPALIVE messages with its peer. The Established state can be validated using the show ip
bgp neighbors command as illustrated in the output below:
R2#show ip bgp neighbors
BGP neighbor is 10.0.0.1, remote AS 1, external link
BGP version 4, remote router ID 1.1.1.1
BGP state = Established, up for 00:12:17
Last read 00:00:17, last write 00:00:17, hold time is 180, keepalive interval
is 60 seconds
Unlike the show ip bgp neighbors command, the show ip bgp summary command does not
display the word Established. Instead, the State/PfxRcd column lists the number of prefixes received from the
peer. If no prefixes have been received from a peer, a value of 0 is shown.
This is illustrated in the following output on a BGP speaker with multiple peers or neighbors in the
Established state:
R2#show ip bgp summary
BGP router identifier 2.2.2.2, local AS number 2
BGP table version is 3, main routing table version 3
2 network entries using 234 bytes of memory
3 path entries using 156 bytes of memory
3/2 BGP path/bestpath attribute entries using 372 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 786 total bytes of memory
BGP activity 3/1 prefixes, 4/1 paths, scan interval 60 secs
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.1        4     1      87      88        3    0    0 00:15:05        1
10.0.1.1        4     1      90      94        3    0    0 00:15:19        1
10.0.3.3        4     3      19      23        3    0    0 00:07:25        0
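When checking many peers, the State/PfxRcd column can be interpreted programmatically. The following is a minimal sketch, assuming each neighbor row has been captured as a single line of text with the ten columns shown above; the function name peer_status is hypothetical.

```python
def peer_status(summary_row: str) -> dict:
    """Interpret one neighbor row from 'show ip bgp summary'.

    Assumes the ten-column layout shown in the output above:
    Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
    """
    fields = summary_row.split()
    neighbor = fields[0]
    # Everything after the Up/Down column is the State/PfxRcd value.
    last = " ".join(fields[9:])
    if last.isdigit():
        # A number is a prefix count, which implies the Established state.
        return {"neighbor": neighbor, "state": "Established", "prefixes": int(last)}
    # Otherwise the column holds the name of a non-Established state.
    return {"neighbor": neighbor, "state": last, "prefixes": None}
```

A row ending in 0 is still an Established peer; it has simply not sent any prefixes.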
Border Gateway Protocol Attributes
Border Gateway Protocol path attributes fall into the following four categories:
1. Well-known mandatory
2. Well-known discretionary
3. Optional transitive
4. Optional non-transitive
All BGP speakers must recognize all of the well-known mandatory attributes, which must be included for all prefixes. However, discretionary attributes may or may not be included for a particular prefix. Discretionary attributes may be used based on the decision of the network administrator;
however, their use is not mandatory. BGP speakers do not have to understand optional attributes
but must re-advertise them based on their transitive setting. Transitive attributes are advertised to
all BGP peers, while non-transitive attributes may be discarded if the local router does not recognize them. Although BGP does support multiple path attributes, this guide will discuss only the
following:
• ORIGIN
• AS_PATH
• NEXT_HOP
• MED
• LOCAL_PREF
• WEIGHT
NOTE: Additional detailed information on these and other BGP attributes may be found in
the current ROUTE study guide, which is presently available online.
The ORIGIN attribute is a well-known mandatory attribute. The ORIGIN attribute is generated by
the autonomous system that originates the routing information. This attribute is defined automatically when a route or prefix is injected into BGP but may be modified using a route-map in Cisco
IOS software. There are three possible ORIGIN values, which are as follows:
1. IGP
2. EGP
3. INCOMPLETE
An ORIGIN of IGP indicates that the prefix was injected into BGP using the network command
in Cisco IOS software. Prefixes with this ORIGIN code are displayed with the letter ‘i’. These routes
are internal to the originating AS.
An ORIGIN code of EGP indicates that the prefix originated from the Exterior Gateway Protocol
(EGP). Prefixes with this ORIGIN code are displayed with the letter ‘e’ and are encoded with a value
of 1. EGP is beyond the scope of the TSHOOT certification exam and is not described in this guide.
Finally, an ORIGIN code of INCOMPLETE indicates that the original source of the prefix is not
known to the router injecting the route into BGP. This code is used for prefixes that are redistributed into BGP using the redistribute command. In Cisco IOS software, prefixes with the INCOMPLETE attribute code are displayed with the question mark (?).
The AS_PATH attribute is used to prevent routing information loops in inter-AS BGP (i.e., eBGP)
implementations. The AS_PATH attribute contains a reverse-order-sequenced list of autonomous
system numbers that represent the domains the prefix has transited. This attribute is changed only
when an UPDATE message is sent to an eBGP neighbor, not when it is sent to an iBGP peer,
which is why this attribute is applicable only to external BGP implementations.
When an eBGP speaker receives UPDATE messages, it looks at the AS_PATH list to determine the
best (shortest) path to the destination prefix. However, if the eBGP speaker notices its autonomous
system number in the AS_PATH list, it ignores this UPDATE message and does not take it into
consideration in the path selection process. This prevents the router from receiving and accepting
information that it originated, or was originated within its own autonomous system.
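The AS_PATH behavior described above can be modeled in a few lines. This is a simplified sketch in which an AS_PATH is a plain list of AS numbers with the most recently added AS first; the function names are hypothetical.

```python
from typing import List


def accept_update(local_as: int, as_path: List[int]) -> bool:
    """An eBGP speaker ignores an UPDATE whose AS_PATH already lists its own AS."""
    return local_as not in as_path


def advertise_to_ebgp(local_as: int, as_path: List[int]) -> List[int]:
    """The local AS is added to the front of the list when sending to an eBGP neighbor."""
    return [local_as] + as_path


def advertise_to_ibgp(as_path: List[int]) -> List[int]:
    """The AS_PATH is left unchanged when sending to an iBGP peer."""
    return list(as_path)
```

For example, a prefix originated in AS 1 and passed on by AS 2 carries the list [2, 1]. If that UPDATE were to arrive back at a router in AS 1, accept_update(1, [2, 1]) returns False and the UPDATE is ignored.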
The NEXT_HOP attribute, Attribute Type Code 3, is a well-known mandatory attribute that is
used to define the next-hop IP address to the destination prefix from the BGP perspective. Unlike
with traditional IGPs, the next-hop for a BGP prefix does not have to be directly connected. In such
cases, the local router performs a recursive lookup in the routing table to locate a route to the BGP
next-hop. The result of this recursive lookup is the physical next-hop assigned to the BGP route in
the routing and forwarding tables. There are various ways in which the NEXT_HOP for a prefix is
determined and set, which include the following:
• When the prefix is first injected into BGP
• When the prefix is advertised via eBGP
• When the next-hop is manually changed
When a prefix is first injected into BGP, the BGP speaker on which the prefix is injected will be
responsible for setting the NEXT_HOP attribute. The actual value (i.e., the actual IP address specified) depends on how the prefix is injected into BGP as follows:
• If the prefix is injected into BGP using the network command and the prefix is a directly connected subnet, the NEXT_HOP address will be 0.0.0.0 on the local BGP speaker.
• If a prefix is injected into BGP using the network command and that prefix is known via an IGP, the NEXT_HOP will contain the IP address of the IGP next-hop router.
• If the prefix is redistributed into BGP from an IGP using the redistribute command, the NEXT_HOP address will be set to the same next-hop address of the IGP.
• When BGP route summarization is configured using the aggregate-address router configuration command, the NEXT_HOP is set to the address of the router that is performing the summarization when an UPDATE message is sent. If summarization is performed on the local BGP speaker, then the NEXT_HOP address will be 0.0.0.0.
• When the prefix is advertised via eBGP, the NEXT_HOP will automatically be set to the IP address of the eBGP speaker that is sending the UPDATE message for the prefix.
• If more than two eBGP peers reside on the same multi-access segment, the BGP speaker that is advertising the prefix sets the NEXT_HOP address in the UPDATE message to the original BGP speaker on the same segment rather than to itself.
• By default, the NEXT_HOP is not changed when a prefix is advertised by a BGP speaker to an iBGP peer. This can be modified using the next-hop-self command.
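The last three rules, which cover what happens to NEXT_HOP when a prefix is passed on to a peer, can be sketched as follows. This is a simplified model that deliberately ignores the multi-access segment case; the function and parameter names are hypothetical.

```python
def next_hop_to_advertise(current_next_hop: str, local_address: str,
                          peer_type: str, next_hop_self: bool = False) -> str:
    """Pick the NEXT_HOP placed in an UPDATE sent to a peer ('ebgp' or 'ibgp').

    Simplified: the shared multi-access segment exception is not modeled.
    """
    if peer_type == "ebgp":
        # eBGP: the sending speaker sets its own address as the next hop.
        return local_address
    if peer_type == "ibgp":
        # iBGP: unchanged by default, unless next-hop-self is configured.
        return local_address if next_hop_self else current_next_hop
    raise ValueError("peer_type must be 'ebgp' or 'ibgp'")
```

The iBGP default is the case that most often matters when troubleshooting: the receiving iBGP peer must have a route to the unchanged next-hop address, otherwise the recursive lookup fails.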
The MULTI_EXIT_DISC (MED) attribute is a 32-bit positive integer that is defined as Attribute Type
Code 4. In addition, MED is an optional non-transitive attribute. MED is typically used on inter-AS
links and allows BGP to choose among multiple exit points to the same neighboring AS. In other
words, MED, which is expressed as a metric value, is used as a suggestion to the peer external AS
regarding the preferred route into the local AS that is advertising the metric. The term ‘suggestion’
is used because it is not mandatory for the neighboring AS to adhere to the values specified using
this attribute. The following section describes the MED rules when using Border Gateway Protocol:
• If the prefix is for a directly connected network and the network or redistribute command is used to inject the prefix into BGP, the BGP MED value is set to 0.
• If the prefix is received from an IGP, when it is injected into BGP using the network or redistribute command, the MED value is set to the IGP metric.
• If the prefix is injected into BGP using the aggregate-address command (i.e., it is a summary prefix), the MED value is not set.
• A BGP speaker will advertise prefixes with the same metric to another iBGP peer.
• By default, if the prefix is learned from an iBGP peer, the edge router will remove the MED value before advertising the prefix to an eBGP peer.
The LOCAL_PREF attribute is a 32-bit positive integer that defines a preference over one exit
point in an AS. This attribute is a well-known discretionary BGP attribute, defined as Attribute
Type Code 5. The LOCAL_PREF attribute is used only within an AS for path selection manipulation. If an iBGP speaker receives an UPDATE for the same destination from multiple iBGP peers,
it will prefer the path with the highest LOCAL_PREF value. In Cisco IOS software, the default LOCAL_PREF value is 100; however, this value can be changed to any value between 0 and 4294967295.
The WEIGHT attribute is used in a manner similar to the LOCAL_PREF attribute. That is, this
attribute is used to define a preference over one exit point in an AS over another. However, unlike
the LOCAL_PREF attribute, which is propagated to other routers in the AS, the Cisco WEIGHT
attribute is locally significant to the device on which it is configured – much like administrative
distance values. This attribute, therefore, does not affect the routing policy of neighboring
routers, nor is it sent to other routers.
The WEIGHT is a 16-bit integer between 0 and 65,535. While this attribute is proprietary, it has
priority over all other attributes in the BGP path selection process. By default, all prefixes injected
into BGP on the local BGP speaker are assigned a WEIGHT of 32,768. Because this attribute is locally significant, this default value is not set in the UPDATE messages sent to any other neighbors.
In Cisco IOS software, the higher the WEIGHT, the more preferred the path will be.
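The relative priority of the attributes discussed so far can be illustrated with a short comparison. This is a deliberately simplified sketch that considers only WEIGHT, LOCAL_PREF, and AS_PATH length, in that order; the full Cisco IOS best path algorithm has many more steps. The Path type and best_path function are hypothetical names, and the default WEIGHT of 0 is the value IOS assigns to paths learned from peers.

```python
from typing import List, NamedTuple, Tuple


class Path(NamedTuple):
    next_hop: str
    weight: int = 0              # 0 for paths learned from a peer
    local_pref: int = 100        # Cisco IOS default LOCAL_PREF
    as_path: Tuple[int, ...] = ()


def best_path(paths: List[Path]) -> Path:
    """Highest WEIGHT wins, then highest LOCAL_PREF, then shortest AS_PATH."""
    return max(paths, key=lambda p: (p.weight, p.local_pref, -len(p.as_path)))
```

Because WEIGHT is compared first, a path with a higher WEIGHT is chosen on the local router even if another path has a higher LOCAL_PREF or a shorter AS_PATH.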
Influencing Inbound Path Selection
When using BGP attributes to influence inbound path selection, BGP attribute configuration must
be implemented in the outbound direction. The attributes that influence the path a
neighboring autonomous system takes back into the originating autonomous system must
be advertised in UPDATE messages sent to that neighboring autonomous system. The two BGP
attributes that are used to influence the inbound path are as follows:
• The MULTI_EXIT_DISC attribute
• The AS_PATH attribute
Two attributes are used to influence the path that BGP speakers within the same autonomous
system will take to exit the autonomous system. Unlike the attributes used to influence inbound
path selection, the attributes used to influence outbound path selection are applied in the inbound
direction, typically using route maps. These two attributes are as follows:
• The LOCAL_PREF attribute
• The WEIGHT attribute
Having recapped the BGP fundamentals, which are also described in additional detail in the
ROUTE study guide, the following section describes some common BGP problems and provides
recommended solutions for resolving them. Keep in mind that it is not possible to go through all
potential BGP problems. Instead, emphasis will be placed only on the most common ones.
TROUBLESHOOTING NEIGHBOR RELATIONSHIPS
The most common reason for failing to establish either eBGP or iBGP neighbor relationships is
misconfiguration. However, in addition to misconfigurations, the following factors can also
prevent BGP adjacencies from establishing:
• Access Control Lists filtering BGP packets
• Layer 1 and Layer 2 issues
• Device resource consumption
As with other routing protocols, ACLs can also prevent BGP neighbor adjacencies from being established. Given that BGP allows both internal and external peers to be more than one hop away,
when configuring BGP between devices that are more than a single hop away, be sure that no ACLs
on any intermediate devices are blocking the TCP packets exchanged between the peers. If any such ACLs exist, they should be
modified to permit TCP port 179 so that the BGP session can be established between the desired
peers. You can verify applied ACLs using the show ip access-lists or show running-config
commands on local and intermediate devices.
Layer 1 and Layer 2 issues can also prevent BGP adjacencies from being established. You can check
interface errors using the show interfaces and show counters interface commands. Layer
2 issues, such as VLANs and STP, can be validated following the sequence of steps described in the
previous chapters. Commands that can be used to troubleshoot such issues include the show vlan
and show spanning-tree suite of commands.
Finally, it is important to remember that BGP itself is designed to support a large number of prefixes, which in turn equates to greater resource (e.g., memory) consumption than traditional routing
protocols. Additionally, the more peers or neighbors that are configured, the more memory BGP
will consume. If device resources are already over-taxed, it is possible that some BGP adjacencies
may not be established. Use the show processes suite of commands to troubleshoot device resource utilization if you suspect that this may be preventing adjacencies.
In addition to the potential causes described above, common misconfigurations include the following:
• Using the incorrect autonomous system number for the peer
• Using the incorrect IP address for the peer
• Incorrect authentication parameters
• No IP connectivity between indirectly connected peers
Using the Incorrect Autonomous System Number for the Peer
A common BGP misconfiguration is specifying the incorrect autonomous system number for the
peer. If the neighbor [address] remote-as [autonomous system] command does not match
the autonomous system number configured on the remote peer using the router bgp [autonomous system] command, the adjacency will not be established. Instead, Cisco IOS software will
print the following error message on the console:
*Mar 12 16:58:29.991: %BGP-3-NOTIFICATION: sent to neighbor 10.0.0.1 2/2 (peer
in wrong AS) 2 bytes 0001 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 002D 0104 0001
00B4 0101 0101 1002 0601 0400 0100 0102 0280 0002 0202 00
On the remote router, the following error message will be printed on the console:
*Mar 19 04:58:09.535: %BGP-3-NOTIFICATION: received from neighbor 10.0.0.3 2/2
(peer in wrong AS) 2 bytes 0001
In such cases, the BGP speakers will remain in the Active state, indicating that they are both
actively trying to establish a TCP session with the specified peer. This issue can be resolved by
specifying the correct autonomous system number.
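The 2/2 in the messages above is the NOTIFICATION error code and subcode. The following sketch pulls those values out of a log line and maps them to names. It assumes the wrapped console message has been joined into a single line, and the lookup tables are intentionally partial; the function name is hypothetical.

```python
import re

# Partial table of BGP NOTIFICATION error codes; only a few entries are listed.
ERROR_CODES = {
    2: "OPEN Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
}
# Partial table of subcodes for Error Code 2.
OPEN_SUBCODES = {2: "Bad Peer AS"}

PATTERN = re.compile(
    r"%BGP-3-NOTIFICATION: (sent to|received from) neighbor (\S+) (\d+)/(\d+)"
)


def decode_notification(log_line):
    """Return direction, neighbor, and decoded error for one NOTIFICATION log line."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    code, subcode = int(match.group(3)), int(match.group(4))
    meaning = ERROR_CODES.get(code, "unknown")
    if code == 2:
        meaning += " / " + OPEN_SUBCODES.get(subcode, "unknown subcode")
    return {
        "direction": match.group(1),
        "neighbor": match.group(2),
        "code": code,
        "subcode": subcode,
        "meaning": meaning,
    }
```

The direction field shows which side detected the mismatch: the router that logs "sent to" is the one that rejected the AS number it saw in the peer's OPEN message.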
Using the Incorrect IP Address for the Peer
While specifying an incorrect autonomous system number for the peer will result in error messages being printed on the console, if you specify the incorrect IP address, no error messages will be
printed. It is important to ensure that you double-check that the IP address that has been specified
for the peer is indeed the IP address of that device.
If the peers are indirectly connected (i.e., will be peering using Loopback or other interfaces), then
it is important to ensure that you specify the correct IP address. This address should match the address of the interface used in the update-source [interface] configuration command. In the
event that these parameters are mismatched, the TCP session will never be established.
Incorrect Authentication Parameters
Border Gateway Protocol supports Message Digest 5 (MD5) authentication, which is used to secure or verify the security of the TCP segments between two BGP peers. When the BGP TCP MD5
shared password is configured between two peers, the Cisco IOS software checks the MD5 digest
of every segment sent on the TCP connection. If MD5 authentication is invoked and a segment
fails authentication, then an error message will be displayed. The type of message printed varies
depending on whether authentication is enabled on only one of the peers or whether both peers
are configured for authentication but are using different passwords. In the event that authentication is configured on the local BGP speaker but not the peer, the following error message will be
printed on the console of the local BGP speaker:
*Mar 12 17:06:27.235: %TCP-6-BADAUTH: No MD5 digest from 1.0.0.1(179) to
1.0.0.3(46132)
*Mar 12 17:06:27.239: %TCP-6-BADAUTH: No MD5 digest from 1.0.0.1(179) to
1.0.0.3(46132)
However, if authentication is configured on both peers, but the passwords are mismatched, the following error message will be printed:
*Mar 12 17:08:56.243: %TCP-6-BADAUTH: Invalid MD5 digest from 1.0.0.1(52991) to
1.0.0.3(179)
*Mar 12 17:09:04.243: %TCP-6-BADAUTH: Invalid MD5 digest from 1.0.0.1(52991) to
1.0.0.3(179)
In order to resolve this issue, the same password must be used when configuring authentication
between the BGP peers. This is applicable to both internal and external BGP peers.
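The two message variants above point to different causes, so it can help to classify them when reviewing logs. The following is a minimal sketch, assuming each wrapped console message has been joined into a single line; the function name and the wording of the cause strings are hypothetical.

```python
import re

BADAUTH = re.compile(
    r"%TCP-6-BADAUTH: (No|Invalid) MD5 digest from (\S+)\((\d+)\) to (\S+)\((\d+)\)"
)


def classify_badauth(log_line):
    """Map a %TCP-6-BADAUTH line to the likely cause described in the text."""
    match = BADAUTH.search(log_line)
    if match is None:
        return None
    if match.group(1) == "No":
        # The segment carried no digest: the sender has no password configured.
        cause = "authentication is not configured on the peer"
    else:
        # A digest was present but did not verify: the passwords differ.
        cause = "passwords are mismatched"
    return {"cause": cause, "source": match.group(2), "destination": match.group(4)}
```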
No IP Connectivity between Indirectly Connected Peers
While configuring BGP between directly connected peers is a straightforward task, additional configuration is required to ensure that BGP adjacencies between indirectly connected peers are established. Consider the basic network topology that is illustrated in Figure 7-1 below, for example:
[Figure: R3 (AS 3, Lo0: 3.3.3.3/32) and R4 (AS 4, Lo0: 4.4.4.4/32) connected by two serial links; interface labels shown are Se0/0, Se0/1, and Se1/3]
Fig. 7-1. Understanding BGP Multihop Implementation
Referencing Figure 7-1, external BGP is to be configured between R3 and R4 using the Loopback
interfaces of each router for peering, allowing for load balancing across the physical links. When
implementing BGP in such situations, several additional configuration commands are required.
The first requirement is the use of the ebgp-multihop command. By default, external BGP packets
are sent out with an IP TTL of 1. The ebgp-multihop command allows administrators to modify
this default behavior and specify the packet TTL value. If a value is not specified, the default TTL
of 255 will be used instead.
The second requirement is ensuring that there is IP connectivity between the Loopback interfaces.
This may be performed using either dynamic or static routes; with the latter being the most commonly used method. Referencing the topology illustrated in Figure 7-1, two static routes can be
configured between the routers across the Serial0/0 interfaces using the ip route [remote loopback address] [mask] serial [name/number] global configuration command.
The third requirement is specifying the update source. By default, BGP expects that the update
source will be an IP address on a directly connected subnet. If the specified neighbor address is indirectly connected (e.g., a Loopback), then the update-source [interface] BGP configuration
command must also be specified on both speakers. If any of these parameters are not specified, the
adjacency will not be established. This can be validated using the show ip bgp neighbors [address] command as follows:
R4#show ip bgp neighbors 3.3.3.3
BGP neighbor is 3.3.3.3, remote AS 3, external link
BGP version 4, remote router ID 0.0.0.0
BGP state = Active
Last read 00:00:14, last write 00:00:14, hold time is 180, keepalive interval
is 60 seconds
Message statistics:
  InQ depth is 0
  OutQ depth is 0
                       Sent       Rcvd
  Opens:                  1          1
  Notifications:          0          0
  Updates:                0          0
  Keepalives:            16         16
  Route Refresh:          0          0
  Total:                 17         17
Default minimum time between advertisement runs is 30 seconds
For address family: IPv4 Unicast
BGP table version 1, neighbor version 0/0
Output queue size : 0
Index 1, Offset 0, Mask 0x2
1 update-group member
                               Sent       Rcvd
Prefix activity:               ----       ----
  Prefixes Current:               0          0
  Prefixes Total:                 0          0
  Implicit Withdraw:              0          0
  Explicit Withdraw:              0          0
  Used as bestpath:             n/a          0
  Used as multipath:            n/a          0
                                 Outbound    Inbound
Local Policy Denied Prefixes:    --------    -------
  Total:                                0          0
Number of NLRIs in the update sent: max 0, min 0
Connections established 1; dropped 1
Last reset 00:00:20, due to User reset
External BGP neighbor may be up to 2 hops away.
No active TCP connection
From the output printed above, we can determine that the ebgp-multihop 2 command has been
issued on the local router; however, the BGP session is not established. This leaves two possibilities: either the update-source command is missing or there is no IP connectivity between the two
Loopback addresses. You can troubleshoot this issue simply by verifying the device configurations
on both routers.
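The three requirements for eBGP peering between Loopback interfaces can be summarized as a simple checklist. This sketch takes the three facts as inputs gathered by the engineer (for example, from the running configuration and a ping sourced from the Loopback); the function and parameter names are hypothetical.

```python
from typing import List


def loopback_ebgp_issues(ebgp_multihop: bool, update_source: bool,
                         loopback_reachable: bool) -> List[str]:
    """List which of the three loopback-peering requirements are not met."""
    issues = []
    if not ebgp_multihop:
        issues.append("ebgp-multihop is not configured for the neighbor")
    if not update_source:
        issues.append("update-source is not configured for the neighbor")
    if not loopback_reachable:
        issues.append("no IP connectivity between the Loopback addresses")
    return issues
```

In the output above, ebgp-multihop is already known to be configured, so only the remaining two checks need to be made on each router.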
While Cisco IOS software assumes that external BGP peers are directly connected, the same is
not assumed for internal BGP peers. This negates the need for the multihop command when configuring internal BGP peers. However, the peer IP addresses still must have connectivity and the
update-source command still must be used. If this command is not specified, the BGP session will
never be established, as can be seen in the following output:
R4#show ip bgp neighbors 3.3.3.3
BGP neighbor is 3.3.3.3, remote AS 3, internal link
BGP version 4, remote router ID 0.0.0.0
BGP state = Active
Last read 00:02:50, last write 00:02:50, hold time is 180, keepalive interval
is 60 seconds
Message statistics:
  InQ depth is 0
  OutQ depth is 0
                       Sent       Rcvd
  Opens:                  0          0
  Notifications:          0          0
  Updates:                0          0
  Keepalives:             0          0
  Route Refresh:          0          0
  Total:                  0          0
Default minimum time between advertisement runs is 0 seconds
For address family: IPv4 Unicast
BGP table version 1, neighbor version 0/0
Output queue size : 0
Index 1, Offset 0, Mask 0x2
1 update-group member
                               Sent       Rcvd
Prefix activity:               ----       ----
  Prefixes Current:               0          0
  Prefixes Total:                 0          0
  Implicit Withdraw:              0          0
  Explicit Withdraw:              0          0
  Used as bestpath:             n/a          0
  Used as multipath:            n/a          0
                                 Outbound    Inbound
Local Policy Denied Prefixes:    --------    -------
  Total:                                0          0
Number of NLRIs in the update sent: max 0, min 0
Connections established 0; dropped 0
Last reset never
No active TCP connection
TROUBLESHOOTING ROUTE ADVERTISEMENT
Similar to neighbor adjacency establishment, route advertisement issues are commonly due to
simple device misconfigurations. In Cisco IOS software, three methods can be used to advertise
networks when using Border Gateway Protocol. These methods include using the network [network] mask [mask] command, using the aggregate-address command, and using the redistribute [protocol] command. The first two methods are discussed in this main section;
however, redistribution will be discussed in the following main section. In addition, this section will
also discuss route advertisement following BGP policy changes.
Route Advertisement with the ‘network’ Command
The network [network] mask [mask] command is the recommended method for advertising
networks when using BGP. When specified with BGP, the behavior of this command differs from
when it is used with an IGP, such as OSPF or EIGRP. With an IGP, this command configures the
router to install the network into the Link State Database or Topology Table and send out Hello
packets to discover neighbor routers. With BGP, however, this command is used to flag the network as being local to the autonomous system, as well as to instruct BGP to advertise the specified
network. It does not configure the router to send Hello packets out of any interfaces that fall
within the specified range.
The specified network must be present in the routing table before it will be advertised by BGP. The
mask <mask> keyword is optional and is required only when BGP is required to advertise either
subnets or supernets. For example, to advertise the 10.0.0.0/30 subnet, the mask keyword would
be required. Excluding the mask keyword and simply entering the network 10.0.0.0 command
would not result in this subnet being advertised. However, the mask keyword would not be required to advertise the 10.0.0.0/8 network, as long as there was a matching route for this prefix in
the routing table. To clarify this concept further, consider the topology illustrated in Figure 7-2
below:
[Figure: R3 (AS 3, Lo0: 3.3.3.3/32) and R4 (AS 4, Lo0: 4.4.4.4/32) connected by two serial links; interface labels shown are Se0/0, Se0/1, and Se1/3; R4 also has an interface addressed 10.4.4.4/24]
Fig. 7-2. Understanding the BGP ‘network’ Command
Referencing Figure 7-2, an external BGP session has been configured between R3 and R4. We will
assume that the correct configuration is in place on both routers, allowing the session to be established successfully. R4 has a directly connected 10.4.4.0/24 subnet. This is to be advertised via BGP
to R3. At this point, the relevant BGP configuration on R4 is as follows:
R4#show running-config | section bgp
router bgp 4
no synchronization
bgp router-id 4.4.4.4
bgp log-neighbor-changes
network 10.0.0.0
neighbor 3.3.3.3 remote-as 3
neighbor 3.3.3.3 ebgp-multihop 2
neighbor 3.3.3.3 update-source Loopback0
no auto-summary
At first glance, the configuration appears to be correct because the 10.4.4.0/24 subnet is encompassed by the 10.0.0.0/8 Classful network, as can be seen in the output of the routing table below:
R4#show ip route 10.0.0.0 longer-prefixes
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
     10.0.0.0/24 is subnetted, 1 subnets
C       10.4.4.0 is directly connected, FastEthernet0/0
However, because the operation of the network command differs for BGP, the 10.4.4.0/24 subnet
will not be added to the BGP RIB as illustrated in the following output:
R4#show ip bgp 10.0.0.0
% Network not in table
R4#
R4#show ip bgp
R4#
In order to have BGP advertise the 10.4.4.0/24 subnet, the correct mask must be specified as follows:
R4(config)#router bgp 4
R4(config-router)#no network 10.0.0.0
R4(config-router)#network 10.4.4.0 mask 255.255.255.0
R4(config-router)#exit
Following this re-configuration, the 10.4.4.0/24 subnet appears in the BGP RIB as follows:
R4#show ip bgp
BGP table version is 2, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 10.4.4.0/24      0.0.0.0                  0         32768 i
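The matching rule behind this behavior can be sketched with Python's standard ipaddress module. This is an illustrative model only, assuming automatic summarization is disabled (as in the configuration above) and that a network statement without a mask implies the classful mask; the function names are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def classful_prefixlen(network: str) -> int:
    """Classful prefix length for a Class A, B, or C network address."""
    first_octet = int(network.split(".")[0])
    if first_octet < 128:
        return 8
    if first_octet < 192:
        return 16
    return 24


def network_statement_matches(network: str, mask: Optional[str],
                              routing_table: Iterable[str]) -> bool:
    """True if an exactly matching prefix exists in the routing table."""
    if mask is None:
        prefix = ipaddress.ip_network(f"{network}/{classful_prefixlen(network)}")
    else:
        prefix = ipaddress.ip_network(f"{network}/{mask}")
    # The prefix and mask must match a route exactly; a covering range is not enough.
    return any(ipaddress.ip_network(route) == prefix for route in routing_table)
```

With a routing table containing only 10.4.4.0/24, network_statement_matches("10.0.0.0", None, ["10.4.4.0/24"]) is False, while network_statement_matches("10.4.4.0", "255.255.255.0", ["10.4.4.0/24"]) is True, mirroring the two configurations shown on R4.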
We can further validate that the prefix is advertised to neighbor 3.3.3.3 using either the show ip
bgp [network] [mask] or the show ip bgp neighbors [address] advertised-routes commands.
Following is the output of the show ip bgp [network] [mask] command on R4:
R4#show ip bgp 10.4.4.0 255.255.255.0
BGP routing table entry for 10.4.4.0/24, version 2
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
1
Local
0.0.0.0 from 0.0.0.0 (4.4.4.4)
Origin IGP, metric 0, localpref 100, weight 32768, valid, sourced, local,
best
NOTE: To verify neighbors in update-group 1, use the show ip bgp update-group command as follows:
R4#show ip bgp update-group 1
BGP version 4 update-group 1, external, Address Family: IPv4 Unicast
BGP Update version : 2/0, messages 0
Update messages formatted 1, replicated 0
Number of NLRIs in the update sent: max 1, min 1
Minimum time between advertisement runs is 30 seconds
Has 1 member (* indicates the members currently being sent updates):
3.3.3.3
Alternatively, simply use the show ip bgp neighbors [address] advertised-routes command as previously stated. Following is the output of this command on R4:
R4#show ip bgp neighbors 3.3.3.3 advertised-routes
BGP table version is 2, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 10.4.4.0/24      0.0.0.0                  0         32768 i
Total number of prefixes 1
Route Advertisement with the ‘aggregate-address’ Command
Two common problem scenarios are often encountered when using the aggregate-address command with BGP. The first is that BGP does not advertise the aggregate at all and the second is that
BGP advertises the summary and the more specific routes as well. In order to troubleshoot either
issue, you must have a solid understanding of how route aggregation with BGP works. By default,
when the aggregate-address command is used, BGP advertises the summary only if at least one more
specific route encompassed by the summary is present in the BGP table. Continuing from the
topology illustrated in Figure 7-2, assume the following configuration was implemented on R4:
R4(config)#router bgp 4
R4(config-router)#aggregate-address 10.0.0.0 255.0.0.0
R4(config-router)#end
With this configuration, and assuming that the 10.4.4.0/24 subnet is still present in the RIB, BGP
will generate and advertise the 10.0.0.0/8 summary address. This can be validated by checking the
BGP RIB using the show ip bgp command as follows:
R4#show ip bgp
BGP table version is 3, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.0.0         0.0.0.0                            32768 i
*> 10.4.4.0/24      0.0.0.0                  0         32768 i
NOTE: You can view detailed information on the aggregate by appending the network address and mask to the end of this command as follows:
R4#show ip bgp 10.0.0.0 255.0.0.0
BGP routing table entry for 10.0.0.0/8, version 7
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
1
Local, (aggregated by 4 4.4.4.4)
0.0.0.0 from 0.0.0.0 (4.4.4.4)
Origin IGP, localpref 100, weight 32768, valid, aggregated, local, atomic-aggregate, best
In the event that all specific routes included in the aggregate are removed or withdrawn from the
routing table, the aggregate will no longer be advertised. For example, if the 10.4.4.0/24 subnet was
removed from the routing table by shutting down the FastEthernet0/0 interface on R4, then the
aggregate would also be withdrawn and would not be advertised. This is illustrated in the output
below:
R4#debug ip bgp
BGP debugging is on for address family: IPv4 Unicast
R4#config t
Enter configuration commands, one per line. End with CNTL/Z.
R4(config)#interface FastEthernet0/0
R4(config-if)#shutdown
R4(config-if)#
*Mar 12 20:45:16.455: BGP(0): Aggregate processing for IPv4 Unicast
*Mar 12 20:45:16.455: BGP(0): For aggregate 10.0.0.0/8
*Mar 12 20:45:16.455: BGP(0): 10.0.0.0/8 subtree has an entry 10.0.0.0/8
*Mar 12 20:45:16.455: BGP(0): 10.0.0.0/8 subtree has another entry 10.4.4.0/24
*Mar 12 20:45:16.455: BGP(0): sub-prefix : 10.4.4.0/24Needs to be re-aggregated
*Mar 12 20:45:16.455: BGP(0): 10.0.0.0/8 subtree has an entry 10.0.0.0/8
*Mar 12 20:45:16.455: BGP(0): 10.0.0.0/8 subtree has another entry 10.4.4.0/24
*Mar 12 20:45:16.459: BGP(0): 10.0.0.0/8 aggregate is removed
*Mar 12 20:45:16.459: BGP(0): Aggregate 10.0.0.0/8 does not have more-specifics
As previously stated, by default when summarization is configured, BGP will advertise both the
summary and the more specific prefixes. Continuing with the previous aggregation configuration
example, the 10.0.0.0/8 aggregate as well as the 10.4.4.0/24 subnet would be advertised by R4 to
peer router R3. Again, this can be validated using the show ip bgp neighbors [address]
advertised-routes command as follows:
R4#show ip bgp neighbors 3.3.3.3 advertised-routes
BGP table version is 15, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.0.0         0.0.0.0                            32768 i
*> 10.4.4.0/24      0.0.0.0                  0         32768 i
Total number of prefixes 2
In order to suppress specific prefixes from being advertised, you must configure BGP manually to
do so by appending the summary-only keyword to the aggregate-address command as illustrated in the following output:
R4(config)#router bgp 4
R4(config-router)#aggregate-address 10.0.0.0 255.0.0.0 summary-only
R4(config-router)#end
Following this configuration, any specific prefixes encompassed by the aggregate are preceded by
an ‘s’, indicating that they are being explicitly suppressed as illustrated in the following output:
R4#show ip bgp
BGP table version is 16, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.0.0         0.0.0.0                            32768 i
s> 10.4.4.0/24      0.0.0.0                  0         32768 i
Given this configuration, only the 10.0.0.0/8 aggregate is now advertised by R4 to R3 as follows:
R4#show ip bgp neighbors 3.3.3.3 advertised-routes
BGP table version is 16, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.0.0         0.0.0.0                            32768 i
Total number of prefixes 1
Route Advertisement Following BGP Policy Reconfiguration
Another common issue with BGP is route advertisement following policy reconfiguration or changes. BGP policies include filtering, using tools such as route maps, ACLs, and AS_PATH filters. As
was stated earlier in this chapter, the BGP Scanner process periodically scans the BGP Routing Information Base (RIB) in order to determine whether prefixes and attributes should be deleted and
whether route maps or filter caches should be flushed.
Additionally, the BGP Scanner walks the BGP table and confirms reachability of the next-hops (i.e.,
it validates that next-hops are still valid). If the next-hop for a prefix is not reachable, all BGP entries that use that next-hop are removed from the BGP RIB. By default, the BGP Scanner runs every 60 seconds, which means that it could take up to a minute, or more depending on the number
of prefixes and other factors, such as resource utilization, for BGP policy changes to be detected
and, ultimately, for networks to be advertised (or withdrawn).
If you have made changes to BGP policy configuration (e.g., route maps) and you notice that the
network or networks are not being advertised by BGP (assuming the configuration is correct),
keep in mind that it simply may be that the BGP Scanner process has yet to incorporate the changes. Therefore, instead of waiting for the BGP Scanner process to run, use the clear ip bgp command to apply the configuration changes immediately. The complete syntax of this command is
as follows:
clear ip bgp [* | all | <autonomous-system-number> | <address> | peer-group
<name>] [in [prefix-filter] | out | slow | soft [in [prefix-filter] | out | slow]]
Table 7-1 below lists and describes the keywords that can be used with this command:
Table 7-1. Cisco IOS ‘clear ip bgp’ Command Keywords
Keyword                    Function
*                          The asterisk (*) resets all BGP peers (i.e., it tears down and
                           resets all BGP sessions). This should be used with extreme
                           caution.
all                        This optional keyword specifies the reset of all address family
                           (AF) sessions (e.g., the ipv4 [IPv4 AF] and the ipv6 [IPv6 AF]).
autonomous-system-number   This specifies the number of the autonomous system in which all
                           BGP peer sessions will be reset.
address                    This specifies that only the identified BGP neighbor will be
                           reset. The value for this argument can be either an IPv4 address
                           or an IPv6 address.
peer-group <name>          This specifies that only the identified BGP peer group will be
                           reset.
in                         This optional keyword initiates inbound reconfiguration. If
                           neither the in nor out keywords are specified, both inbound and
                           outbound sessions are reset.
prefix-filter              This optional keyword clears the existing outbound route filter
                           (ORF) prefix list to trigger a new route refresh or soft
                           reconfiguration, which updates the ORF prefix list.
out                        This optional keyword initiates outbound reconfiguration. If
                           neither the in nor out keywords are specified, both inbound and
                           outbound sessions are reset.
slow                       This optional keyword clears slow-peer status forcefully and
                           moves it to the original update group.
soft                       This optional keyword initiates a soft reset. In other words,
                           using this keyword does not tear down the BGP session.
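As a minimal sketch of how the keywords in Table 7-1 are typically combined, the following sequence applies a changed policy toward peer 3.3.3.3 without tearing down the session. The peer address is reused from the earlier examples. The soft in form assumes that the peer supports the route refresh capability (or that inbound soft reconfiguration has been configured), which is an assumption and not something shown in this chapter:

```
! Re-send routes to neighbor 3.3.3.3 using the updated outbound policy
R4#clear ip bgp 3.3.3.3 soft out
! Re-evaluate routes received from 3.3.3.3 against the updated inbound policy
R4#clear ip bgp 3.3.3.3 soft in
! Confirm what is now being advertised to the peer
R4#show ip bgp neighbors 3.3.3.3 advertised-routes
```

A soft reset is generally preferable to clear ip bgp * on a live network because the BGP sessions stay up while the policy is re-applied.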
TROUBLESHOOTING ROUTE REDISTRIBUTION ISSUES
For the most part, the redistribution of routes into BGP is a straightforward process. As is the case
with other routing protocols, this is performed using the redistribute [protocol] command.
By default, BGP will simply use the IGP metric when external routing information is injected into
BGP. This negates the need to specify a seed metric or assign a different route metric during redistribution. There are, however, some caveats that you should be familiar with when troubleshooting
route advertisement following redistribution.
One common issue is the advertisement of OSPF routes. By default, when OSPF is redistributed
into BGP, only internal OSPF routes (intra-area and inter-area) are redistributed and advertised;
external Type 1, external Type 2, and NSSA external routes are not. This is often a point of
confusion because routes redistributed into OSPF are, by default, external Type 2 in Cisco IOS
software and are therefore left out. You can verify which OSPF route types are being redistributed
into BGP either by looking at the device configuration or by using the show ip route command, a
sample output of which is provided below:
R4#show ip route 192.168.1.0 255.255.255.0
Routing entry for 192.168.1.0/24
Known via "ospf 4", distance 110, metric 20, type extern 2, forward metric 1
Redistributing via bgp 4
Last update from 10.0.0.1 on FastEthernet0/0, 00:10:23 ago
Routing Descriptor Blocks:
* 10.0.0.1, from 1.1.1.1, 00:10:23 ago, via FastEthernet0/0
Route metric is 20, traffic share count is 1
NOTE: Although the output above shows the external Type 2 route as redistributing via
BGP, the route is not imported into the BGP RIB because external OSPF routes are not redistributed by default. This can be validated using the show ip bgp command as follows:
R4#show ip bgp 192.168.1.0 255.255.255.0
% Network not in table
When redistributing OSPF routes into BGP, you must specify the OSPF route types that will be
redistributed into BGP using the redistribute ospf [process ID] match <external 1|2>
<internal> <nssa-external 1|2> router configuration command. Following this, you can again
validate the implementation by looking at the router configuration or using the show ip protocols
command. For example, if BGP were configured to redistribute all internal and Type 2 externals
(Type 5 and Type 7), the show ip route command would display the following:
R4#show ip route 192.168.1.0 255.255.255.0
Routing entry for 192.168.1.0/24
Known via "ospf 4", distance 110, metric 20, type extern 2, forward metric 1
Redistributing via bgp 4
Advertised by bgp 4 match internal external 2 nssa-external 2
Last update from 10.0.0.1 on FastEthernet0/0, 00:13:14 ago
Routing Descriptor Blocks:
* 10.0.0.1, from 1.1.1.1, 00:13:14 ago, via FastEthernet0/0
Route metric is 20, traffic share count is 1
Because the route type is included in the redistribution configuration, the route is installed into the
BGP RIB and, assuming no filtering configuration, will be advertised to neighbors:
R4#show ip bgp 192.168.1.0 255.255.255.0
BGP routing table entry for 192.168.1.0/24, version 63
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
1
Local
10.0.0.1 from 0.0.0.0 (4.4.4.4)
Origin incomplete, metric 20, localpref 100, weight 32768, valid, sourced,
best
NOTE: In the output above, notice that the BGP next-hop is set to 10.0.0.1, which is the OSPF
next-hop address. In addition, the BGP metric for this prefix is the same as the OSPF metric.
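The following is a minimal sketch of a configuration that would produce the "match internal external 2 nssa-external 2" line shown in the show ip route output above. The OSPF process ID and BGP AS number are taken from the sample output; the wider match list in the comment is an optional variation and is not part of the original example:

```
router bgp 4
 ! Redistribute internal, external Type 2, and NSSA external Type 2 routes
 ! from OSPF process 4 into BGP
 redistribute ospf 4 match internal external 2 nssa-external 2
 ! To include Type 1 externals as well, the match list could be extended:
 ! redistribute ospf 4 match internal external 1 external 2 nssa-external 1 nssa-external 2
```

After a change like this, the show ip bgp 192.168.1.0 255.255.255.0 command used above is a quick way to confirm that the prefix has reached the BGP RIB.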
Another common redistribution problem pertains to the redistribution of BGP routes into an IGP,
such as EIGRP or OSPF. By default, only external BGP routes are redistributed. This default behavior is used to avoid routing loops within the interior network. When redistributing BGP routes into
an IGP, you must use the bgp redistribute-internal command to redistribute iBGP prefixes
into the IGP.
NOTE: The redistribution of BGP into IGPs is not recommended. However, if it must be performed, be sure to use route filters to allow only the explicit prefixes that should be redistributed into the IGP to be imported. Do not blindly redistribute BGP into any IGP.
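If redistribution of BGP into an IGP cannot be avoided, a filtered configuration along the following lines is one way to follow the advice in the note above. This is only a sketch: the prefix list and route map name (BGP-TO-OSPF), the choice of 150.2.0.0/24 as the single permitted prefix, and the use of OSPF process 4 are assumptions made for illustration:

```
! Permit only the prefix that is explicitly allowed into the IGP
ip prefix-list BGP-TO-OSPF seq 5 permit 150.2.0.0/24
!
route-map BGP-TO-OSPF permit 10
 match ip address prefix-list BGP-TO-OSPF
!
router bgp 4
 ! Needed only if iBGP-learned prefixes must also be redistributed
 bgp redistribute-internal
!
router ospf 4
 ! Everything not permitted by the route map is implicitly denied
 redistribute bgp 4 subnets route-map BGP-TO-OSPF
```

Because a route map ends with an implicit deny, any BGP prefix not listed in the prefix list stays out of the IGP.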
DEBUGGING BGP ROUTING ISSUES
Given that BGP is a resource-intensive protocol in itself, careful consideration should be given before enabling any kind of BGP debugging in production environments. BGP debugging is enabled
using the debug ip bgp command. The keywords that can be used in conjunction with this command are illustrated below:
R3#debug ip bgp ?
  A.B.C.D     BGP neighbor address
  all         All address families
  dampening   BGP dampening
  events      BGP events
  groups      BGP Config (peer-groups, templates) and Update groups
  import      BGP import routes to a vrf across address-family
  in          BGP Inbound information
  ipv4        Address family
  ipv6        Address family
  keepalives  BGP keepalives
  mpls        BGP MPLS label distribution
  nsap        Address family
  out         BGP Outbound information
  rib-filter  Next hop route watch filter events
  updates     BGP updates
  vpnv4       Address family
  <cr>
NOTE: The majority of these options are beyond the scope of the TSHOOT certification
exam. Only those options that are relevant to this course are described below.
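Given the earlier caution about debugging in production environments, the following sketch shows one conservative way to run a BGP debug and then switch it off again. The buffer size of 64000 bytes is an arbitrary illustrative value, and sending debug output to the logging buffer instead of the console is a common practice assumed here, not a step taken from this chapter:

```
R3#configure terminal
! Keep debug output off the console and store it in the local buffer instead
R3(config)#no logging console
R3(config)#logging buffered 64000 debugging
R3(config)#end
! Limit the debug to a single neighbor
R3#debug ip bgp 4.4.4.4 updates
! Review the captured messages
R3#show logging
! Disable all debugging as soon as the data has been collected
R3#undebug all
! Confirm that no debugging is still active
R3#show debugging
```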
The debug ip bgp [address] updates command prints detailed information about updates
exchanged with the specified BGP neighbor. To see real-time information about updates for all
BGP peers, use the debug ip bgp updates command instead. Following is a sample output of the
debug ip bgp [address] updates command. The information displayed includes NLRI, NEXT_HOP,
MED, and AS_PATH information, among other things, as can be seen below:
R3#debug ip bgp 4.4.4.4 updates
BGP updates debugging is on for neighbor 4.4.4.4 for address family: IPv4
Unicast
R3#
R3#
R3#
*Mar 13 12:53:01.854: BGP(0): 4.4.4.4 send UPDATE (format) 150.4.0.0/24, next
3.3.3.3, metric 0, path 4 8 9 1
*Mar 13 12:53:01.854: BGP(0): 4.4.4.4 send UPDATE (prepend, chgflags: 0x0)
150.5.0.0/24, next 3.3.3.3, metric 0, path 4 8 9 1
*Mar 13 12:53:01.854: BGP(0): 4.4.4.4 send UPDATE (format) 150.2.0.0/24, next
3.3.3.3, metric 0, path 4 5 6 7
*Mar 13 12:53:01.858: BGP(0): 4.4.4.4 send UPDATE (prepend, chgflags: 0x0)
150.3.0.0/24, next 3.3.3.3, metric 0, path 4 5 6 7
*Mar 13 12:53:01.858: BGP(0): 4.4.4.4 send UPDATE (format) 160.0.0.0/24, next
3.3.3.3, metric 0, path 4 6 7 8
*Mar 13 12:53:01.858: BGP(0): 4.4.4.4 send UPDATE (prepend, chgflags: 0x0)
150.9.0.0/24, next 3.3.3.3, metric 0, path 4 6 7 8
*Mar 13 12:53:01.878: BGP(0): 4.4.4.4 rcvd UPDATE w/ attr: nexthop 4.4.4.4,
origin ?, metric 0, path 4 6 7 8
*Mar 13 12:53:01.882: BGP(0): 4.4.4.4 rcvd 150.9.0.0/24...duplicate ignored
*Mar 13 12:53:01.882: BGP(0): 4.4.4.4 rcvd 160.0.0.0/24...duplicate ignored
*Mar 13 12:53:01.882: BGP(0): 4.4.4.4 rcv UPDATE w/ attr: nexthop 4.4.4.4,
origin ?, metric 0, originator 0.0.0.0, path 4 2 3 5, community , extended
community
*Mar 13 12:53:01.882: BGP(0): 4.4.4.4 rcv UPDATE about 150.7.0.0/24 -- DENIED
due to: AS-PATH contains our own AS;
*Mar 13 12:53:01.882: BGP(0): 4.4.4.4 rcv UPDATE about 150.6.0.0/24 -- DENIED
due to: AS-PATH contains our own AS;
*Mar 13 12:53:01.886: BGP(0): 4.4.4.4 rcvd UPDATE w/ attr: nexthop 4.4.4.4,
origin ?, metric 0, path 4 8 9 1
*Mar 13 12:53:01.886: BGP(0): 4.4.4.4 rcvd 150.5.0.0/24...duplicate ignored
*Mar 13 12:53:01.886: BGP(0): 4.4.4.4 rcvd 150.4.0.0/24...duplicate ignored
*Mar 13 12:53:01.886: BGP(0): 4.4.4.4 rcvd UPDATE w/ attr: nexthop 4.4.4.4,
origin ?, metric 0, path 4 5 6 7
*Mar 13 12:53:01.890: BGP(0): 4.4.4.4 rcvd 150.3.0.0/24...duplicate ignored
*Mar 13 12:53:01.890: BGP(0): 4.4.4.4 rcvd 150.2.0.0/24...duplicate ignored
*Mar 13 12:53:01.890: BGP(0): 4.4.4.4 rcv UPDATE w/ attr: nexthop 4.4.4.4,
origin ?, metric 0, originator 0.0.0.0, path 4 1 2 3, community , extended
community
*Mar 13 12:53:01.890: BGP(0): 4.4.4.4 rcv UPDATE about 150.1.0.0/24 -- DENIED
due to: AS-PATH contains our own AS;
*Mar 13 12:53:01.894: BGP(0): 4.4.4.4 rcv UPDATE about 150.0.0.0/24 -- DENIED
due to: AS-PATH contains our own AS;
*Mar 13 12:53:01.894: BGP(0): updgrp 1 - 4.4.4.4 updates replicated for
neighbors:
The debug ip bgp events command will provide real-time information on internal BGP events,
such as the BGP Scanner walking the RIB. This command will also provide information about soft
and hard peer session resets, for example. Following is a sample of the information that is provided
by this command:
R3#debug ip bgp events
BGP events debugging is on
R3#
*Mar 13 13:25:50.218: BGP: Regular scanner event timer
*Mar 13 13:25:50.218: BGP: Import timer expired. Walking from 1 to 1
*Mar 13 13:25:56.846: BGP: 4.4.4.4 start outbound soft reconfig for afi/safi:
1/1
*Mar 13 13:26:05.218: BGP: Regular scanner event timer
*Mar 13 13:26:05.218: BGP: Import timer expired. Walking from 1 to 1
*Mar 13 13:26:20.218: BGP: Regular scanner event timer
*Mar 13 13:26:20.218: BGP: Import timer expired. Walking from 1 to 1
*Mar 13 13:26:26.978: BGP: 4.4.4.4 refresh timer expired, no pending refresh for
afi: 0
*Mar 13 13:26:35.218: BGP: Regular scanner event timer
*Mar 13 13:26:35.218: BGP: Performing BGP general scanning
*Mar 13 13:26:35.218: BGP(0): scanning IPv4 Unicast routing tables
*Mar 13 13:26:35.218: BGP(IPv4 Unicast): Performing BGP Nexthop scanning for
general scan
*Mar 13 13:26:35.218: BGP(0): Future scanner version: 1295, current scanner
version: 1294
*Mar 13 13:26:35.218: BGP(1): scanning IPv6 Unicast routing tables
*Mar 13 13:26:35.218: BGP(IPv6 Unicast): Performing BGP Next hop scanning for
general scan
*Mar 13 13:26:35.218: BGP(1): Future scanner version: 1296, current scanner
version: 1295
*Mar 13 13:26:35.218: BGP(2): scanning VPNv4 Unicast routing tables
*Mar 13 13:26:35.218: BGP(VPNv4 Unicast): Performing BGP Next hop scanning for
general scan
Finally, as was stated at the beginning of this section, the debug ip bgp updates command
provides the same information as the debug ip bgp [address] updates command but for all
configured BGP peers instead of specific ones. For granularity, the command supports additional
keywords that can be used to filter the output. These keywords are illustrated below:
R3#debug ip bgp updates ?
  <1-199>      Access list
  <1300-2699>  Access list (expanded range)
  events       Update events
  in           Inbound updates
  out          Outbound updates
  <cr>
ACLs can be used in conjunction with this command to restrict the output to the prefixes that
are included in the ACLs. For example, to restrict the debug output to activities pertaining to the
150.2.0.0 and 150.3.0.0 prefixes, you would perform the following sequence of steps on the router
on which you wanted to see the debugging output:
R3#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R3(config)#access-list 1 permit host 150.2.0.0
R3(config)#access-list 1 permit host 150.3.0.0
R3(config)#exit
R3#
R3#debug ip bgp updates 1
BGP updates debugging is on for access list 1 for address family: IPv4 Unicast
R3#
R3#
R3#
*Mar 13 13:37:33.154: BGP(0): 4.4.4.4 send UPDATE (format) 150.2.0.0/24, next
3.3.3.3, metric 0, path 4 5 6 7
*Mar 13 13:37:33.158: BGP(0): 4.4.4.4 send UPDATE (prepend, chgflags: 0x0)
150.3.0.0/24, next 3.3.3.3, metric 0, path 4 5 6 7
*Mar 13 13:37:33.182: BGP(0): 4.4.4.4 rcvd UPDATE w/ attr: next hop 4.4.4.4,
origin ?, metric 0, path 4 5 6 7
*Mar 13 13:37:33.182: BGP(0): 4.4.4.4 rcvd 150.3.0.0/24...duplicate ignored
*Mar 13 13:37:33.186: BGP(0): 4.4.4.4 rcvd 150.2.0.0/24...duplicate ignored
*Mar 13 13:37:33.186: BGP(0): updgrp 1 - 4.4.4.4 updates replicated for
neighbors:
R3#
R3#
R3#show ip access-lists 1
Standard IP access list 1
10 permit 150.2.0.0 (6 matches)
20 permit 150.3.0.0 (6 matches)
NOTE: The same could also be performed using an extended ACL. Such an ACL, for example,
might be configured as follows:
access-list 100 permit ip host 150.2.0.0 host 255.255.255.0
access-list 100 permit ip host 150.3.0.0 host 255.255.255.0
Following this, you would then use the debug ip bgp updates 100 command to restrict or filter
the debug output to events pertaining to these two networks only.
The events keyword provides real-time information on update events, which includes received
updates or updates sent by the local BGP speaker. The following is a sample of the information that
is printed by this command:
R3#debug ip bgp updates events
BGP updates debugging is on events for address family: IPv4 Unicast
R3#
*Mar 13 13:44:21.154: BGP(0): Begin update run for versions 1->148 for 1 update
groups for attrs 0->2
*Mar 13 13:44:21.154: BGP(0): pre format update-group 1 leader is 4.4.4.4 with
0/1000 msgs, versions 0/148
*Mar 13 13:44:21.154: BGP(0): post format update-group 1 leader is 4.4.4.4 with
3/1000 msgs, versions 148/148, formatting was not aborted
*Mar 13 13:44:21.154: BGP(0): End update run for versions 1->148 (0ms), TX has
not completed, 3 updates formatted, formatting was not aborted, 3 attrs - 6 nets
visited
*Mar 13 13:45:30.142: BGP(0): Begin update run for versions 1->148 for 1 update
groups for attrs 0->2
*Mar 13 13:45:30.146: BGP(0): pre format update-group 1 leader is 10.0.0.2 with
0/1000 msgs, versions 0/148
*Mar 13 13:45:30.146: BGP(0): post format update-group 1 leader is 10.0.0.2 with
3/1000 msgs, versions 148/148, formatting was not aborted
*Mar 13 13:45:30.146: BGP(0): End update run for versions 1->148 (4ms), TX has
not completed, 3 updates formatted, formatting was not aborted, 3 attrs - 6 nets
visited
Finally, the in and out keywords can be used to restrict or filter the debug output to inbound or
outbound updates. The default is to print information on both.
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter.
Border Gateway Protocol Overview
• Border Gateway Protocol is a Path Vector protocol
• BGP is primarily used to exchange NLRI between routing domains or autonomous systems
• The following four BGP processes run when BGP is enabled in Cisco IOS-based devices:
  1. The BGP Open process
  2. The BGP I/O process
  3. The BGP Scanner process
  4. The BGP Router process
• The BGP Open process is used for peer establishment
• The BGP I/O process handles the reading, writing, and execution of BGP messages
• The BGP Scanner process periodically scans the BGP RIB
• The BGP Router process sends/receives routes, establishes peers, and interacts with the RIB
• The three major components of the BGP Router process are as follows:
  1. The BGP Routing Information Base (RIB)
  2. The IP RIB for BGP-learned Prefixes
  3. The IP Switching Component for BGP-learned Prefixes
• In Cisco IOS software, BGP information is stored in one of two data structures or tables
• These two data structures or tables are the Neighbor Table and the BGP Table
• The Neighbor Table contains a list of all the configured neighbors of the local BGP speaker
• The BGP Table or BGP Routing Information Base (RIB) contains all BGP routes
• All Border Gateway Protocol messages share a common header that is 19 bytes long
• Only four BGP messages are available, which are as follows:
  1. The OPEN Message
  2. The UPDATE Message
  3. The NOTIFICATION Message
  4. The KEEPALIVE Message
• The different states BGP will go through before a neighbor relationship is established are as follows:
  1. The Idle State
  2. The Connect State
  3. The Active State
  4. The OpenSent State
  5. The OpenConfirm State
  6. The Established State
• Border Gateway Protocol path attributes fall into the following four separate categories:
  1. Well-known mandatory
  2. Well-known discretionary
  3. Optional transitive
  4. Optional non-transitive
• The two BGP attributes that are used to influence the inbound path are as follows:
  1. The MULTI_EXIT_DISC attribute
  2. The AS_PATH attribute
• The two BGP attributes that are used to influence the outbound path are as follows:
  1. The LOCAL_PREF Attribute
  2. The WEIGHT Attribute
Troubleshooting Neighbor Relationships
• The most common reason for failing to establish peer relationships is misconfiguration
• Common misconfigurations include the following:
  1. Using the Incorrect Autonomous System Number for the Peer
  2. Using the Incorrect IP Address for the Peer
  3. Incorrect Authentication Parameters
  4. No IP Connectivity between Indirectly Connected Peers
• In addition to misconfigurations, the following can prevent adjacencies from establishing:
  1. Access Control Lists Filtering BGP Packets
  2. Layer 1 and Layer 2 Issues
  3. Device Resource Consumption
Troubleshooting Route Advertisement
• Route advertisement issues with BGP are commonly due to device misconfigurations
• The network command must match the exact route BGP is to advertise
• The aggregate-address command requires a specific route for aggregate advertisement
• By default, the aggregate-address command does not suppress specific prefixes
• The BGP Scanner process checks the BGP Table for updates every 60 seconds
• Following policy modification, clear the BGP session for changes to apply immediately
Troubleshooting Route Redistribution Issues
• By default, BGP will use the IGP metric when routes are redistributed into BGP
• When redistributing OSPF into BGP, only internal OSPF routes are redistributed by default
• By default, only external BGP routes are redistributed into IGPs
• The bgp redistribute-internal command is required to redistribute iBGP routes into IGPs
CHAPTER 8
Troubleshooting Cisco IOS Security Features
Cisco IOS Catalyst switches and routers support several security features that are designed to
protect not only the switches and routers themselves but also users connected to those switches.
In addition to demonstrating a solid understanding of these features, it is sometimes also necessary to troubleshoot security-feature-related network problems. The TSHOOT certification exam
objectives that are covered in this chapter are as follows:
• Troubleshoot private VLANs
• Troubleshoot port security
• Troubleshoot general switch security
• Troubleshoot VACL and PACL
• Troubleshoot configuration issues related to accessing the AAA server for authentication
• Troubleshoot Layer 3 security
• Troubleshoot issues related to ACLs used to secure access to Cisco routers
• Troubleshoot security issues related to IOS services (i.e., finger, NTP, HTTP, FTP, RCP)
From a conceptual perspective, switches and routers have a communications architecture that is
comprised of three different planes, all of which are vulnerable to security attacks. It is therefore
important to understand the functionality of each of these planes, as well as the tools that are available in Cisco IOS software that can be used to secure each individually. In addition, it is also important to understand common problems associated with these technologies and the ways in which to
troubleshoot and resolve them. This chapter is divided into the following sections:
• Cisco IOS Security Fundamentals
• Management Plane Security and Troubleshooting
• Control Plane Security and Troubleshooting
• Forwarding Plane Security and Troubleshooting
• Cisco IOS Firewall Fundamentals
NOTE: While this is not a security exam, as a network engineer you are expected to have some
basic understanding of general security principles, configuration, and troubleshooting. While
this guide will not be going into detail on all Cisco IOS software security features, you can find
additional information in the current CCNA Security study guide that is available online.
CISCO IOS SECURITY FUNDAMENTALS
As was stated in the introduction, the communications architecture of all switches and routers is
segmented into three different planes, which are vulnerable to security attacks. Understanding not
only how to secure these planes but also how to troubleshoot and resolve potential problems based
on implemented solutions is a core requirement of any network engineer. The communications
architecture planes of network devices are as follows:
• The management plane
• The control plane
• The forwarding plane
The Management Plane
The management plane is responsible for management functions. The management plane is used
to manage a device through its connection to the network. This plane also coordinates functions
among all other planes (i.e., the control and the forwarding planes). Management protocols such
as SNMP, Telnet, HTTP, HTTPS, and SSH are used for device monitoring and CLI access at the
management plane. In addition to management protocols, console access (i.e., via the console port)
is also used to manage devices. Some security considerations for the management plane include
the following:
• Secure access to the device console: Use logins and passwords to ensure that the console is
  secured and that no unauthorized parties are able to gain access to the device. Consider using
  security authentication protocols such as RADIUS and TACACS+ to centralize the console
  authentication process. In addition to passwords and security protocols, physical security
  should also be taken into consideration, ensuring that only authorized personnel can gain
  physical access to the device(s).
• Avoid using management protocols, such as Telnet, that send the username/password
  information in clear text. Instead, consider using SSH to access and manage network devices
  remotely. In addition, consider using IP access control lists (ACLs) to restrict the range of
  addresses or networks that can gain remote access to the device(s).
• Consider implementing only the management protocols that are required. For example, if
  HTTP will not be used, disable this service and enable HTTPS access only. When
  implementing monitoring (e.g., via SNMP), consider using SNMPv1 and v2c for read-only access
  to devices, while using SNMPv3, which offers greater security than versions 1 and 2c, for
  read-write access to the device(s).
• Disable the password-recovery service using the no service password-recovery global
  configuration command. This prevents anyone with console access from insecurely
  accessing the device configuration and clearing the password. It also prevents malicious users
  from changing the configuration register value and accessing NVRAM.
• Disable any unused services that can be used to launch Denial of Service (DoS) attacks. These
  services include Transmission Control Protocol (TCP) and User Datagram Protocol (UDP)
  small services, which include Echo (port number 7), Discard (port number 9), Daytime (port
  number 13), and Chargen (port number 19). By default, these services are disabled in Cisco
  IOS 12.0 and later. Another service that should be disabled is the finger service. This service
  provides information on who is logged into the system and provides extensive user
  information, which is extremely valuable for hacking. By default, finger is disabled in Cisco IOS 12.1
  and later. Additional services that should be disabled if not used include HTTP and HTTPS,
  CDP, and the configuration service that allows a Cisco IOS device to attempt to locate a
  configuration file on the network using TFTP. This is disabled using the no service config
  global configuration command.
• Also consider reducing the EXEC timeout, which specifies the interval that the EXEC
  command interpreter waits for user input before it terminates a session. By default, sessions
  are disconnected after 10 minutes of inactivity; however, this can be modified using the
  exec-timeout line configuration command.
• Finally, enable logging, preferably to a central location. Logging provides you visibility into
  the operation of a device and the network into which it is deployed.
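Several of the management plane considerations listed above can be sketched as a short configuration. This is only an illustration: the management subnet (192.0.2.0/24), the syslog server address (192.0.2.50), ACL number 10, and the five-minute timeout are placeholder values, and the sketch assumes that SSH and a local username have already been configured on the device:

```
! Allow remote management only from the management subnet (placeholder range)
access-list 10 permit 192.0.2.0 0.0.0.255
!
line vty 0 4
 ! Accept SSH only; Telnet sends credentials in clear text
 transport input ssh
 access-class 10 in
 ! Disconnect idle sessions after 5 minutes and 0 seconds
 exec-timeout 5 0
 login local
!
! Disable services that are not required
no ip http server
no service tcp-small-servers
no service udp-small-servers
no ip finger
no service config
!
! Send log messages to a central syslog server (placeholder address)
logging host 192.0.2.50
logging trap informational
```

On recent software releases some of the no commands simply restate the default, so they may not appear in the running configuration.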
The Control Plane
A control plane is a collection of processes that run at the process level on a route processor and
collectively provide high-level control for most Cisco IOS software functions. All traffic directly or
indirectly destined to a router or switch is handled by the control plane. Control plane protocols
include routing protocols, such as EIGRP and OSPF, as well as Layer 2 protocols, such as Spanning
Tree Protocol (STP).
Most control plane protocols (e.g., EIGRP, OSPF, and HSRP) have their own built-in
security and authentication schemes. For example, all three protocols mentioned support MD5
hashing as a means to protect protocol messages. For protocols such as STP, consider integrating
Cisco IOS enhancements, such as Root Guard and BPDU Guard.
The BPDU Guard feature is used to protect the Spanning Tree domain from external influence
by preventing false information from being injected into the Spanning Tree domain on ports that
have Spanning Tree disabled. BPDU Guard is disabled by default but is recommended for all ports
on which the Port Fast feature has been enabled. When a port that is configured with the BPDU
Guard feature receives a BPDU, it immediately transitions to the errdisable state.
On the other hand, the Root Guard feature prevents a designated port from becoming a root port.
If a port on which the Root Guard feature is enabled receives a superior BPDU, the switch moves the port into a root-inconsistent state, thus maintaining the current root bridge status quo. Both BPDU Guard and
Root Guard are described in greater detail in the SWITCH study guide available online.
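As a minimal sketch of how the two features described above are commonly applied, the following Catalyst switch configuration enables BPDU Guard on a PortFast access port and Root Guard on a port facing a downstream switch. The interface numbers are hypothetical and chosen only for illustration:

```
! Access port connected to an end host
interface FastEthernet0/1
 spanning-tree portfast
 ! Err-disable the port if a BPDU is ever received on it
 spanning-tree bpduguard enable
!
! Designated port facing a switch that must never become the root
interface FastEthernet0/24
 ! Place the port in a root-inconsistent state if a superior BPDU arrives
 spanning-tree guard root
```

When troubleshooting, show interfaces status err-disabled lists ports shut down by BPDU Guard, and show spanning-tree inconsistentports lists ports blocked by Root Guard.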
At the control plane, additional features such as Dynamic ARP Inspection and DHCP Snooping can
be used to protect against vulnerabilities in protocols such as Address Resolution Protocol (ARP)
and Dynamic Host Configuration Protocol (DHCP), respectively. Dynamic ARP Inspection (DAI)
is used to protect against ARP spoofing attacks, while DHCP Snooping is used to protect against
DHCP spoofing and starvation attacks. DHCP starvation attacks work by using MAC address spoofing and entail flooding a large number of DHCP requests with randomly generated spoofed MAC
addresses to the target DHCP server, thereby exhausting the address space available for a period of
time. This prevents legitimate DHCP clients from being serviced by the DHCP server.
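The following sketch shows one way DHCP Snooping and Dynamic ARP Inspection might be enabled together on a Catalyst switch. VLAN 10, the interface numbers, and the rate limit of 15 packets per second are assumptions made for illustration; DAI relies on the DHCP Snooping binding table, which is why both features are enabled for the same VLAN:

```
! Enable DHCP Snooping globally and for the user VLAN
ip dhcp snooping
ip dhcp snooping vlan 10
! Validate ARP packets in the same VLAN against the snooping bindings
ip arp inspection vlan 10
!
! Uplink toward the legitimate DHCP server
interface GigabitEthernet0/1
 ip dhcp snooping trust
 ip arp inspection trust
!
! Untrusted access port: limit DHCP packets to slow starvation attempts
interface FastEthernet0/2
 ip dhcp snooping limit rate 15
```

The show ip dhcp snooping binding and show ip arp inspection vlan 10 commands can then be used to check the binding table and the DAI statistics.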
Cisco IOS software also supports Control Plane Policing (CoPP) and Control Plane Protection (CPP),
which allow administrators to secure the control plane further. Control Plane Policing allows
administrators to configure a Quality of Service (QoS) filter that manages the traffic flow of control
plane packets to protect the control plane of Cisco IOS routers and switches against
reconnaissance and Denial of Service (DoS) attacks. Implementing this feature allows the control plane
to maintain packet forwarding and protocol states despite an attack or heavy traffic load on the
router or the switch.
Control Plane Protection (CPP) extends CoPP by providing additional granularity. CPP allows
for the classification of the control plane traffic based on packet destination and information provided by the forwarding plane, allowing appropriate throttling for each category of packet. Unlike
CoPP, CPP is dependent on CEF for IP packet redirection. The configuration of CoPP and CPP is
beyond the scope of the current TSHOOT certification exam. Some additional security considerations for the control plane include the following:
• Disable Internet Control Message Protocol (ICMP) redirects using the no ip redirects
interface configuration command. There are two types of ICMP redirect messages: redirect for a host address and redirect for an entire subnet. A malicious user can exploit the
ability of the router to send ICMP redirects by continually sending packets to the router,
forcing the router to respond with ICMP redirect messages, resulting in an adverse impact
on the CPU and performance of the router.
• Unless absolutely required, disable ICMP unreachables. ICMP destination unreachable messages are generated by a router to inform the source host that the destination Unicast address
is unreachable. While typically a good thing, generating many of these messages can increase
the CPU utilization on a device and facilitate DoS attacks. ICMP unreachables are disabled using the no ip unreachables interface configuration command in Cisco IOS software.
• Another control plane service that should be disabled unless absolutely needed is Proxy
ARP. Proxy ARP allows the router to answer ARP requests intended for another machine.
In most networks, this is a good thing because it negates the need for hosts to have a default gateway or routing intelligence. However, from a security perspective, Proxy ARP
can allow attackers to spoof or pretend to be another machine, facilitating Man-in-the-Middle (MITM) attacks. Additionally, using Proxy ARP can result in an increase in the
amount of ARP traffic on the network segment and resource exhaustion. This feature can
be disabled using the no ip proxy-arp interface command.
• Secure routing protocols and First Hop Redundancy Protocols (FHRPs) using Message Digest 5 (MD5) authentication in the domain. This prevents the injection of false routing
information into the domain. Also consider additional protocol functions, such as limiting
the size of the Link State Database (LSDB) when using OSPF, for example, to protect the
routing protocol implemented in the network.
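The control plane recommendations in the list above could be combined as in the following sketch. The interface, OSPF process and area numbers, HSRP group, key string, and LSA limit are all hypothetical values used purely for illustration:

```
interface FastEthernet0/0
 ! Stop the router from generating ICMP redirects and unreachables
 no ip redirects
 no ip unreachables
 ! Stop the router from answering ARP requests on behalf of other hosts
 no ip proxy-arp
 ! MD5 authentication for OSPF and HSRP (key string is an example only)
 ip ospf message-digest-key 1 md5 Ex4mpleKey
 standby 1 authentication md5 key-string Ex4mpleKey
!
router ospf 1
 area 0 authentication message-digest
 ! Cap the number of non-self-generated LSAs held in the LSDB
 max-lsa 10000
```

Keep in mind that authentication keys must match on every neighbor; a mismatch typically shows up as an adjacency that never forms, which is itself a common troubleshooting scenario.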
The Forwarding Plane
The forwarding or data plane is responsible for the actual forwarding of data. The data plane is
typically populated using information derived from the control plane. This plane is used to determine the physical next-hop egress interface for received packets or frames and then forwards the
packets or frames using the correct egress interface. The forwarding or data plane can be secured
by implementing ACLs, which can take the form of Routed ACLs (RACLs), Port ACLs (PACLs), or
VLAN ACLs (VACLs) in Cisco IOS Catalyst switches. Additional security considerations for the
forwarding or data plane include the following:
• Depending on your network, consider dropping packets that have IP options if there is no
legitimate reason for such packets on the network. As stated earlier in this guide, packets
with IP options are punted to the CPU and are processed in software. A large number of these
packets can greatly increase CPU utilization on the device. The device can be configured to
drop such packets using the ip options drop global configuration command.
• Disable source routing, which allows the source of the IP packet to specify the network path
a packet takes. This functionality can be used in attempts to route traffic around security
controls in the network. IP source routing is enabled by default; however, this can be disabled
using the no ip source-route global configuration command.
• Disable IP-directed Broadcasts, which make it possible to send an IP Broadcast packet to
a remote IP subnet. This functionality has been used to facilitate smurf attacks. In a smurf
attack, the attacker sends ICMP echo requests to the Broadcast address of a subnet using the
spoofed source address of the victim, so that every host on that subnet replies to the victim
and floods it with ICMP traffic. By default, directed
Broadcasts are disabled, but if enabled use the no ip directed-broadcast interface configuration command to disable this feature.
• Implement anti-spoofing techniques such as Unicast Reverse Path Forwarding (uRPF) and IP
Source Guard. Unicast RPF enables a device to verify that the source address of a forwarded packet can be reached through the interface that received the packet. This feature is
enabled using the ip verify unicast source reachable-via {rx | any} interface configuration
command, where the rx keyword selects strict mode and the any keyword selects loose mode. IP Source Guard, commonly used with DHCP Snooping, restricts IP traffic on
untrusted Layer 2 ports by filtering the traffic based on the DHCP snooping binding database
or manually configured IP source bindings. The IP Source Guard feature is enabled by issuing
the ip verify source interface configuration command on Layer 2 interfaces.
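The forwarding plane recommendations above could be sketched as follows. The interface numbers are assumptions, and the last stanza assumes a Catalyst switch on which DHCP Snooping has already been enabled:

```
! Global commands
ip options drop
no ip source-route
!
! Routed interface: block directed Broadcasts and apply strict-mode uRPF
interface FastEthernet0/0
 no ip directed-broadcast
 ip verify unicast source reachable-via rx
!
! Layer 2 access port on a switch: IP Source Guard
interface FastEthernet0/5
 ip verify source
```

Strict-mode uRPF can drop legitimate traffic where routing is asymmetric, so if packets begin to disappear after this is applied, comparing the routing table with the receiving interface is a sensible first check.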
MANAGEMENT PLANE SECURITY AND TROUBLESHOOTING
When implementing Cisco IOS security solutions, you should secure both the management plane
and the control plane of a device because operations of the control plane directly affect operations
of the management plane. The following protocols are used at the management plane:
• Simple Network Management Protocol
• Telnet
• Secure Shell (SSH) Protocol
• File Transfer Protocol
• Trivial File Transfer Protocol
• Secure Copy Protocol
• RADIUS
• TACACS+
• NetFlow
• Network Time Protocol
• Syslog
When implementing security, it is important to understand that once the management plane is
breached using any of the management protocols described above, the control and data planes are
also compromised. While delving into detail on all possible management plane protocol troubleshooting scenarios is neither feasible nor within the scope of the TSHOOT certification exam, the
following section describes some common problems and ways to resolve them.
Telnet and Secure Shell (SSH) are two commonly used management protocols.
SSH provides a more secure and reliable method for device access and administration than Telnet.
SSH secures the sessions using standard cryptographic mechanisms. SSH runs over TCP port 22;
although port 22 is registered for both TCP and UDP, SSH sessions use TCP. Unlike Telnet, SSH ensures that data
is encrypted and is therefore unreadable by network sniffers, for example.
There are two versions of SSH available: SSH version 1 and SSH version 2. While SSHv1 is an improvement over Telnet, which sends the username and password information in clear text, some
fundamental design flaws exist in SSHv1. For example, there are several tools readily available on
the Internet that can decrypt SSHv1 traffic on the fly, thus removing most security from encrypting
the traffic with SSHv1. Therefore, when implementing SSH, it is highly recommended that SSHv2
be used. A commonly encountered error message when attempting to access a device remotely via
SSH or Telnet is the ‘Password required, but none set’ connection error.
This error indicates that the VTY lines have not been configured with the password <secret>
and login configuration commands. While it is common practice simply to use the line vty 0
4 command to configure the Virtual Teletype Terminal (VTY) password, keep in mind that certain
devices support more than five VTY lines. Use the show line command to verify the number of
lines that are supported by the device and then ensure that the password is configured for the correct range of supported lines. The following shows the output of this command on a switch that
supports 16 VTY lines:
Switch>show line
   Tty Typ     Tx/Rx    A Modem  Roty AccO AccI   Uses   Noise  Overruns   Int
*    0 CTY              -     -     -    -    -      0       0       0/0     -
     1 VTY              -     -     -    -    -      0       0       0/0     -
     2 VTY              -     -     -    -    -      0       0       0/0     -
     3 VTY              -     -     -    -    -      0       0       0/0     -
     4 VTY              -     -     -    -    -      0       0       0/0     -
     5 VTY              -     -     -    -    -      0       0       0/0     -
     6 VTY              -     -     -    -    -      0       0       0/0     -
     7 VTY              -     -     -    -    -      0       0       0/0     -
     8 VTY              -     -     -    -    -      0       0       0/0     -
     9 VTY              -     -     -    -    -      0       0       0/0     -
    10 VTY              -     -     -    -    -      0       0       0/0     -
    11 VTY              -     -     -    -    -      0       0       0/0     -
    12 VTY              -     -     -    -    -      0       0       0/0     -
    13 VTY              -     -     -    -    -      0       0       0/0     -
    14 VTY              -     -     -    -    -      0       0       0/0     -
    15 VTY              -     -     -    -    -      0       0       0/0     -
    16 VTY              -     -     -    -    -      0       0       0/0     -
If you were to use the line vty 0 4 command to configure login and password security for this
switch, remote access would be available only via the first five lines. Both Telnet and SSH are allowed by default in current Cisco IOS software versions, as can be validated in the output of the
show line command that follows:
Router#show line vty 0
   Tty Typ     Tx/Rx    A Modem  Roty AccO AccI   Uses   Noise  Overruns   Int
    66 VTY              -     -     -    -    -      2       0       0/0     -

Line 66, Location: "", Type: ""
Length: 24 lines, Width: 80 columns
Baud rate (TX/RX) is 9600/9600
Status: Ready, No Exit Banner
Capabilities: none
Modem state: Ready
Group codes:    0
Special Chars: Escape  Hold  Stop  Start  Disconnect  Activation
                ^^x    none                 none
Timeouts:      Idle EXEC    Idle Session   Modem Answer  Session   Dispatch
               00:10:00        never                        none     not set
                            Idle Session Disconnect Warning
                              never
                            Login-sequence User Response
                             00:00:30
                            Autoselect Initial Wait
                              not set
Modem type is unknown.
Session limit is not set.
Time since activation: never
Editing is enabled.
History is enabled, history size is 20.
DNS resolution in show commands is enabled
Full user help is disabled
Allowed input transports are lat pad telnet rlogin mop v120 ssh.
Allowed output transports are lat pad telnet rlogin mop v120 ssh.
Preferred transport is lat.
No output characters are padded
No special data dispatching characters
However, while SSH is allowed, unlike Telnet, additional configuration is required to allow a device
to be managed via SSH. This additional configuration includes the following:
1. Configure a domain name on the router. This is performed via the ip domain-name [name]
global configuration command.
2. Generate the security keys that will be used by SSH using the crypto key generate rsa
global configuration command and specifying the desired key size; or, alternatively, with the
crypto key generate rsa general-keys global configuration command and specifying
the desired key size. Both of these commands automatically enable SSH when executed, and
no further configuration is necessary. The key (modulus) size that is used for SSH can be up
to 2048 bits in length. The larger the key size, the more secure the implementation; however,
larger keys also take a longer time to generate. When generating a public key, Cisco recommends a minimum key size of 1024 bits.
3. Specify the time that the router waits on the SSH client, in seconds, to input username and
password information before disconnecting the session via the ip ssh time-out global configuration command. This is an optional step because, by default, the router will wait for 120
seconds (2 minutes).
4. Specify the number of SSH authentication retries before a session is reset via the ip ssh
authentication-retries global configuration command. By default, the Cisco IOS router
will allow up to three failed logins before resetting the SSH connection. As is the case with the
ip ssh time-out global configuration command, this is an optional step.
If the mandatory steps above (steps 1 and 2) are not implemented, you will not be able to manage the device remotely using SSH. Telnet access, however, requires no additional configuration other than specifying
and configuring relevant passwords (e.g., VTY line passwords).
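Putting the SSH steps above together, a minimal configuration might look like the following sketch. The hostname, domain name, username, password, and timer values are assumptions made for illustration; the VTY range should match what the show line command reports on the device in question. Note that key generation also expects a hostname other than the default to have been configured:

```
hostname R1
ip domain-name example.com
! Generate the RSA key pair; this enables the SSH server
crypto key generate rsa general-keys modulus 2048
! Restrict the server to SSHv2, as recommended above
ip ssh version 2
! Optional timers (the time-out keyword is hyphenated in the CLI)
ip ssh time-out 60
ip ssh authentication-retries 3
!
! Local account used for the SSH login (example credentials)
username admin secret Ex4mplePass
!
! Cover every VTY line the device supports, not just 0 through 4
line vty 0 15
 login local
 transport input ssh
```

The show ip ssh command can be used afterward to confirm that SSH is enabled and which version is in use.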
Simple Network Management Protocol (SNMP), which was described earlier in this guide, is a
commonly used management protocol. SNMPv1 and v2c use a community-based form of authentication, while SNMPv3 uses a user and group security model. When troubleshooting SNMP access
issues, verify that the correct configuration has been implemented. This entails checking the configured community strings (SNMPv1 and SNMPv2c) or user and group configurations (SNMPv3)
using commands such as the show snmp community command as illustrated in the following
output:
R1#show snmp community
Community name: tsh00t!
Community Index: cisco5
Community SecurityName: tsh00t!
storage-type: nonvolatile
active access-list: 1
Community name: ccnp!
Community Index: cisco6
Community SecurityName: ccnp!
storage-type: nonvolatile
active access-list: 2
Additionally, if ACLs have been implemented to restrict SNMP access to the device, as is illustrated
above, verify that the IP address of the NMS is included in the valid network range or list of hosts
allowed to manage the device by using the show ip access-lists command.
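For reference, a community string restricted by an ACL, such as the first entry in the output above, could be produced by a configuration along these lines. The NMS address and the read-only (ro) access level are assumptions, since neither is shown in the output:

```
! Only the NMS at 10.1.1.100 (hypothetical address) may use this community
access-list 1 permit host 10.1.1.100
snmp-server community tsh00t! ro 1
```

If the NMS cannot poll the device, the match counters in the show ip access-lists 1 output help to show whether its requests are being permitted or are falling through to the implicit deny.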
File Transfer Protocol (FTP), Trivial File Transfer Protocol (TFTP), and Secure Copy Protocol
(SCP) are commonly used methods for copying router and switch configuration or image files.
While TFTP requires no additional configuration in Cisco IOS software, FTP and SCP both do.
When using FTP, a username and password pair is required for access. This is configured using the
ip ftp username <name> and ip ftp password <secret> global configuration commands.
In addition to configuring a username and password pair, it is also important to ensure that the
specified account has the correct privileges on the server.
SCP provides even greater security than the two previously discussed methods by using SSH for
secure transport. This means that SSH must be enabled on the device before SCP can be used.
In addition, Cisco IOS software also requires that authentication, authorization, and accounting
(AAA) be configured on the router. Given that understanding and troubleshooting AAA are core
exam requirements, this service is described in additional detail in the following section.
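The FTP and SCP requirements described above could be sketched as follows. All usernames and passwords are examples only, and the sketch assumes that SSH has already been enabled on the device as described earlier in this chapter:

```
! FTP client credentials used when the router copies to or from an FTP server
ip ftp username backup
ip ftp password Ex4mpleFtp
!
! SCP server: requires SSH plus AAA authentication and authorization
aaa new-model
aaa authentication login default local
aaa authorization exec default local
username admin privilege 15 secret Ex4mpleScp
ip scp server enable
```

Because aaa new-model changes how every line on the device authenticates, it is worth verifying console and VTY access from a second session before saving the configuration.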
Understanding Authentication, Authorization, and Accounting
Authentication, authorization, and accounting, referred to as AAA and pronounced as ‘Triple-A,’
provides the framework that controls and monitors network access. Authentication is used to validate identity (i.e., who the user is); authorization is used to determine what that particular user can
do (i.e., the services available to the user); and, finally, accounting is used to allow for an audit trail
(i.e., what that user did during the period he/she was logged in).
AAA services can be used to control administrative device access, such as Telnet and console login,
which is referred to as character mode access. In addition, AAA can also be used to manage network access (e.g., via dial-up or Virtual Private Network (VPN) clients), which is referred to as
packet mode access. AAA relies on Attribute-Value (AV) pairs, which are simply secured network
objects. Each AV pair comprises an attribute, such as the username or password, and a value for that particular attribute. Another example of an AV pair would be an attribute, such as a command, with a
value of ‘configure terminal.’
The AAA services can be administered by using local username and password databases that are
stored on the network devices or a centralized security server, with the latter being the most common. On the local device, the username and password pairs are configured using the username
<name> secret <password> global configuration command. The configuration of the AAA server
is a little more complex, but is beyond the scope of the TSHOOT certification exam and will not be
illustrated in this chapter or in the remainder of this guide.
AAA allows devices to point to multiple security servers, which are often referred to as server
groups. User, device, and services information can be replicated between multiple servers, which
provides redundancy in large networks. In order for AAA to work, the Network Access Server
(NAS), which is any device such as a router, a switch, or a firewall, must be able to access security
information for a specific user to provide AAA services.
To reinforce the concept of AV pairs, Figure 8-1 below illustrates their use in AAA
services when the security information is stored locally on the NAS:
Fig. 8-1. Understanding AAA Local Authentication Operation
Referencing Figure 8-1, in step 1, the remote user attempts to connect to R1 (NAS) via Telnet. Assuming that the NAS has been configured for AAA services, using its local database for authentication, the NAS presents the remote user with the username and password prompt, as illustrated in
step 2.
The remote user then enters his/her credentials, providing the username administrator, which
is the ATTRIBUTE, and password t5h00t!2010, which is the VALUE for that ATTRIBUTE, as illustrated in step 3. The NAS then checks the information against its local database. Assuming that
the NAS has been configured with the username administrator secret t5h00t!2010 global
configuration command, each AV is on file and the AV pair is found. The request is accepted and
a pass message is returned, as illustrated in step 4, which enables the connection from the remote
user to be made.
Taking this example a step further, Figure 8-2
below illustrates the use of AV pairs for authorization, this time using an external AAA server:
Fig. 8-2. Understanding AAA Remote Authorization Operation
In Figure 8-2, assume that the remote user has been successfully authenticated. Once logged into
R1 (NAS), the remote user attempts to issue the configure terminal command, as illustrated in
step 1. The NAS has been configured to use AAA services for authorization, and so the request is
sent to the AAA server, as illustrated in step 2. The AAA server checks its database for the relevant
AV pair.
In step 3, the server finds that the attribute and value are on file, and the AV pair is found. The
request is therefore accepted and the configure terminal command is successfully authorized
on R1, as illustrated in step 4. The remote user successfully enters configuration mode. Again, the
same concept would be applicable if authorization were being performed using the local database.
NOTE: Of the three AAA services, only accounting does not require AV pairs. Instead, information is simply received with AV pairs and stored in the database.
When implementing AAA services, two main security protocols are used: RADIUS and TACACS+.
Both are described in the following sections.
RADIUS
Remote Authentication Dial-In User Service (RADIUS) is a client/server protocol that is used to
secure networks against intruders. RADIUS was created by Livingston Enterprises and was originally defined in RFC 2138 and RFC 2139. The RADIUS protocol authentication and accounting services
are now documented separately in RFC 2865 and RFC 2866, respectively. These two RFCs replace RFC
2138 and RFC 2139.
A RADIUS server is a device that has the RADIUS daemon or application installed. RADIUS is an
open-standard protocol that is distributed in C source code format. This allows for interoperability
and flexibility between RADIUS-based products from different vendors. RADIUS uses UDP as the
Transport Layer protocol for communications between the client and the server, using UDP port
1812 for authentication and authorization, and UDP port 1813 for accounting. However, it should
be noted that earlier deployments of RADIUS use UDP port 1645 for authentication and authorization, and UDP port 1646 for accounting. Because RADIUS uses UDP as a transport protocol,
there is no offer of guaranteed delivery of RADIUS packets. Therefore, any issues related to server
availability, the retransmission of packets, and timeouts, for example, are handled by the RADIUS-enabled devices.
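Because the port numbers and retransmission behavior are controlled on the RADIUS client, they can be set explicitly on the NAS. The following is a sketch only; the server address and key are taken from the examples later in this chapter, while the timer values are illustrative assumptions:

```
! Use the RFC 2865/2866 ports rather than the legacy 1645/1646 pair
radius-server host 10.1.1.254 auth-port 1812 acct-port 1813 key t5h00t!2010
! Seconds to wait for a reply, and number of retransmissions before giving up
radius-server timeout 5
radius-server retransmit 3
```

A mismatch between the ports configured on the NAS and those on which the server listens is a common cause of authentication requests that simply time out, since UDP gives the NAS no indication that the port is wrong.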
The RADIUS accounting function is designed as a way to transmit data at the beginning and at the
end of a session. This data can indicate resource utilization, such as bandwidth and time used, and
may be used for billing and/or security purposes.
TACACS+
TACACS+ stands for Terminal Access Controller Access Control System Plus. Unlike RADIUS,
which is an open-standard protocol, TACACS+ is a Cisco-proprietary protocol that is used in the
AAA framework to provide centralized authentication of users who are attempting to gain access
to network resources.
There are several notable differences between TACACS+ and RADIUS. One of the most notable differences is that TACACS+ uses TCP as a Transport Layer protocol, using TCP port 49. In
addition, TACACS+ separates the three AAA functions, unlike RADIUS, which groups authentication and authorization together and separates accounting. TACACS+ also encrypts the
entire body of each packet exchanged between the NAS and the server, unlike RADIUS, which encrypts only the password. Finally,
TACACS+ supports multiple protocols, such as IP, IPX, AppleTalk, and X.25, whereas RADIUS has
limited protocol support.
TACACS+ authentication is initiated when a user attempts an ASCII login by authenticating to a
server running the TACACS+ daemon. The TACACS+ authorization process is performed using
a session that consists of a pair of messages: a REQUEST and a RESPONSE.
TACACS+ REQUEST messages are sent by clients and they contain information pertaining to
the authenticity of the user or service (authentication information), as well as a list of the services
or options for which authorization is being requested. When the TACACS+ server receives the
REQUEST message, it replies with a RESPONSE message. Finally, TACACS+ accounting occurs
by sending a record to the AAA server. Each record sent includes an AV pair that is used for accounting.
NOTE: Going into detail on the types of messages is beyond the scope of the current TSHOOT
certification exam and will not be included in this chapter or in the remainder of this guide.
Implementing and Configuring AAA Services
AAA services can be implemented in one of three ways as follows:
1. AAA can be implemented as a self-contained local security database that contains the usernames and passwords required for authentication.
2. AAA can be implemented as a Cisco Access Control Server (ACS) application server (this
can be an external server). Cisco ACS can be installed onto both Windows and Unix-based
platforms. This implementation is suitable for medium to large networks. Cisco ACS configuration is beyond the scope of the current TSHOOT certification exam.
3. Finally, AAA can also be implemented using the Cisco Secure ACS Solutions Engine appliance, which is a dedicated external platform offered by Cisco that scales very well and is suitable for very large networks. As with ACS configuration, Cisco Secure ACS Solutions Engine
appliance is beyond the scope of the current TSHOOT certification exam.
AAA services are based on method lists. Method lists contain sequenced AAA entries and are
configured to define which of the three AAA services will be performed and the sequence in which
they will be performed. The method argument refers to the actual method the authentication algorithm tries. Therefore, when a user attempts to authenticate, the NAS contacts each of the entries
in sequence to validate the user.
Method lists allow control of one or more security protocols and security servers to be used to offer
fault tolerance and backup of authentication databases. The AAA engine will use the first method
listed in the method list, and if that is unavailable, it will fall back to the next method on the list.
However, it is important to note that this works only if the message received from the first method listed is
not a FAIL message of any kind. In other words, even though multiple methods may be listed, if a
FAIL (i.e., deny) message is received from the first method tried, the authentication process stops
and no further authentication methods are attempted in the list. In addition, it is also important to
know that if all entries are processed without receiving a PASS message, access is denied. In AAA
implementation, there are two basic types of method lists, named and default, as described below:
• A named method list can be configured for any AAA service, such as authentication or authorization, for example. These methods are applied to specific interfaces or even terminal
lines (e.g., console and VTY), as required by the administrator.
• The default method list is configured globally and applied to all interfaces and VTY lines on
the device if no other method list is defined. However, if a defined (named) method list is
configured, it will take precedence over the default method list.
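The fallback behavior and the two types of method lists described above can be sketched as follows. The list name CONSOLE_LOCAL and the local credentials are hypothetical and are used only for illustration:

```
aaa new-model
! Default list: try TACACS+ first; use the local database only if the
! server does not respond. A FAIL reply from the server ends the attempt.
aaa authentication login default group tacacs+ local
! Named list: local database only
aaa authentication login CONSOLE_LOCAL local
username admin secret Ex4mplePass
!
! The named list takes precedence over the default list on this line
line con 0
 login authentication CONSOLE_LOCAL
```

In this sketch, the VTY lines inherit the default method list because no named list has been applied to them, while the console remains reachable with the local account even when the TACACS+ server is down.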
In order to configure AAA services, the following general sequence of steps should be taken:
1. First, globally enable AAA services using the aaa new-model global configuration command.
This is a mandatory requirement when configuring AAA.
2. Next, configure the security protocol parameters, such as the IP address and shared key of the
TACACS+ and RADIUS server, via the aaa group server [radius|tacacs+] [group_name]
global configuration command for TACACS+ or RADIUS server groups. Alternatively,
you can configure individual TACACS+ or RADIUS servers by using the tacacs-server
host [address|hostname] key [shared_key] or the radius-server host
[address|hostname] key [shared_key] global configuration commands, respectively.
3. Next, define the authentication service and method list via the aaa authentication global
configuration command and then apply the specified authentication named method list to
device interfaces or VTY lines by using the login authentication interface or line configuration commands.
4. Next, define authorization method list(s) by using the aaa authorization global configuration command, and then apply the authorization method list(s) to device VTY lines via the
authorization line configuration command. Authorization can also be enabled for WAN
interfaces using protocols such as PPP by using the ppp authorization interface configuration command.
5. Finally, define the accounting service and method lists by using the aaa accounting global
configuration command and apply the accounting method list(s) to VTY lines via the accounting line configuration command. Accounting can also be enabled for WAN interfaces
using protocols such as PPP by using the ppp accounting interface configuration command.
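As an illustration of the server group option mentioned in step 2, the following sketch defines two TACACS+ servers, places them in a group, and references that group from a method list. The group name TAC_SERVERS and the second server address are assumptions made for the example:

```
aaa new-model
! Define the individual servers and their shared keys
tacacs-server host 10.1.1.254 key t5sh00t!2010
tacacs-server host 10.1.1.253 key t5sh00t!2010
!
! Group the servers; they are tried in the order listed
aaa group server tacacs+ TAC_SERVERS
 server 10.1.1.254
 server 10.1.1.253
!
! Reference the group by name, with the local database as a fallback
aaa authentication login default group TAC_SERVERS local
```

Using the group name rather than the generic tacacs+ keyword restricts the method list to the servers in that group, which is useful when different services should be pointed at different servers.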
NOTE: Method lists themselves, whether named or default, are configured on the NAS using the
aaa commands described above. What must be configured on the security server are the user accounts,
shared key, and policies that a server-based method consults when the NAS forwards a request.
While delving into advanced AAA configuration is beyond the scope of the current TSHOOT
certification exam, you are still expected to be familiar with basic AAA configuration. The sections
that follow provide some basic and common AAA configuration examples, along with the explanations of those configurations.
In the first example, authentication will be configured on the device for all logins. This will use the
default method list. Authentication will be performed using a TACACS+ server with the IP address
10.1.1.254 and a secret or password of t5sh00t!2010. The default method list will be used to authenticate remote access to the device via VTY lines. The configuration is implemented as follows:
R1(config)#aaa new-model
R1(config)#aaa authentication login default group tacacs+
R1(config)#tacacs-server host 10.1.1.254 key t5sh00t!2010
R1(config)#line vty 0 4
R1(config-line)#login authentication default
Referencing the configuration example above, the aaa new-model configuration command enables
the AAA service on the device. Without this command, you cannot enable AAA.
The aaa authentication login default group tacacs+ configuration command specifies
that the default method list authenticates logins against the TACACS+ server. It assumes that the
relevant user accounts are configured and active on the TACACS+ server.
The tacacs-server host 10.1.1.254 key t5sh00t!2010 configuration command specifies
the IP address of the TACACS+ server to be used for authentication as well as the shared key that is
configured on the server.
Finally, the login authentication default configuration command specifies that VTY authentication will be performed using the default method list, which points to the TACACS+ server
with the IP address 10.1.1.254. Because the default method list applies automatically to any line that
has no named method list, this command is shown here for clarity only.
In the following example, AAA is enabled for the console and VTY lines using method lists
named CONSOLE_AUTH and VTY_AUTH, respectively. The IP address of the RADIUS server
is 10.1.1.254. This server will use a secret of t5h00t!2010. This configuration is implemented on the
device as follows:
R1(config)#aaa new-model
R1(config)#aaa authentication login CONSOLE_AUTH group radius
R1(config)#aaa authentication login VTY_AUTH group radius
R1(config)#radius-server host 10.1.1.254 key t5h00t!2010
R1(config)#line con 0
R1(config-line)#login authentication CONSOLE_AUTH
R1(config-line)#exit
R1(config)#line vty 0 4
R1(config-line)#login authentication VTY_AUTH
R1(config-line)#exit
The third, and final, example illustrates how to configure login authentication using the local device
database. This will be used to authenticate console and AUX port access against the locally configured username and password pair. This configuration is illustrated as follows:
R1(config)#aaa new-model
R1(config)#aaa authentication login default local
R1(config)#username admin secret t5h00t!2010
R1(config)#line con 0
R1(config-line)#login authentication default
R1(config-line)#exit
R1(config)#line aux 0
R1(config-line)#login authentication default
R1(config-line)#exit
The configuration of both authorization and accounting follows the same logic as that used to configure authentication. In Cisco IOS software, authorization is configured using the aaa authorization global configuration command. This configuration can then be applied to any lines (e.g.,
VTY and console) using the authorization line configuration command. In order for authorization to work, authentication must be configured and the AAA client must have successfully authenticated. The options available for authorization in Cisco IOS software are as follows:
R1(config)#aaa authorization ?
  auth-proxy       For Authentication Proxy Services
  cache            For AAA cache configuration
  commands         For exec (shell) commands
  config-commands  For configuration mode commands
  configuration    For downloading configurations from AAA server
  console          For enabling console authorization
  exec             For starting an exec (shell)
  multicast        For downloading Multicast configurations from AAA server
  network          For network services (PPP, SLIP, ARAP)
  reverse-access   For reverse access connections
  template         Enable template authorization
The following configuration example illustrates how to configure AAA authorization on a device
against a method list named PRIV-15-ONLY. This configuration is applied to the console and VTY
ports on the device.
R1(config)#aaa new-model
R1(config)#aaa authorization commands 15 PRIV-15-ONLY
R1(config)#line con 0
R1(config-line)#authorization commands 15 PRIV-15-ONLY
R1(config-line)#exit
R1(config)#line vty 0 4
R1(config-line)#authorization commands 15 PRIV-15-ONLY
R1(config-line)#exit
In the authorization configuration example above, the aaa authorization commands 15 PRIV-15-ONLY global configuration command configures authorization for level 15 commands
against a method list named PRIV-15-ONLY. This authorization method list is applied to the console
and VTY ports using the authorization commands 15 PRIV-15-ONLY line configuration command.
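In practice, the aaa authorization global configuration command also expects one or more methods after the method list name. The following is a minimal sketch only, assuming that a TACACS+ server has already been defined on the device and that the local database should be used as a fallback. The choice of methods is an illustrative assumption:
! Assumption: TACACS+ is tried first; the local database is used only if the server does not respond
R1(config)#aaa authorization commands 15 PRIV-15-ONLY group tacacs+ local
When testing a change such as this, it is generally safer to keep the current session open and verify the behavior from a second session before logging out.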
Finally, accounting is configured using the Cisco IOS software aaa accounting global configuration command. Accounting is enabled on lines, such as the console and VTY lines, via the accounting line configuration command. As is the case with authentication and authorization, you can use the default method list
or specify a named method list when configuring accounting. The configuration example that follows illustrates how to enable accounting to send start and stop records for EXEC sessions using a
method list named EXEC-ACNTG:
R1(config)#aaa accounting exec EXEC-ACNTG start-stop group radius
R1(config)#radius-server host 192.168.1.254 auth-port 1812 acct-port 1813
R1(config)#line con 0
R1(config-line)#accounting exec EXEC-ACNTG
R1(config-line)#exit
R1(config)#line vty 0 4
R1(config-line)#accounting exec EXEC-ACNTG
R1(config-line)#exit
Troubleshooting AAA Services
Unlike other Cisco IOS features, AAA troubleshooting relies primarily on device configuration
verification using the show running-config command and debugging using the debug aaa suite
of commands. Commonly experienced issues include incorrect username and password configuration on the local device or security server, missing username and password pairs on the local device
or security server, and misconfigured authentication lists.
For the most part, you can troubleshoot these issues by verifying the device AAA configuration.
However, it is sometimes necessary to debug the AAA implementation in order to identify and
isolate the root cause. One example is when you do not have access to the AAA server to verify
the username and password configuration. Another is when the password cannot be read from the
configuration because it is stored in hashed form (i.e., the username <name>
secret <password> command has been issued) and AAA has been configured to authenticate
against the local database.
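As a starting point, the AAA-related lines can be filtered out of the running configuration before any debugging is enabled. The following is a sketch only. The include pattern is an assumption and can be adjusted to suit the configuration being verified:
! Assumption: the pattern below is illustrative; extend it as required
R1#show running-config | include ^aaa|^username|^tacacs-server|^radius-server
! Required only when connected via a VTY line, so that debug output is displayed
R1#terminal monitor
R1#debug aaa authentication
! Disable all debugging once the test is complete
R1#undebug all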
The following displays the list of supported AAA debugging subcommands in Cisco IOS 12.4:
R1#debug aaa ?
  accounting           Accounting
  administrative       Administrative
  api                  AAA api events
  attr                 AAA Attr manager
  authentication       Authentication
  authorization        Authorization
  cache                Cache activities
  db                   AAA DB manager
  dead-criteria        AAA Dead-Criteria info
  id                   AAA Unique Id
  ipc                  AAA IPC
  mlist-ref-count      Method list reference counts
  per-user             Per-user attributes
  pod                  AAA POD processing
  protocol             AAA protocol processing
  server-ref-count     Server handle reference counts
  sg-ref-count         Server group handle reference counts
  sg-server-selection  Server group server selection
  subsys               AAA subsystem
Following is a sample output of the debug aaa authentication command. The output shows that
the default method list is being used for login authentication against the local database for a user
named ‘administrator.’ The debug output also shows that authentication for enable access is also using the default method list. However, this is being performed by a TACACS+ server and, unlike the
local authentication, this authentication fails. You can use this information to troubleshoot issues
with the TACACS+ server further:
R1#debug aaa authentication
AAA Authentication debugging is on
*Mar 23 00:25:03.738: AAA/BIND(0000000A): Bind i/f
*Mar 23 00:25:03.738: AAA/AUTHEN/LOGIN (0000000A): Pick method list 'default'
*Mar 23 00:25:05.974: AAA: parse name=tty66 idb type=-1 tty=-1
*Mar 23 00:25:05.978: AAA: name=tty66 flags=0x11 type=5 shelf=0 slot=0 adapter=0 port=66 channel=0
*Mar 23 00:25:05.978: AAA/MEMORY: create_user (0x84BD69A8) user='administrator' ruser='NULL' ds0=0 port='tty66' rem_addr='10.0.0.2' authen_type=ASCII service=ENABLE priv=15 initial_task_id='0', vrf= (id=0)
*Mar 23 00:25:05.978: AAA/AUTHEN/START (3856188155): port='tty66' list='' action=LOGIN service=ENABLE
*Mar 23 00:25:05.978: AAA/AUTHEN/START (3856188155): using "default" list
*Mar 23 00:25:05.978: AAA/AUTHEN/START (3856188155): Method=tacacs+ (tacacs+)
*Mar 23 00:25:05.978: TAC+: send AUTHEN/START packet ver=192 id=-438779141
*Mar 23 00:25:10.982: AAA/AUTHEN(3856188155): Status=ERROR
*Mar 23 00:25:10.982: AAA/AUTHEN/START (3856188155): no methods left to try
*Mar 23 00:25:10.982: AAA/AUTHEN(3856188155): Status=ERROR
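In the output above, the gap of approximately five seconds between the AUTHEN/START packet and the Status=ERROR message is consistent with the TACACS+ server not responding, and the 'no methods left to try' message indicates that the method list contains no further methods. One possible way to avoid being locked out of privileged mode in this situation, assuming that an enable secret has been configured on the device, is to add the enable password as a fallback method. This is a sketch under that assumption and not the configuration used to generate the output:
! Assumption: an enable secret already exists on R1
R1(config)#aaa authentication enable default group tacacs+ enable
Keep in mind that a fallback method is attempted only when the previous method returns an error (e.g., the server does not respond). It is not attempted when the server responds and rejects the credentials.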
CONTROL PLANE SECURITY AND TROUBLESHOOTING
Troubleshooting at the control plane depends on the specific protocol or technology that is causing problems. While generic commands can be used to troubleshoot control plane issues, such as
using the debug ip routing command to troubleshoot routing issues, for the most part,
protocol-specific commands should be used to troubleshoot control plane problems. Because the
troubleshooting of routing protocols, such as EIGRP, OSPF, and BGP, has already been described
earlier in this guide, it will not be repeated in this chapter. The same applies to Spanning Tree Protocol troubleshooting, which was also described in detail earlier in this guide.
Troubleshooting for additional control plane features, such as Dynamic ARP Inspection and DHCP
Snooping, primarily centers on configuration. If the features are not implemented correctly, they
will not work as expected. Therefore, from a troubleshooting perspective, you should ensure that
you understand how these features work. You can then determine whether the feature is not working correctly because of a misconfiguration or another issue, such as a software or hardware
error or bug.
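As a minimal sketch of what to verify, the following assumes that both features are enabled for VLAN 10 and that FastEthernet0/24 is the uplink toward the legitimate DHCP server. Both values are hypothetical and are used for illustration only:
! Assumption: VLAN 10 and FastEthernet0/24 are placeholders
Switch(config)#ip dhcp snooping
Switch(config)#ip dhcp snooping vlan 10
Switch(config)#ip arp inspection vlan 10
Switch(config)#interface FastEthernet0/24
Switch(config-if)#ip dhcp snooping trust
Switch(config-if)#ip arp inspection trust
Switch(config-if)#end
! Verify the global state, the binding table, and the per-VLAN inspection state
Switch#show ip dhcp snooping
Switch#show ip dhcp snooping binding
Switch#show ip arp inspection vlan 10
Because Dynamic ARP Inspection relies on the DHCP Snooping binding table by default, a missing trusted port or an empty binding table is a common place to begin when hosts in the VLAN lose connectivity after these features are enabled.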
FORWARDING PLANE SECURITY AND TROUBLESHOOTING
While the management and control planes are concerned with data that is destined for the local
router or switch, the forwarding or data plane is concerned with data that is traversing the router
or switch. Securing the data plane entails securing the actual flows through the router or switch. In
Cisco IOS software, a plethora of tools can be used to secure the data plane.
Troubleshooting data plane issues depends on the technology that has been implemented to secure
the data plane. As always, a solid understanding of these technologies makes the troubleshooting process that much simpler. The following sections will describe common data plane security
mechanisms, which include the following:
• Router and switch Access Control Lists
• Catalyst switch port security
• Private VLANs
• IEEE 802.1x Port-based authentication
• Trunking
• The Cisco IOS Firewall
Router and Switch Access Control Lists
The forwarding or data plane can be secured by implementing ACLs, which can take the form
of Router ACLs (RACLs), Port ACLs (PACLs), or VLAN ACLs (VACLs) in Cisco IOS Catalyst
switches. Many types of Router ACLs can be configured in Cisco IOS software. These include
named and numbered standard and extended ACLs. RACLs are commonly applied on interfaces in
either the inbound or the outbound direction using the ip access-group <name|number> <in|out> interface configuration command. These ACLs are used to filter packets transiting the interfaces on
which they are applied.
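The following is a minimal sketch of a named extended RACL applied in the inbound direction. The ACL name, the host address, and the interface are all hypothetical values chosen for illustration:
! Assumption: BLOCK-TELNET, 192.168.1.10, and FastEthernet0/0 are placeholders
R1(config)#ip access-list extended BLOCK-TELNET
R1(config-ext-nacl)#deny tcp any host 192.168.1.10 eq telnet
R1(config-ext-nacl)#permit ip any any
R1(config-ext-nacl)#exit
R1(config)#interface FastEthernet0/0
R1(config-if)#ip access-group BLOCK-TELNET in
The explicit permit ip any any statement is included because all traffic that does not match an entry would otherwise be dropped by the implied 'deny all' statement.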
When a packet enters a router or a switch, the destination address of the packet is checked against
the entries in the routing table to identify the egress interface. The packet is also checked against
any configured RACLs assigned to the interface, and will be either permitted or denied accordingly.
When an inbound RACL is applied to an interface, the router or switch checks the received packets
against the statements in the RACL looking for a match. If a match is found, and the RACL action
is to permit, then the device continues to process the packet. However, if a match is found, and the
action is to deny, then the device discards the packet and typically, unless otherwise configured,
sends an ICMP Destination Unreachable message back to the source.
When an outbound RACL is applied on an interface, the device first performs a route lookup for
the destination address in the routing table to determine the egress interface via which the packet
should be forwarded. If a valid path is found in the routing table, and a match is found for the
RACL, and the action of the RACL is to permit, then the device continues to process the packet.
But, if the RACL action is to deny the packet, then the packet is discarded by the device, which then
sends an ICMP Destination Unreachable message back to the source host(s).
However, if a match is not found, the implied ‘deny all’ statement at the end of the RACL is applied
and the router discards the packet, sending the source an ICMP Destination Unreachable message.
Finally, if a valid path to the intended destination is not found in the routing table, then the device
simply discards the packet.
When troubleshooting data forwarding issues on routed interfaces, check configured or applied
RACLs using the show ip access-lists command. If the RACL is long, you can enable logging
by appending the log or log-input keywords to individual entries. The log keyword allows you to see matches against
the IP ACL, while the log-input keyword provides additional information, such as the input interface and the Layer 2
address of the hosts that match the RACL entry.
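The following sketch illustrates logging on a single ACL entry. The ACL number and the host address are assumptions made for illustration only:
! Assumption: ACL 100 and host 10.1.1.1 are placeholders
R1(config)#access-list 100 deny ip host 10.1.1.1 any log-input
R1(config)#access-list 100 permit ip any any
R1(config)#end
! The per-entry match counters and any logged matches can then be viewed
R1#show ip access-lists 100
R1#show logging
Because logged packets typically require additional processing by the device, it is generally advisable to log only the entries of interest and to remove the keyword once troubleshooting is complete.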
Port ACLs (PACLs) are similar to Router ACLs but are supported and configured on Layer 2 interfaces on a switch. Port ACLs are supported on physical interfaces as well as on EtherChannel
interfaces. PACLs are not supported on private VLANs. In addition, keep in mind that PACLs do
not support the log RACL keyword. Port ACLs perform access control on all traffic entering the
specified Layer 2 port and apply only to ingress traffic on the port.
When implementing PACLs, it is important to remember that they do not affect Layer 2 control
packets, such as CDP packets, that are received on the port. Additionally, keep in mind that PACLs
are supported only in hardware and do not apply to packets that are processed in software. When
you create a Port ACL, an entry is created in the ACL TCAM. PACLs can be configured as either
standard or extended IP ACLs or MAC ACLs. This allows you to filter IP traffic by using IP access
lists and non-IP traffic by using MAC addresses. As is the case with RACLs, you can use the show
ip access-lists command to troubleshoot PACL Layer 3 filtering problems. In the event that
MAC filtering has been implemented, use the show mac access-group command to view applied MAC ACLs on a per-interface basis as follows:
Switch#show mac access-group interface FastEthernet0/1
Interface FastEthernet0/1:
Inbound access-list is TSHOOT
Outbound access-list is not set
To view the configured MAC ACL, use the show access-lists command as follows:
Switch#show access-lists
Extended MAC access list TSHOOT
    deny   host 0000.0c92.04b6 any
    permit any any
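A configuration along the following lines would produce output similar to that shown above. This is a reconstruction for illustration purposes and is not necessarily the configuration that was used to generate the output:
! Assumption: reconstructed from the show command output
Switch(config)#mac access-list extended TSHOOT
Switch(config-ext-macl)#deny host 0000.0c92.04b6 any
Switch(config-ext-macl)#permit any any
Switch(config-ext-macl)#exit
Switch(config)#interface FastEthernet0/1
Switch(config-if)#mac access-group TSHOOT in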
VLAN Access Control Lists (VACLs) operate in a similar manner to Router ACLs but are a means
to apply access control to packets bridged within a VLAN or routed between VLANs. Unlike RACLs, which are applied on an inbound or outbound basis, VACLs have no sense of direction and
therefore apply to traffic at both ingress and egress. Within a VLAN, packets arriving on the Layer
2 interface have the VACL processed on ingress and egress.
Used alone, VACLs can be configured to filter both bridged and routed packets. Additionally, they
can also be used in conjunction with RACLs to filter both bridged and routed traffic. Troubleshoot
VACL issues using the show vlan filter <access-map|vlan> suite of commands. The show
vlan filter access-map <name> command prints information showing the VLAN the specified
VLAN access map has been applied to as illustrated below:
Switch#show vlan filter access-map MY-VACL-MAP
VLAN Map MY-VACL-MAP is filtering VLANs:
2-3
The show vlan filter vlan <ID> command prints information on the VACL that is applied to
the specified VLAN as illustrated in the following output:
Switch#show vlan filter vlan 2
Vlan 2 has filter MY-VACL-MAP.
In addition to the show vlan filter commands, you can also use the show vlan access-map
command to view the configured parameter for all VACLs or for the specified VACL:
Switch#show vlan access-map
Vlan access-map "MY-VACL-MAP"  10
  Match clauses:
    ip  address: ALLOW-UDP
  Action:
    forward
Vlan access-map "MY-VACL-MAP"  20
  Match clauses:
  Action:
    forward
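A VACL such as the one displayed above could be built as sketched below. The contents of the ALLOW-UDP access list are an assumption, as they are not shown in the output:
! Assumption: ALLOW-UDP simply matches all UDP traffic
Switch(config)#ip access-list extended ALLOW-UDP
Switch(config-ext-nacl)#permit udp any any
Switch(config-ext-nacl)#exit
Switch(config)#vlan access-map MY-VACL-MAP 10
Switch(config-access-map)#match ip address ALLOW-UDP
Switch(config-access-map)#action forward
Switch(config-access-map)#exit
! A sequence with no match clause matches all remaining traffic
Switch(config)#vlan access-map MY-VACL-MAP 20
Switch(config-access-map)#action forward
Switch(config-access-map)#exit
Switch(config)#vlan filter MY-VACL-MAP vlan-list 2-3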
A common issue when implementing RACLs, PACLs, and VACLs is attempting to apply conflicting ACLs. Care should therefore be taken when attempting to implement different types of ACLs
on the switch. If a conflicting VACL and PACL configuration is implemented, the switch will log an
error message similar to the following:
*Mar  2 10:27:10.411: %FM-3-CONFLICT: VLAN Map MY-VACL-MAP conflicts with port ACLs
Likewise, if a conflicting RACL and PACL configuration is implemented, the switch will log an error message similar to the following:
*Mar  2 10:41:01.399: %FM-3-CONFLICT: Port ACL 100 conflicts with input router ACLs
Finally, if a conflicting RACL and VACL configuration is implemented, the switch will log an error
message similar to the following:
*Mar  2 10:41:01.399: %FM-3-CONFLICT: Port ACL 100 conflicts with VLAN filters
In summation, when implementing RACLs, PACLs, and VACLs, keep the following in mind:
• RACLs are applied to Layer 3 interfaces, which includes switch SVIs
• RACLs are applied in the inbound and outbound directions
• PACLs are applied only on Layer 2 ports
• PACLs are applied in the inbound direction
• PACLs do not affect management protocols, such as CDP
• PACLs can filter based on Layer 2 or Layer 3 addresses
• VACLs can be applied to VLANs; they cannot be applied to interfaces
• VACLs can be used to filter both bridged and routed traffic
• VACLs have no sense of direction and filter inbound and outbound traffic
• VACLs can be used with RACLs to filter routed and bridged traffic
Catalyst Switch Port Security
Port security is another Cisco IOS software tool that can be used to protect the data plane. This
feature secures the CAM table by limiting the number of MAC addresses that can be learned on a
particular port or interface. With the port security feature, the switch maintains a table that is used
to identify which MAC address (or addresses) can access which local switch port. The primary purpose of the port security feature is to protect against CAM table overflow or MAC address flooding attacks. However, the same feature can also be used to protect against MAC spoofing attacks,
which were described earlier in this chapter in the section on DHCP Snooping.
CAM table overflow or MAC address flooding attacks work by flooding the switch with a large
number of frames carrying randomly generated, invalid source and destination MAC addresses until the CAM table
fills up and the switch is no longer able to accept new entries. In such situations, the switch effectively turns into a hub and simply floods all frames for destinations it has not learned out of all ports
in the VLAN.
The primary purpose of CAM table overflow or MAC address flooding attacks is to get the switch
to go into a ‘fail-open’ state, which essentially means that all traffic is flooded or transmitted out of
all ports. In such cases, the attacker is able to capture all data transiting the switch, as they can see
all packets that are being sent by the switch.
The port security feature can be used to specify which specific MAC address is permitted access to
a switch port, as well as to limit the number of MAC addresses that can be supported on a single
switch port. The methods of port security implementation described in this section are as follows:
• Static secure MAC addresses
• Dynamic secure MAC addresses
• Sticky secure MAC addresses
Static secure MAC addresses are statically configured by network administrators and are stored in
the MAC address table, as well as in the switch configuration. When static secure MAC addresses
are assigned to a secure port, the switch will not forward frames that do not have a source MAC
address that matches the configured static secure MAC address or addresses.
Dynamic secure MAC addresses are dynamically learned by the switch and are stored in the MAC
address table. However, unlike static secure MAC addresses, dynamic secure MAC address entries
are removed from the switch when the switch is reloaded or powered down. These addresses must
then be re-learned by the switch when it boots up again.
Sticky secure MAC addresses are a mix of static secure MAC addresses and dynamic secure MAC
addresses. These addresses can be learned dynamically or configured statically and are stored in
the MAC address table, as well as in the running configuration. If the running configuration is
then saved to NVRAM, the switch will not need to discover the MAC addresses dynamically again
after it is powered down or rebooted, because they will already be saved in the configuration file.
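The following sketch illustrates how a static secure MAC address and sticky learning might be configured on the same access port. The interface, the MAC address, and the maximum value are hypothetical:
! Assumption: FastEthernet0/2 and 0000.1111.2222 are placeholders
Switch(config)#interface FastEthernet0/2
Switch(config-if)#switchport mode access
Switch(config-if)#switchport port-security
! Allow room for one static and one sticky address
Switch(config-if)#switchport port-security maximum 2
! Static secure MAC address
Switch(config-if)#switchport port-security mac-address 0000.1111.2222
! Sticky learning; learned addresses are written to the running configuration
Switch(config-if)#switchport port-security mac-address sticky
Switch(config-if)#end
! Save the configuration so that sticky addresses survive a reload
Switch#copy running-config startup-config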
Once port security has been enabled, administrators can define the actions the switch will take in
the event of a port security violation. Cisco IOS software allows administrators to specify the following four actions to take when a violation occurs:
• Protect
• Shutdown (default)
• Restrict
• Shutdown VLAN
The protect option forces the port into a protected port mode. In this mode, all Unicast or Multicast frames with unknown source MAC addresses will simply be discarded by the switch. When the
switch is configured to protect a port, it will not send out a notification when operating in protected
port mode, meaning that administrators would never know when any traffic was prevented by the
switch port operating in this mode.
The shutdown option places a port in an errdisabled state when a port security violation occurs.
The corresponding port LED on the switch is also turned off when a port security violation occurs
and this configured action mode is used. In shutdown mode, the switch sends out an SNMP trap
and a syslog message, and the violation counter is incremented. This is the default action taken
when port security is enabled on an interface.
The restrict option is used to drop packets with unknown MAC addresses when the number of secure MAC addresses reaches the administrator-defined maximum limit for the port. In this mode,
the switch will continue to restrict additional MAC addresses from sending frames until a sufficient
number of secure MAC addresses is removed or the number of maximum allowable addresses is
increased. As is the case with the shutdown option, the switch sends out an SNMP trap and a syslog
message, and the violation counter is incremented.
The shutdown VLAN option is similar to the shutdown option; however, this option shuts down a
VLAN instead of the entire switch port. This configuration could be applied to ports that have more
than one VLAN assigned to them, such as a voice VLAN and a data VLAN, as
well as to trunk links on the switches.
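The violation mode is set on a per-interface basis. The sketch below, which uses a hypothetical interface, also shows one way of recovering automatically from a shutdown-mode violation. The alternative is to recover the port manually by issuing the shutdown and no shutdown commands under the interface. The recovery interval shown is an assumption:
! Assumption: FastEthernet0/2 and the 300-second interval are placeholders
Switch(config)#interface FastEthernet0/2
Switch(config-if)#switchport port-security violation restrict
Switch(config-if)#exit
! Optional: recover automatically from a port security errdisabled state
Switch(config)#errdisable recovery cause psecure-violation
Switch(config)#errdisable recovery interval 300
Switch(config)#end
Switch#show interfaces status err-disabled
Switch#show errdisable recovery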
When troubleshooting port security, it is important to check the configuration that has been implemented by first using the show running-config interface <name> command. As stated earlier in this guide, default port security configuration parameters can cause operational issues with
other features, such as FHRPs (i.e., HSRP, VRRP, and GLBP) because only a single MAC address is
allowed per port. This can be validated via the show port-security interface <name> command as illustrated below:
Switch#show port-security interface FastEthernet0/2
Port Security              : Enabled
Port Status                : Secure-down
Violation Mode             : Shutdown
Aging Time                 : 0 mins
Aging Type                 : Absolute
SecureStatic Address Aging : Disabled
Maximum MAC Addresses      : 1
Total MAC Addresses        : 0
Configured MAC Addresses   : 0
Sticky MAC Addresses       : 0
Last Source Address:Vlan   : 0000.0000.0000:0
Security Violation Count   : 0
When looking at the output of this command, it is important to understand the information that is
printed by the switch. The Port Status field indicates the operational state of the port (i.e., whether the port is up or down). In the example above, the port is down, which could be due to Layer 1
issues, or because the shutdown command was issued under the port, or because the switchport
port-security command has not been issued under the interface or port.
The Violation Mode field indicates the configured violation mode. The default mode is shutdown. The Aging Time and Aging Type fields specify the aging time and type parameters. By
default, secure MAC addresses will not be aged out and will remain in the switch MAC table until
the switch is powered off. However, this default behavior may be adjusted by configuring aging values for dynamic and secure static MAC addresses. The valid aging time range is 0 to 1440 minutes.
The aging type specifies how secure addresses are aged. This can be either an absolute value or
following a configured period of inactivity. The absolute mechanism causes the secured MAC addresses on the port to age out after a fixed specified time. All references are flushed from the secure
address list after the specified time and the address must then be relearned on the switch port.
Once relearned, the timer begins again and the process is repeated as often as has been defined in
the configured timer values. This is the default aging type for secure MAC addresses.
The inactivity time, also referred to as the idle time, causes secured MAC addresses on the port to
age out if there is no activity (i.e., frames or data) received from the secure addresses learned on the
port for the specified time period.
The Maximum MAC Addresses field specifies the number of allowed secure MAC addresses per
port. The default is one and the maximum value depends on the switch platform. The Total MAC
Addresses field indicates the current total MAC addresses learned on the port. The Configured
MAC Addresses field specifies the number of statically configured secure addresses on the port. The
Sticky MAC Addresses field specifies the number of sticky secure MAC addresses configured on
the port. The Last Source Address:Vlan field specifies the MAC address of the last secure MAC
address learned on the port. This is applicable only when port security is configured on a trunk link.
Finally, the Security Violation Count field specifies the number of security violations on the
port. To reinforce what has been discussed in this section further, consider the following output:
Switch#show port-security interface FastEthernet0/2
Port Security              : Enabled
Port Status                : Secure-up
Violation Mode             : Restrict
Aging Time                 : 10 mins
Aging Type                 : Inactivity
SecureStatic Address Aging : Disabled
Maximum MAC Addresses      : 10
Total MAC Addresses        : 6
Configured MAC Addresses   : 1
Sticky MAC Addresses       : 5
Last Source Address:Vlan   : 0000.0000.0000:0
Security Violation Count   : 0
From the port security interface output that is printed above, we can determine the following:
• The interface is up, and the switchport port-security command was issued under the
interface. This is reflected in the Secure-up port status.
• The switchport port-security violation restrict command was issued under the
interface because the default violation mode is Shutdown.
• The switchport port-security aging time 10 and switchport port-security aging type inactivity commands were issued under the interface because the aging time
default is 0 minutes and the aging type default is absolute.
• The switchport port-security maximum 10 command was issued under the interface
since by default only one MAC address is permitted when port security is enabled.
• Referencing the total MAC addresses, we can determine that the switchport port-security mac-address sticky command was issued and that there are five secure sticky addresses,
while the switchport port-security mac-address <address> command was issued and specified one secure
address because, by default, these addresses are not defined.
• Finally, we can determine that no security violations have been detected on the interface or
port as the counter still has a value of 0.
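Taken together, the points above suggest an interface configuration along the following lines. This is a reconstruction based on the output and not the actual configuration. The static MAC address in particular is a hypothetical value, as it is not shown in the output:
Switch(config)#interface FastEthernet0/2
Switch(config-if)#switchport mode access
Switch(config-if)#switchport port-security
Switch(config-if)#switchport port-security maximum 10
Switch(config-if)#switchport port-security violation restrict
Switch(config-if)#switchport port-security aging time 10
Switch(config-if)#switchport port-security aging type inactivity
Switch(config-if)#switchport port-security mac-address sticky
! Assumption: placeholder for the single statically configured address
Switch(config-if)#switchport port-security mac-address 0000.1111.2222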
Private VLANs
Private VLANs (PVLANs) prevent inter-host communication by providing port-specific security
between adjacent ports within a VLAN across one or more switches. Access ports within PVLANs
are allowed to communicate only with certain designated router ports, which are typically those
connected to the default gateway for the VLAN. Both normal VLANs and private VLANs can coexist on the same switch; however, unlike normal VLANs, private VLANs allow for the segregation
of traffic at Layer 2. This effectively transforms a traditional Broadcast segment into a non-Broadcast multi-access segment.
The private VLAN feature uses three different types of ports: community, isolated, and promiscuous. Community PVLAN ports are logically combined groups of ports in a common community
that can pass traffic among themselves and with promiscuous ports. Ports are separated at Layer 2
from all other interfaces in other communities or isolated ports within their PVLAN.
Isolated PVLAN ports cannot communicate with any other ports within the PVLAN. However,
isolated ports can communicate with promiscuous ports. Traffic from an isolated port can be forwarded only to a promiscuous port and no other port.
Promiscuous PVLAN ports can communicate with any other ports, including community and isolated PVLAN ports. The function of the promiscuous port is to allow traffic between ports in a
community or isolated VLANs. Promiscuous ports can be configured with switch ACLs to define
what traffic can pass between these VLANs. It is important to know that only one promiscuous
port is allowed per PVLAN, and that port serves the community and isolated VLANs within that
PVLAN. Because promiscuous ports can communicate with all other ports, this is the recommended location to place switch ACLs to control traffic between the different types of ports and VLANs.
Isolated and community port traffic can enter or leave switches via trunk links, because trunks support VLANs carrying traffic among isolated, community, and promiscuous ports. Hence, PVLANs
are associated with a separate set of VLANs that are used to enable PVLAN functionality in Cisco
Catalyst switches. The three types of VLANs used in PVLANs are as follows:
1. The primary VLAN
2. Isolated VLAN
3. Community VLAN
Primary VLANs carry traffic from a promiscuous port to isolated, community, and other promiscuous ports within the same primary VLAN. Isolated VLANs carry traffic from isolated ports to a
promiscuous port. Ports in isolated VLANs cannot communicate with any other port in the private
VLAN without going through the promiscuous port.
Community VLANs carry traffic between community ports within the same PVLAN, as well as to
promiscuous ports. Ports within the same community VLAN can communicate with each other
at Layer 2; however, they cannot communicate with ports in other community or isolated VLANs
without going through a promiscuous port. Isolated and community VLANs are typically referred
to as secondary VLANs. A private VLAN, therefore, actually contains three elements, which are as
follows:
1. The PVLAN itself
2. The secondary VLANs (community and isolated)
3. The promiscuous port
The community VLAN defines a set of ports that can communicate with each other at Layer 2, as long
as they belong to the same community VLAN, but cannot communicate with ports in other community VLANs or isolated VLANs without first going through the promiscuous port. The isolated VLAN
defines a set of ports that cannot communicate with any other port within the PVLAN, either another
community VLAN port or even a port in the same isolated VLAN, at Layer 2. In order to communicate
with ports in either of these VLANs, isolated ports must go through the promiscuous port. Only a single
isolated VLAN per PVLAN is allowed. The promiscuous port forwards traffic between ports in community and/or isolated VLANs. Only one promiscuous port can exist within a single PVLAN; however,
this port can serve all the community and isolated VLANs in the PVLAN. ACLs may be applied to the
promiscuous port to define the traffic that is allowed to pass between these different VLANs.
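The relationship between the primary VLAN, the secondary VLANs, and the different port types can be sketched as follows. All VLAN IDs and interfaces are hypothetical, and the example assumes a platform and software release that require VTP transparent mode before private VLANs can be configured:
! Assumption: VLANs 100-102 and the interfaces below are placeholders
Switch(config)#vtp mode transparent
Switch(config)#vlan 101
Switch(config-vlan)#private-vlan community
Switch(config-vlan)#vlan 102
Switch(config-vlan)#private-vlan isolated
Switch(config-vlan)#vlan 100
Switch(config-vlan)#private-vlan primary
Switch(config-vlan)#private-vlan association 101,102
Switch(config-vlan)#exit
! Community host port
Switch(config)#interface FastEthernet0/1
Switch(config-if)#switchport mode private-vlan host
Switch(config-if)#switchport private-vlan host-association 100 101
! Isolated host port
Switch(config-if)#interface FastEthernet0/2
Switch(config-if)#switchport mode private-vlan host
Switch(config-if)#switchport private-vlan host-association 100 102
! Promiscuous port toward the default gateway
Switch(config-if)#interface FastEthernet0/24
Switch(config-if)#switchport mode private-vlan promiscuous
Switch(config-if)#switchport private-vlan mapping 100 101,102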
When troubleshooting PVLANs, first verify that the implemented configuration is correct. For
example, if users are unable to access other devices in their own or other VLANs, ensure that they are
using the correct default gateway and that the default gateway is connected to the promiscuous
port, because this port allows for both intra- and inter-PVLAN communication. In addition to
verifying PVLAN configuration, perform additional basic checks, such as verifying that the ports
are configured and operational.
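The checks described above might begin with commands such as the following. The interface names are assumptions used for illustration only:
! Lists each primary VLAN, its secondary VLANs, their types, and member ports
Switch#show vlan private-vlan
! Assumption: FastEthernet0/1 is a host port and FastEthernet0/24 is the promiscuous port
Switch#show interfaces FastEthernet0/1 switchport
Switch#show interfaces FastEthernet0/24 switchport
In the switchport output, the administrative and operational modes, together with the private VLAN host association or mapping fields, indicate whether the port has actually taken on the intended PVLAN role.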
IEEE 802.1x Port-Based Authentication
Identity Based Networking Services (IBNS) provides identity-based network access control and
policy enforcement at the switch port level. The IBNS solution extends network access security
based on the 802.1x technology, Extensible Authentication Protocol (EAP) technologies, and the
Remote Authentication Dial-In User Service (RADIUS) security server service.
IEEE 802.1x is a protocol standard framework for both wired and wireless Local Area Networks
that authenticates users or network devices and provides policy enforcement services at the port
level to provide secure network access control. 802.1x is an IEEE standard for access control and authentication that provides a means for authenticating users who want to gain access to the network
and placing them into a pre-determined VLAN, effectively granting them certain access rights to
the network. In simpler terms, 802.1x prevents rogue or unknown devices from gaining
unauthorized access to either the wired or the wireless network.
The 802.1x protocol provides the definition to encapsulate the transport of EAP messages at the
Data Link Layer over any PPP or IEEE 802 media (e.g., Ethernet, FDDI, or Token Ring) through
the implementation of a port-based network access control to a network device. EAP messages
are communicated between an end device, referred to as a supplicant, and an authenticator, which
can be either a switch or a wireless access point. The authenticator relays the EAP messages to the
authentication server (e.g., a Cisco ACS server) via the RADIUS server protocol.
The three primary components (or roles) in the 802.1x authentication process are as follows:
• Supplicant or client
• Authenticator
• Authentication server
An IEEE 802.1x supplicant or client is simply an 802.1x-compliant device, such as a workstation, a
laptop, or even an IP phone, with software that supports the 802.1x and EAP protocols. The supplicant client sends an authentication request to access the LAN via the connected authenticator
device (e.g., the access switch) using EAP.
An 802.1x authenticator is a device that enforces physical access control to the network based on
the authentication status (i.e., permit or deny) of the supplicant. Examples of an authenticator
would be a switch or a router. The authenticator acts as a proxy and relays information between the
supplicant and the authentication server.
The authenticator receives the identity information from the supplicant via EAP over LAN (EAPOL)
frames, which are verified and then encapsulated into RADIUS protocol format before being forwarded to the authentication server. It is important to remember that the EAP frames are not modified or examined during the encapsulation process, which means that the authentication server
must support EAP within the native frame format. When the authenticator receives frames from
the authentication server, the RADIUS header is removed, leaving only the EAP frame, which is
then encapsulated in the 802.1x format. These frames are then sent back to the supplicant or client.
The authentication server is the database policy software, such as Cisco Secure ACS, that supports
the RADIUS server protocol and performs authentication of the supplicant that is relayed by the
authenticator via the RADIUS client-server model.
The authentication server validates the identity of the client and then notifies the authenticator
whether the client is allowed or denied access to the network. Based on the response from the
authentication server, the authenticator relays this information back to the supplicant. It is important to remember that during the entire authentication process, the authentication server remains
transparent to the client because the supplicant is communicating only with the authenticator. A
RADIUS server with EAP extensions is the only truly compliant, supported authentication server when configuring 802.1x port-based authentication.
When configuring 802.1x authentication, you must enable AAA on the switch or router using the
aaa new-model global configuration command. Next, create or use the default 802.1X authentication
method list and specify RADIUS server information by issuing the aaa authentication
dot1x [method-list|default] group [name|radius] global configuration command.
NOTE: As per Cisco online documentation, when configuring 802.1x authentication, “The
only method that is truly 802.1X-compliant is the group radius method, in which the client
data is validated against a RADIUS authentication server.” Cisco IOS software also allows you
to use the local database and enable password or line passwords for authentication.
Following that, proceed and configure RADIUS server parameters (e.g., keys and ports) via the
radius-server host global configuration command for an individual server or the aaa group
server radius global configuration command for a RADIUS group server. Next, globally enable
IEEE 802.1x authentication on the switch using the dot1x system-auth-control global configuration command.
Finally, enable 802.1x port-based authentication on desired switch ports using the dot1x port-control {auto|force-authorized|force-unauthorized} interface configuration command.
The following configuration example illustrates how to configure 802.1x port-based authentication on a switch interface:
Switch(config)#aaa new-model
Switch(config)#aaa authentication dot1x default group radius
Switch(config)#radius-server host 10.1.1.254 auth-port 1812 key t5h00t!2010
Switch(config)#dot1x system-auth-control
Switch(config)#interface range FastEthernet0/23 - 24
Switch(config-if-range)#switchport mode access
Switch(config-if-range)#dot1x port-control auto
Switch(config-if-range)#exit
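Where more than one RADIUS server is in use, the same result can be achieved with a named RADIUS server group instead of the built-in radius group. The following is a minimal sketch only: the group name DOT1X-SERVERS and the explicit accounting port are assumptions chosen for illustration, and the exact syntax may differ between Cisco IOS releases:

```
Switch(config)#aaa new-model
Switch(config)#radius-server host 10.1.1.254 auth-port 1812 acct-port 1813 key t5h00t!2010
Switch(config)#aaa group server radius DOT1X-SERVERS
Switch(config-sg-radius)#server 10.1.1.254 auth-port 1812 acct-port 1813
Switch(config-sg-radius)#exit
Switch(config)#aaa authentication dot1x default group DOT1X-SERVERS
Switch(config)#dot1x system-auth-control
```

The ports listed under the server group should match those defined in the corresponding radius-server host statement; otherwise, the switch may not associate the group member with the configured server.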
When troubleshooting 802.1x port-based authentication, troubleshooting targets include the client
(supplicant), authenticator, and authentication server. As a network engineer, your primary focus
would be on the authenticator, as this is the device under your administration on which the port-based authentication configuration is implemented.
When troubleshooting 802.1x authentication issues, verify that 802.1x is enabled on the switch
port by checking the device configuration or by using the show dot1x interface <name> command, the output of which is shown below:
Switch#show dot1x interface FastEthernet0/24
Dot1x Info for FastEthernet0/24
-----------------------------------
PAE                       = AUTHENTICATOR
PortControl               = AUTO
ControlDirection          = Both
HostMode                  = SINGLE_HOST
ReAuthentication          = Disabled
QuietPeriod               = 60
ServerTimeout             = 30
SuppTimeout               = 30
ReAuthPeriod              = 3600 (Locally configured)
ReAuthMax                 = 2
MaxReq                    = 2
TxPeriod                  = 30
RateLimitPeriod           = 0
In addition, check that the port authorization state is set to auto. If the port authorization state is
set to force-unauthorized, the port will remain in the unauthorized state, ignoring all attempts
by the client to authenticate. In other words, the switch or router cannot provide authentication
services to the client through the interface or port. Another thing to check is that 802.1x port-based
authentication has been enabled globally on the switch. Again, you can check the configuration,
parsing it for the dot1x system-auth-control configuration statement, or simply use the show
dot1x command to validate this configuration as shown below:
Switch#show dot1x
Sysauthcontrol              Enabled
Dot1x Protocol Version      2
Critical Recovery Delay     100
Critical EAPOL              Disabled
If the configuration on the switch or router appears to be correct, verify that there
is connectivity between the switch or router and the RADIUS server. You can perform a simple IP
connectivity test using the ping utility and then use the debug aaa authentication command for further verification and AAA troubleshooting.
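The verification steps described above might be run in the following order. This is a sketch that reuses the RADIUS server address 10.1.1.254 from the earlier configuration example; the debug radius authentication command is an addition that is not discussed in this section, and no output is shown because it depends on the platform and software release:

```
Switch#ping 10.1.1.254
Switch#debug aaa authentication
Switch#debug radius authentication
Switch#undebug all
```

Keep in mind that a successful ping confirms only IP reachability. It does not confirm that the RADIUS service is listening on the configured UDP port or that the shared key configured on the switch matches the key configured on the server.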
Trunking
Trunk links are used to carry traffic from multiple VLANs. By default, the users residing on one
VLAN cannot directly communicate with users in another VLAN without going through a router
or Layer 3 interface (e.g., an SVI). Despite this default operation, attackers can use VLAN hopping attacks to bypass a Layer 3 device in order to communicate directly between VLANs. For this
reason, it is important to ensure that trunk links are secured. The main objective of VLAN hopping
is to compromise a device residing on another VLAN. The two primary methods used to perform
VLAN hopping attacks are as follows:
1. Switch spoofing
2. Double-tagging
In switch spoofing, the attacker impersonates a switch by emulating ISL or 802.1Q signaling, as well
as Dynamic Trunking Protocol (DTP) signaling. DTP provides switches with the ability to negotiate the trunking method for the trunk link they will establish between themselves.
Double-tagging or double-encapsulated VLAN attacks involve tagging frames with two 802.1Q
tags in order to forward the frames to a different VLAN. The embedded hidden 802.1Q tag inside
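the frame allows the frame to traverse a VLAN that the outer 802.1Q tag did not specify. This is a
particularly dangerous attack because it will work even if trunking is turned off on the port to which the attacker is connected, provided that the port belongs to the native VLAN of the trunk.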
While there are no structured troubleshooting steps for VLAN hopping attacks per se, there are
some techniques that you should be familiar with and implement in order to prevent such attacks.
These techniques and mitigation methods include the following:
• Ensure that the native VLAN used on all the trunk ports is different from the VLAN ID of user access ports. It is best to use a dedicated or isolated VLAN that is specific for each pair of trunk ports, not the default VLAN.
• Configure the switch to tag native VLAN traffic to prevent the vulnerability of double-tagged 802.1Q frames hopping VLANs. This functionality can be enabled by issuing the vlan dot1q tag native global configuration command.
• Disable Dynamic Trunking Protocol on all untrusted ports using the switchport nonegotiate command, effectively preventing automatic trunk configuration.
• Alternatively, configure the untrusted ports as access ports using the switchport mode access command.
• Place all unused ports in a common unrouted VLAN that is local to the switch. Use a VLAN number that is easily recognizable, such as 666.
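The mitigation techniques listed above could be combined as in the following sketch. The interface numbers, the VLAN names, and the use of VLAN 999 as a dedicated native VLAN are assumptions made for illustration; in addition, the switchport trunk encapsulation dot1q and vlan dot1q tag native commands are not available on every Catalyst platform:

```
Switch(config)#vlan 666
Switch(config-vlan)#name UNUSED
Switch(config-vlan)#exit
Switch(config)#vlan 999
Switch(config-vlan)#name NATIVE
Switch(config-vlan)#exit
Switch(config)#vlan dot1q tag native
Switch(config)#interface FastEthernet0/1
Switch(config-if)#switchport trunk encapsulation dot1q
Switch(config-if)#switchport mode trunk
Switch(config-if)#switchport trunk native vlan 999
Switch(config-if)#switchport nonegotiate
Switch(config-if)#exit
Switch(config)#interface range FastEthernet0/10 - 20
Switch(config-if-range)#switchport mode access
Switch(config-if-range)#switchport access vlan 666
Switch(config-if-range)#exit
```

In this sketch, FastEthernet0/1 is a statically configured trunk that does not send DTP frames, while FastEthernet0/10 through FastEthernet0/20 are unused ports parked in VLAN 666 as access ports, so they cannot negotiate a trunk.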
Cisco IOS Firewall
The Cisco IOS Firewall suite can be used to secure the data plane; however, it has been included in
its own section, as this topic is not included in any of the other current CCNP study guides.
CISCO IOS FIREWALL FUNDAMENTALS
The final section of this chapter describes the Cisco IOS Firewall suite, which can also be used to protect the data plane. While most network environments typically employ dedicated appliances, such
as the Cisco Adaptive Security Appliance (ASA) Firewall, and a Network-based Intrusion Prevention
System (NIPS), such as the Cisco IPS 4200 Sensors, Cisco IOS software also provides in-built firewall
and intrusion prevention capabilities with which, as a network engineer, you should be familiar.
The Cisco IOS Firewall suite provides a single point of protection at the network perimeter, making security policy enforcement an inherent component of the network. The Cisco IOS Firewall is
comprised of the following functions and technologies:
• Cisco IOS Stateful Packet Inspection
• Context-Based Access Control
• Intrusion Prevention System
• Authentication Proxy
• Port-to-Application Mapping
• Network Address Translation
• Zone-Based Policy Firewall
NOTE: Keep in mind that this is not a security exam. While it is an expectation that you are
familiar with the basic Cisco IOS Firewall fundamentals, we will not be going into advanced
detail on this feature or other related security features.
Cisco IOS Stateful Packet Inspection
Cisco IOS Stateful Packet Inspection (SPI) provides firewall capabilities designed to protect networks against unauthorized traffic and to control legitimate business-critical data. Cisco IOS SPI
maintains state information and counters of connections, as well as the total connection rate,
through the firewall and intrusion prevention software.
Stateful firewalls perform SPI or Stateful Inspection and keep track of the state of network connections, such as TCP and UDP streams traveling across them. The Cisco IOS Firewall is a Stateful
firewall that uses the inherent Stateful inspection engine of Cisco IOS software for maintaining the
detailed session database, which is referred to as the state table.
Stateful firewalls are able to hold a significant number of attributes in memory for each connection, from start to finish. These attributes, which are known as connection states, may include
such details as the IP addresses and port numbers involved in the connection, as well as the sequence numbers of the packets traversing the connection. When running Cisco IOS Classic Firewall,
which is described later in this chapter, you can use the show ip inspect sessions command to view the state table. Following is a sample output of this command:
R1#show ip inspect sessions
Established Sessions
Session 84362E94 (10.1.1.1:3624)=>(172.1.1.1:80) http SIS_OPEN
Context-Based Access Control
Context-Based Access Control (CBAC) is a Stateful inspection firewall engine that provides dynamic traffic filtering capabilities. CBAC, which is also known as the Classic IOS Firewall, provides
advanced traffic-filtering functionality to Cisco IOS routers. The main features of Context-Based Access Control are as follows:
• It protects the internal network from external intrusion or other threats
• It provides Denial of Service (DoS) protection
• It provides per-application control mechanisms
• It examines Layer 3 and Layer 4, as well as Application Layer, information
• It maintains state information for every connection
• It generates real-time event alert failures and log messages
• It provides enhanced audit trail features
Context-Based Access Control inspects all traffic that traverses the firewall and maintains state information for all TCP and UDP sessions. This state information is then used to create temporary
(dynamic) ACL openings through the firewall that allow return traffic for sessions that originated
internally. These temporary openings are maintained for the duration of the session. Packets that enter
the firewall are subject to inspection only if they first pass the inbound ACL at the input interface
and the outbound ACL at the output interface. If a packet is denied by the ACL, the router will simply
drop it without CBAC inspection. Figure 8-3 below illustrates basic Cisco IOS CBAC operation:
[Figure 8-3 shows a Cisco IOS Stateful Firewall between the internal (trusted) host 10.1.1.1 and the external (untrusted) host 172.1.1.1, together with the following state table entries.]
Direction                             SRC IP      SRC Port   DST IP      DST Port   Protocol
OUTBOUND (based on static ACL entry)  10.1.1.1    1500       172.1.1.1   80         TCP
INBOUND (based on dynamic ACL entry)  172.1.1.1   80         10.1.1.1    1500       TCP
Fig. 8-3. Understanding CBAC Operation
Referencing Figure 8-3, traffic from the internal (trusted) network is permitted by an ACL that is
configured statically on the router, allowing host 10.1.1.1 to communicate with host 172.1.1.1 on
the external network (untrusted). This information is stored in the state table. CBAC then creates
a dynamic ACL entry, allowing return traffic from external host 172.1.1.1 to internal host 10.1.1.1.
This dynamic ACL is maintained only for the duration of the session. Because a dynamic ACL is
created for return traffic, the firewall allows only traffic originating from the internal network. In
other words, if host 172.1.1.1 simply attempted to initiate a connection to host 10.1.1.1, then the
connection would be dropped.
You can use the show ip inspect sessions detail command to view detailed session inspection when troubleshooting CBAC problems. Following is a sample output of this command, which
illustrates an ICMP Echo request sent by host 10.1.1.1 to 172.1.1.1:
R1#show ip inspect sessions detail
Established Sessions
Session 84362BCC (10.1.1.1:8)=>(172.1.1.1:0) icmp SIS_OPEN
Created 00:00:09, Last heard 00:00:06
ECHO request
Bytes sent (initiator:responder) [128:128]
Out SID 172.1.1.1[0:0]=>10.1.1.1[0:0] on ACL CBAC-ACL
In SID 172.1.1.1[0:0]=>10.1.1.1[0:0] on ACL CBAC-ACL (4 matches)
Out SID 0.0.0.0[0:0]=>10.1.1.1[3:3] on ACL CBAC-ACL
In SID 0.0.0.0[0:0]=>10.1.1.1[3:3] on ACL CBAC-ACL
Out SID 0.0.0.0[0:0]=>10.1.1.1[11:11] on ACL CBAC-ACL
In SID 0.0.0.0[0:0]=>10.1.1.1[11:11] on ACL CBAC-ACL
Referencing the output above, CBAC adds dynamic entries to the ACL named CBAC-ACL, which allow return traffic (i.e., the ICMP Echo reply) from host 172.1.1.1 to host 10.1.1.1. These dynamic
entries are based on the traffic permitted by the static ACL applied to the internal or trusted router interface. Another useful
command when troubleshooting CBAC issues is the show ip inspect interfaces command.
This command shows the internal (trusted) and external (untrusted) interfaces. The internal interface should have an inbound CBAC inspection rule applied to it, while the external interface
should have an outbound CBAC inspection rule applied to it. This allows the router to inspect traffic ingressing the trusted interface and destined out of the untrusted interface, which in turn allows
it to create a dynamic entry for the return traffic, as it was originated behind the trusted interface.
Following is a sample output of this command:
R1#show ip inspect interfaces
Interface Configuration
Interface Serial0/0
Inbound inspection rule is not set
Outgoing inspection rule is TSHOOT-CBAC
icmp alert is on audit-trail is off timeout 10
smtp max-data 20000000 alert is on audit-trail is off timeout 3600
udp alert is on audit-trail is off timeout 30
Inbound access list is CBAC-ACL
Outgoing access list is not set
Interface FastEthernet0/0
Inbound inspection rule is TSHOOT-CBAC
icmp alert is on audit-trail is off timeout 10
smtp max-data 20000000 alert is on audit-trail is off timeout 3600
udp alert is on audit-trail is off timeout 30
Outgoing inspection rule is not set
Inbound access list is not set
Outgoing access list is CBAC-ACL
Referencing the output above, the inspection rule named TSHOOT-CBAC has been applied in the inbound direction to the FastEthernet0/0 interface (trusted), while the same
inspection rule has been applied in the outbound direction to the Serial0/0 interface (untrusted).
An ACL named CBAC-ACL is used to permit or deny traffic.
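For reference, a configuration that would be consistent with the show ip inspect interfaces output above might look like the following sketch. The inspection rule name, ACL name, and interfaces are taken from that output; the contents of CBAC-ACL are not shown anywhere in this section, so the single deny statement below is purely an assumption:

```
R1(config)#ip inspect name TSHOOT-CBAC icmp
R1(config)#ip inspect name TSHOOT-CBAC smtp
R1(config)#ip inspect name TSHOOT-CBAC udp
R1(config)#ip access-list extended CBAC-ACL
R1(config-ext-nacl)#deny ip any any
R1(config-ext-nacl)#exit
R1(config)#interface FastEthernet0/0
R1(config-if)#ip inspect TSHOOT-CBAC in
R1(config-if)#ip access-group CBAC-ACL out
R1(config-if)#exit
R1(config)#interface Serial0/0
R1(config-if)#ip inspect TSHOOT-CBAC out
R1(config-if)#ip access-group CBAC-ACL in
R1(config-if)#exit
```

With an ACL of this kind, traffic arriving on Serial0/0 from the untrusted network is denied unless CBAC has added a temporary entry for a session that was originated from behind FastEthernet0/0.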
NOTE: You are not expected to perform any CBAC configuration in the TSHOOT certification exam. In addition, you are not expected to perform any advanced CBAC troubleshooting.
Intrusion Prevention System
The Cisco IOS Intrusion Prevention System (IPS) is an inline intrusion detection and prevention
sensor that scans packets and sessions flowing through the router to identify any of the Cisco IPS
signatures that protect the network from internal and external threats. Some key features of the
Cisco IOS Intrusion Prevention System are as follows:
• It protects the network from viruses, worms, and a large variety of threats and exploits
• It eliminates the need for a standalone IPS device
• It provides integrated inline deep-packet inspection
• It complements the Cisco IOS Firewall and VPN solutions for superior threat protection
• It supports approximately 2000 attack signatures
• It uses Cisco IOS routing capabilities to deliver integrated functionality
• It enables distributed network-wide threat mitigation
• It sends a syslog message or an alarm in SDEE format when a threat is detected
NOTE: SDEE (Security Device Event Exchange) specifies the format of messages and protocols used to communicate events generated by security devices. SDEE specifies that events
can be transported using HTTP or HTTPS (i.e., HTTP over SSL or TLS). SDEE is the default
protocol used by Cisco IPS Sensor software, as well as by the Cisco IOS IPS feature set used on
Cisco IOS routers. SDEE can also be used by tools such as Cisco Router and Security Device
Manager (SDM) to pull event logs from Cisco IOS software routers.
When a Cisco IOS router acts as an IPS device, it needs a place to store the signature files, referred to as Signature Definition Files (SDFs), that it will use to identify malicious traffic.
An SDF is a file, usually in XML format, that contains signature definitions that can be used to load
signatures on the Cisco IOS router. In most cases, the SDF is located in the router’s Flash memory;
however, Cisco IOS routers can also reference multiple Signature Definition Files
located on network servers, such as TFTP servers, for increased signature coverage.
NOTE: IOS IPS troubleshooting is beyond the scope of the current TSHOOT certification
exam and will not be included in this guide.
Authentication Proxy
The Authentication Proxy feature, also known as Proxy Authentication, allows administrators to
enforce security policy on a per-user basis. With this feature, administrators can authenticate and
authorize users on a per-user policy with access control customized to an individual level. The
Authentication Proxy feature intercepts HTTP or HTTPS sessions and prompts the user for a username and password if the user has not been previously authenticated. Authentication Proxy configuration and detailed knowledge are beyond the scope of the TSHOOT requirements and will not
be described in detail in this guide.
Port-to-Application Mapping
Port-to-Application Mapping (PAM) allows administrators to customize TCP port numbers and
UDP port numbers for network services or applications to non-standard ports. For example, administrators could use PAM to configure standard HTTP traffic, which uses TCP port 80 by default, to use TCP port 8080. PAM is also used by CBAC, which uses this information to examine
non-standard Application Layer protocols. PAM configuration and troubleshooting are beyond the
scope of the TSHOOT certification exam and will not be described in detail in this guide.
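Purely as an illustration of the HTTP example above, a mapping of HTTP to TCP port 8080 might be entered as shown in the following sketch. The syntax shown is an assumption based on earlier Cisco IOS releases; some later releases require the transport protocol to be specified (e.g., ip port-map http port tcp 8080), so check the command reference for the release in use:

```
R1(config)#ip port-map http port 8080
R1(config)#exit
R1#show ip port-map http
```

The show ip port-map command can then be used to confirm whether a given application has a user-defined mapping in addition to its system-defined default port.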
Network Address Translation
Network Address Translation (NAT) is used to hide internal addresses, which are typically private
addresses (i.e., RFC 1918 addresses), from networks that are external to the firewall. The primary
purpose of NAT is address conservation for networks that use RFC 1918 addressing due to the
shortage of globally routable (i.e., public) IP address space. NAT also provides a basic level of security by
hiding the internal network from the outside world. NAT configuration is described in additional
detail later in this guide.
Zone-Based Policy Firewall
Zone-Based Policy Firewall (ZPF) is a newer Cisco IOS Firewall feature designed to replace CBAC, the Classic Firewall, and to address some of its limitations. ZPF allows Stateful inspection to be
applied on a zone-based model, which provides greater granularity, flexibility, scalability, and ease
of use than the Classic Firewall. With a zone-based inspection model, varying policies can be applied to multiple
groups of hosts connected to the same interface. The security zones used in ZPF establish the
security boundaries of the network where traffic is subjected to policy restrictions as it crosses to
another zone within the network. As is the case with CBAC, the Cisco IOS ZPF configuration is
beyond the scope of the TSHOOT certification exam. For this reason, detailed troubleshooting
steps are also not included in this guide. Instead, when troubleshooting ZPF issues, consider the
following configuration guidelines and caveats:
• A zone must be configured before interfaces can be assigned to the zone. In other words, an interface cannot be assigned to a zone that does not exist.
• An interface can be assigned to only one security zone. This same concept is applicable in CBAC. Assigning an interface to multiple zones would make the applicable policy ambiguous.
• All traffic to and from a given interface is implicitly blocked when the interface is assigned to a zone, except traffic to and from other interfaces in the same zone, and traffic to any interface on the router (e.g., Loopback interfaces).
• Traffic is implicitly allowed to flow, by default, among interfaces that are members of the same zone. In other words, if two or more interfaces are in the same zone, all hosts connected to those interfaces can communicate with each other by default.
• In order to permit traffic to and from a zone-member interface, a policy allowing or inspecting traffic must be configured between that zone and any other zone.
• The self zone is the only exception to the default ‘deny all’ policy. The self zone controls traffic sent to the router itself or originated by the router. Therefore, all traffic to any router interface or traffic originated by the router is allowed until explicitly denied.
• Traffic cannot flow between a zone member interface and any interface that is not a zone member, by default. Pass, inspect, and drop actions can be applied only between two configured zones. For example, if interface FastEthernet0/0 is a member of Zone A and interface FastEthernet0/1 is not affiliated with any zones, traffic from FastEthernet0/0 cannot flow to FastEthernet0/1, and vice-versa.
• Interfaces that have not been assigned to a zone function as classic router ports and can still use classic Stateful inspection/CBAC configuration. However, interfaces that have been configured for zones cannot be configured for CBAC.
• If it is required that an interface on the router not be part of the ZPF, it might still be necessary to put that interface in a zone and configure a ‘pass all’ policy, which is effectively a dummy policy, between that zone and any other zone to which traffic flow is desired. Otherwise, that interface will not be able to communicate with other interfaces that have been assigned to zones, and vice-versa, as described earlier.
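Although ZPF configuration is not examined, seeing the order in which the pieces are built can make the guidelines above easier to apply. The following is a minimal sketch only: the zone names, class map name, policy map name, zone-pair name, and interfaces are all hypothetical, and the matched protocols are an assumption:

```
R1(config)#zone security INSIDE
R1(config-sec-zone)#exit
R1(config)#zone security OUTSIDE
R1(config-sec-zone)#exit
R1(config)#class-map type inspect match-any INSIDE-PROTOCOLS
R1(config-cmap)#match protocol tcp
R1(config-cmap)#match protocol udp
R1(config-cmap)#match protocol icmp
R1(config-cmap)#exit
R1(config)#policy-map type inspect INSIDE-TO-OUTSIDE
R1(config-pmap)#class type inspect INSIDE-PROTOCOLS
R1(config-pmap-c)#inspect
R1(config-pmap-c)#exit
R1(config-pmap)#exit
R1(config)#zone-pair security IN-OUT source INSIDE destination OUTSIDE
R1(config-sec-zone-pair)#service-policy type inspect INSIDE-TO-OUTSIDE
R1(config-sec-zone-pair)#exit
R1(config)#interface FastEthernet0/0
R1(config-if)#zone-member security INSIDE
R1(config-if)#exit
R1(config)#interface Serial0/0
R1(config-if)#zone-member security OUTSIDE
R1(config-if)#exit
```

Note that the zones are created before the interfaces are assigned to them and that a policy exists only in the INSIDE-to-OUTSIDE direction. Sessions initiated from the OUTSIDE zone are therefore dropped by default, while return traffic for inspected sessions is allowed. The show zone security and show zone-pair security commands can be used to confirm zone membership and the policy attached to each zone pair.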
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter:
Cisco IOS Security Fundamentals
• The communications architecture of all switches and routers is segmented into three planes
• The communications architecture planes of network devices are as follows:
1. The Management Plane
2. The Control Plane
3. The Forwarding Plane
• The management plane is used to manage a device through its connection to the network
• The management plane is responsible for management functions
• The management plane also coordinates functions among all the other planes
• Management protocols are used for device monitoring and CLI access
• Management protocols include SNMP, Telnet, HTTP, HTTPS, SSH, NetFlow, and Syslog
• Console access, i.e., via the Console port, is also used to manage devices
• A control plane is a collection of processes that run at the process level on a route processor
• All traffic directly or indirectly destined to a router or switch is handled by the control plane
• Control plane protocols include routing protocols, as well as Layer 2 protocols
• The forwarding or data plane is responsible for the actual forwarding of data
• The data plane is typically populated using information derived from the control plane
Management Plane Security and Troubleshooting
• Management plane troubleshooting involves troubleshooting the management protocols
• SSH provides a more secure method for access and administration than Telnet
• Cisco IOS software requires a VTY password and login enabled for remote management
• SSH is enabled by default; however, the following must be configured manually:
1. A valid domain name must be configured on the local device
2. Security keys must be generated on the local device
• FTP and SCP require additional configuration to allow file copies and transfers from devices
• AAA provides the framework that controls and also monitors network access
• AAA services can be performed using the local database or an external security server
• Two commonly used AAA security protocols are RADIUS and TACACS+
• RADIUS stands for Remote Authentication Dial-In User Service
• RADIUS is an open-standard protocol that is distributed in C source code format
• RADIUS uses UDP as the Transport Layer protocol for client and server communications
• RADIUS uses port 1812 for authentication and authorization, and 1813 for accounting
• Earlier deployments of RADIUS use port 1645 for authentication and authorization
• Earlier deployments of RADIUS use port 1646 for accounting
• TACACS+ stands for Terminal Access Controller Access Control System Plus
• TACACS+ is a Cisco-proprietary protocol that is used in the AAA framework
• TACACS+ uses TCP as a Transport Layer protocol, using TCP port 49
• Unlike RADIUS, TACACS+ separates the three AAA architectures
• TACACS+ encrypts data between the user and server; RADIUS encrypts only the password
• TACACS+ supports multiple protocols; RADIUS has limited protocol support
• AAA services are based on method lists
• In AAA, there are two basic types of method lists: named and default method lists
Control Plane Security and Troubleshooting
• Troubleshooting at the control plane is dependent on the specific protocol or technology
• Troubleshooting for additional control plane features centers on configuration
Forwarding Plane Security and Troubleshooting
• Troubleshooting data plane issues depends on the technology that has been implemented
• Tools that are used to secure the data plane include the following:
1. Router and Switch Access Control Lists
2. Catalyst Switch Port Security
3. Private VLANs
4. IEEE 802.1x Port-based authentication
5. Trunking
6. The Cisco IOS Firewall
• Cisco IOS software supports RACLs, PACLs and VACLs on router and switch platforms
• RACLs are applied to routed interfaces or SVIs
• RACLs can be applied in either the inbound or outbound direction
• PACLs are similar to RACLs but are applied to Layer 2 ports
• PACLs do not affect Layer 2 control packets such as Cisco Discovery Protocol
• PACLs can only be applied in the inbound direction
• PACLs can take the form of either IP ACLs or MAC ACLs
• VACLs are similar to RACLs but are applied to VLANs
• VACLs have no sense of direction
• VACLs can filter on both routed and bridged traffic
• Port security mitigates CAM table overflow or MAC address flooding attacks
• Private VLANs prevent inter-host communication
• The three types of VLANs used in PVLANs are as follows:
1. The Primary VLAN
2. Isolated VLAN
3. Community VLAN
• 802.1x prevents rogue devices from gaining unauthorized access
• The three primary components (or roles) in the 802.1x authentication process are as follows:
1. Supplicant or Client
2. Authenticator
3. Authentication Server
• Trunk links are used to carry traffic from multiple VLANs
• The main objective of VLAN hopping is to compromise a device residing on another VLAN
• The two primary methods used to perform VLAN hopping attacks are as follows:
1. Switch spoofing
2. Double-tagging
Cisco IOS Firewall Fundamentals
• The Cisco IOS Firewall is comprised of the following functions and technologies:
1. Cisco IOS Stateful Packet Inspection
2. Context-Based Access Control
3. Intrusion Prevention System
4. Authentication Proxy
5. Port-to-Application Mapping
6. Network Address Translation
7. Zone-Based Policy Firewall
• Cisco IOS SPI maintains state information and counters of connections
• CBAC provides dynamic traffic filtering capabilities
• The main features of Context-Based Access Control (CBAC) are as follows:
1. It protects the internal network from external intrusion or other threats
2. It provides Denial of Service (DoS) protection
3. It provides per-application control mechanisms
4. It examines Layer 3 and Layer 4, as well as Application Layer, information
5. It maintains state information for every connection
6. It generates real-time event alert failures and log messages
7. It provides enhanced audit trail features
• The Cisco IOS IPS is an inline intrusion detection and prevention sensor
• Some key features of the Cisco IOS Intrusion Prevention System are as follows:
1. It protects the network from viruses, worms, and a large variety of threats and exploits
2. It eliminates the need for a standalone IPS device
3. It provides integrated inline deep-packet inspection
4. It complements the Cisco IOS Firewall and VPN solutions for superior threat protection
5. It supports about 2000 attack signatures
6. It uses Cisco IOS routing capabilities to deliver integrated functionality
7. It enables distributed network-wide threat mitigation
8. It sends a Syslog message or an alarm in SDEE format when a threat is detected
• The Authentication Proxy feature intercepts HTTP or HTTPS sessions
• PAM allows administrators to customize TCP or UDP port numbers for network services
• NAT is used to hide internal addresses from networks that are external to the firewall
• ZPF allows Stateful inspection to be applied on a zone-based model
• ZPF provides greater granularity, flexibility, and scalability than CBAC
CHAPTER 9
Troubleshooting Cisco IOS DHCP and NAT
Dynamic Host Configuration Protocol (DHCP) is used by hosts to get initial configuration information, which includes parameters such as IP address, subnet mask, and default gateway,
upon boot up. Since each host needs an IP address to communicate in an IP network, DHCP eases
the administrative burden of manually configuring each host with an IP address.
Network Address Translation (NAT) enables computers on private networks to access resources on
the Internet or other public network. NAT is an IETF standard that enables a LAN to use one set of
IP addresses for internal traffic, typically private address space as defined in RFC 1918, and another
set of addresses for external traffic, typically publicly registered IP address space. The TSHOOT
certification exam objectives that are covered in this chapter are as follows:
• Troubleshoot a DHCP client and server solution
• Troubleshoot NAT
While DHCP and NAT are core CCNA requirements, the current TSHOOT certification exam
requires that you demonstrate not only a basic understanding of these mechanisms but also that
you understand how to troubleshoot them. This chapter will be divided into the following sections:
• Understanding DHCP
• Troubleshooting DHCP
• Understanding NAT
• Troubleshooting NAT
UNDERSTANDING DHCP
As previously stated, the Dynamic Host Configuration Protocol (DHCP) is used to assign hosts IP
addressing information dynamically, which includes IP address, subnet mask, default gateway, and
additional optional parameters, such as Domain Name System (DNS) servers, Windows Internet
Name Service (WINS) servers, and Network Time Protocol (NTP) server information. DHCP uses
UDP port 67 for the server and UDP port 68 for the client. Cisco IOS routers and some switches can be configured as both DHCP clients and
DHCP servers.
Client States and Message Exchanges
DHCP is a client/server protocol wherein the server provides the client dynamic addressing information. The server can be a standalone server or a Cisco IOS router or switch that can provide
DHCP server functionality. While clients are typically network hosts, such as workstations, Cisco
IOS routers and switches can also be configured as DHCP clients, allowing them to receive addressing information dynamically from the DHCP server.
DHCP clients transition through a series of states upon initialization. During these phases, the
clients and servers exchange different messages. Clients transition through the following states:
• Initializing
• Selecting
• Requesting
• Bound
• Renewing
• Rebinding
When a client first boots up, it is in the Initializing state. In this state, the client sends out the DHCPDISCOVER message to UDP port 67 (BOOTP server), addressed to the Broadcast address FFFF.FFFF.FFFF
(Layer 2)/255.255.255.255 (Layer 3). Because at this point the client has no IP address, the source
IP address of this Broadcast will be 0.0.0.0.
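To make the addressing of this first message concrete, the following Python sketch builds the UDP payload of a minimal DHCPDISCOVER. It is an illustration only, not how any particular client is implemented: the function name build_discover is hypothetical, the field layout follows the BOOTP/DHCP message format in RFC 2131, and the MAC address and transaction ID are simply borrowed from sample outputs in this chapter.

```python
import struct

DHCP_SERVER_PORT = 67   # BOOTP server port (destination of the DISCOVER)
DHCP_CLIENT_PORT = 68   # BOOTP client port (source of the DISCOVER)
MAGIC_COOKIE = bytes([99, 130, 83, 99])  # marks the start of the DHCP options


def build_discover(mac: str, xid: int) -> bytes:
    """Return the UDP payload of a minimal DHCPDISCOVER (hypothetical helper)."""
    chaddr = bytes.fromhex(mac.replace(".", "").replace(":", ""))
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,          # op: BOOTREQUEST
        1,          # htype: Ethernet
        6,          # hlen: length of a MAC address
        0,          # hops
        xid,        # transaction ID chosen by the client
        0,          # secs
        0x8000,     # flags: Broadcast bit set
        bytes(4),   # ciaddr: 0.0.0.0, the client has no address yet
        bytes(4),   # yiaddr: filled in by the server in its reply
        bytes(4),   # siaddr
        bytes(4),   # giaddr: only filled in by a relay agent
        chaddr,     # client hardware address, zero-padded to 16 bytes
        b"",        # sname (unused here)
        b"",        # file (unused here)
    )
    options = bytes([53, 1, 1, 255])  # option 53 = DHCPDISCOVER, then End
    return header + MAGIC_COOKIE + options


packet = build_discover("0024.e8f5.7ea2", 0x191F)
print(len(packet), "bytes, sent from 0.0.0.0 port", DHCP_CLIENT_PORT,
      "to 255.255.255.255 port", DHCP_SERVER_PORT)
```

The sketch only constructs the payload; actually sending it would require a Broadcast-enabled UDP socket bound to port 68, which is omitted here.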
If a DHCP server exists on the local subnet and is configured and operating correctly, the DHCP
server will hear the Broadcast and respond with a DHCPOFFER message using UDP port 68
(BOOTP client). However, if no DHCP server resides on the local subnet, then a DHCP or BOOTP
Relay is required to forward the DHCPDISCOVER message to a remote DHCP server. If this functionality is not enabled, the client will not be able to communicate with the server.
After the client receives the DHCPOFFER message from the DHCP server, the client then transitions into the Selecting state. In the event that multiple DHCP servers have responded with DHCPOFFER messages, the client effectively selects which DHCPOFFER to accept during this state.
Most commonly, the client will accept the message from the first server to respond.
This DHCPOFFER message contains the initial configuration information for the DHCP client.
This information includes parameters such as the IP address, subnet mask, default gateway, and
other additional parameters, such as lease duration, renewal time, domain name, DNS server, and
WINS server information, for example. The server will send the DHCPOFFER to the Broadcast
address but will include the hardware address of the client in the offer, so the client knows that it is
the intended destination.
In the event that the DHCP server is not on the local subnet, the DHCP server will send the DHCPOFFER as a Unicast packet, on UDP port 67, back to the DHCP or BOOTP Relay Agent from
which the DHCPDISCOVER came. The DHCP or BOOTP Relay Agent will then either Broadcast
or Unicast the DHCPOFFER on the local subnet on UDP port 68, depending on the Broadcast flag
set by the client.
After receiving the DHCPOFFER, the client moves into the Requesting state. In this state, the client
responds to the selected DHCP server (typically the first one it heard from) with a DHCPREQUEST
message, which indicates that it is willing to accept the parameters in the DHCPOFFER message.
The client does not respond to the other DHCPOFFER messages; instead, the DHCP client simply
ignores them, implicitly declining the information received from those servers.
The client identifies the selected server by populating the Server Identifier option field with the
DHCP server’s IP address. The DHCPREQUEST is also a Broadcast, so all DHCP servers that sent
a DHCPOFFER will see the DHCPREQUEST, and each will know whether its DHCPOFFER was
accepted or declined. Any additional configuration options that the client requires will be included
in the options field of the DHCPREQUEST message. Even though the client has been offered an
IP address, it will send the DHCPREQUEST message with a source IP address of 0.0.0.0 because it
has not yet received verification that it is clear to use the address.
When the DHCP server receives the DHCPREQUEST from the client, it acknowledges it by sending the client a DHCPACK. When the client receives this message from the server, it transitions to
the Bound state. The DHCPACK message has a source IP address of the DHCP server, and the destination address is a Broadcast. This message contains all the parameters that the client requested
in the DHCPREQUEST message. Before the DHCP client begins using the new address, the DHCP
client must calculate the time parameters associated with a leased address, which are Lease Time
(LT), Renewal Time (T1), and Rebind Time (T2).
The DHCPACK tells the client that it is free to use the provided address to access the network. After this message has been sent, the DHCP server then stores the lease in the database and uniquely
identifies it using the client identifier and the associated IP address. Both the client and the server
will use this combination of identifiers to refer to the lease. The client identifier is the MAC address
of the device plus the media type. The sequence of messages exchanged between server and client
during this phase is illustrated in Figure 9-1 below:
[Figure: the DHCP client sends DHCPDISCOVER, the DHCP server replies with DHCPOFFER, the client sends DHCPREQUEST, and the server replies with DHCPACK]
Fig. 9-1. DHCP Client and Server Message Exchanges
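The client states described in this section can be viewed as a small state machine. The Python sketch below is illustrative only: the event names and the run function are invented for this example, the early transitions follow the description in this chapter, and the T1 and T2 timer events are an assumption based on the Renewal Time (T1) and Rebind Time (T2) parameters.

```python
# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    ("Initializing", "DHCPOFFER received"): "Selecting",
    ("Selecting", "offer selected"): "Requesting",
    ("Requesting", "DHCPACK received"): "Bound",
    ("Requesting", "DHCPNAK received"): "Initializing",
    ("Bound", "T1 expired"): "Renewing",
    ("Renewing", "DHCPACK received"): "Bound",
    ("Renewing", "T2 expired"): "Rebinding",
    ("Rebinding", "DHCPACK received"): "Bound",
    ("Rebinding", "DHCPNAK received"): "Initializing",
}


def run(events, state="Initializing"):
    """Apply each event in turn and return the list of states visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError means an invalid event
        path.append(state)
    return path


# The normal four-message exchange shown in Figure 9-1
print(run(["DHCPOFFER received", "offer selected", "DHCPACK received"]))
```

When troubleshooting, knowing which state a client is stuck in narrows down which message is being lost; for example, a client that never leaves the Initializing state is not receiving any DHCPOFFER.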
After getting a lease from the DHCP server, the client must renew that lease when one-half of the
lease time has expired. To do so, the client transitions to the Renewing state and sends a DHCPREQUEST message to the server that holds the current lease. Upon receiving this message, the server
responds to the client with a DHCPACK message that contains the new lease and any configuration parameters that have changed since the previous lease. For example, this
might include an updated DNS server IP address.
If the original DHCP server has not responded to the renewal requests by the time the Rebind Time (T2) expires, the client transitions into the Rebinding state and attempts to extend the lease with any DHCP server.
A client that is restarted after it has been allocated addressing information (RFC 2131 calls this the INIT-REBOOT state) will specifically request the previously leased IP address using
a DHCPREQUEST packet, which will still have the source IP address 0.0.0.0 and the destination
Broadcast address 255.255.255.255. If the DHCP server determines that the client can still use the
requested IP address, it will either remain silent or send a DHCPACK for the DHCPREQUEST.
If the server determines that the client cannot use the requested IP address, it will send a DHCPNACK back to the client. The client will then move to the Initializing state and send a DHCPDISCOVER message. The entire process starts again.
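The renewal and rebind timers can be worked out from the lease time. The short Python sketch below assumes the RFC 2131 defaults, where the Renewal Time (T1) is 50 percent of the lease and the Rebind Time (T2) is 87.5 percent of the lease; a server may override both values with DHCP options, in which case this calculation does not apply. The function name lease_timers is hypothetical.

```python
def lease_timers(lease_seconds: int) -> tuple:
    """Return (T1, T2) in seconds, assuming the RFC 2131 default ratios."""
    t1 = lease_seconds // 2        # Renewal Time: half of the lease
    t2 = lease_seconds * 7 // 8    # Rebind Time: 87.5 percent of the lease
    return t1, t2


eight_days = 8 * 24 * 60 * 60      # 691200 seconds
print(lease_timers(eight_days))    # (345600, 604800)
```

For an eight-day lease this gives a renewal time of four days and a rebind time of seven days, which matches the Renewal and Rebind values in the show dhcp lease sample output later in this chapter.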
NOTE: The DHCPNACK is described in the following section.
Additional DHCP Exchanges
In addition to the messages described in the previous section, there are some other additional
DHCP messages that can be sent by the client or the server. These messages include the following:
• DHCPNAK or DHCPNACK
• DHCPDECLINE
• DHCPINFORM
• DHCPRELEASE
The DHCPNAK or DHCPNACK message is sent by the DHCP server if it is unable to satisfy the
client DHCPREQUEST message. When the client receives a DHCPNAK message, or does not
receive a response to a DHCPREQUEST message, the client restarts the configuration process by
going into the Requesting state. The client will retransmit the DHCPREQUEST at least four times
within 60 seconds before returning to the Initializing state.
The DHCPDECLINE message is sent when the client discovers that the IP address provided by
the server in the DHCPACK message is already in use. The client verifies the availability of the
address by sending out ARP requests for the IP address specified in the DHCPACK. If already in
use, the client sends the server a DHCPDECLINE message and restarts the configuration process
by transitioning into the Requesting state. This message is rare, as the DHCP server will
typically send out ping packets to ensure that the address provided is available.
The DHCPINFORM message is sent by a client to request additional configuration parameters.
This may be the case when the client has a manually configured IP address but requires additional
information from the DHCP server, such as DNS server information, for example. When a DHCP
server receives a DHCPINFORM message, it responds to the client with a DHCPACK message that
contains the requested configuration parameters without allocating the client a new IP address.
This message is Unicast to the requesting client.
Finally, the DHCPRELEASE message is sent by the client when it wishes to release or give up its IP
address. This action is typically performed manually by an administrator. As an example, a Windows-based client will send this message after the ipconfig /release command is executed on
the command prompt. The client identifies the lease to be released by the use of the Client Identifier
field and network address in the DHCPRELEASE message.
Understanding the DHCP/BOOTP Relay Agent
As was previously stated, Cisco IOS software routers and switches may be configured as DHCP
servers and clients. In addition, Cisco IOS software also supports DHCP or BOOTP relay functionality. As we already know, DHCP uses Broadcast messages. This works well when client and server
reside within the same Broadcast domain; however, it does present a challenge when the DHCP
server is located in a remote subnet. This is because, by default, routers will not forward Broadcast
packets. This essentially means that if a router resides between the client and the server, the DHCP
messages will never be exchanged between the two.
In order to allow clients to communicate with servers on remote subnets, the DHCP or BOOTP
relay agent function must be enabled on the router. When enabled, the relay agent will then forward
requests on behalf of the client to the server, using its own IP address as the source of those requests. This allows the server to allocate an IP address on the same subnet as the messages received
from the relay agent. The DHCP server Unicasts responses to the relay agent.
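The relay behavior described above can be sketched in a few lines of Python. This is a simplified model, not the Cisco IOS implementation: packets are represented as plain dictionaries, the function name relay_discover is hypothetical, and giaddr is the gateway address field defined in RFC 2131 that carries the relay agent's address to the server.

```python
def relay_discover(packet: dict, interface_ip: str, helpers: list) -> list:
    """Return one Unicast copy of a client Broadcast per helper address."""
    relayed = []
    for helper in helpers:
        copy = dict(packet)
        # Stamp the relay's own address so the server knows the client subnet
        if copy.get("giaddr", "0.0.0.0") == "0.0.0.0":
            copy["giaddr"] = interface_ip
        copy["src"] = interface_ip   # relay sources the packet itself
        copy["dst"] = helper         # Unicast to the remote DHCP server
        relayed.append(copy)
    return relayed


discover = {"type": "DHCPDISCOVER", "src": "0.0.0.0",
            "dst": "255.255.255.255", "giaddr": "0.0.0.0"}
for out in relay_discover(discover, "192.168.1.1", ["172.17.1.254"]):
    print(out)
```

Because the server chooses the pool based on the relay's address, a relay interface in the wrong subnet, or a server with no pool for that subnet, will result in no address being offered.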
Configuring A Cisco IOS Router or Switch as a DHCP Client
Cisco IOS DHCP client configuration functionality requires the implementation of only a single
command, which is the ip address dhcp interface configuration command. This command is
required on the interface that will be receiving configuration information from the DHCP server.
The following example shows how you would configure the router as a DHCP client:
Router(config)#interface FastEthernet0/0
Router(config-if)#description “Connected To ISP XYZ Cable Modem”
Router(config-if)#ip address dhcp
Router(config-if)#end
Assuming that the router is able to communicate with the DHCP server, when the DHCP server
provides the router with the addressing parameters, you will see a message that is similar to the
following printed on the console:
*Oct 20 02:02:10.592 CST: %DHCP-6-ADDRESS_ASSIGN: Interface FastEthernet0/0
assigned DHCP address 150.1.1.1, mask 255.255.255.0, hostname R1
You can validate whether the device has been configured as a DHCP client using either the show
dhcp server or show dhcp lease commands. The show dhcp server command provides information on DHCP message statistics, such as the number of offers or acknowledgements received.
It also provides basic addressing parameters, such as DNS server addresses, the domain name, and the subnet mask assigned by the server. Below is a sample output of the information printed by this command:
R1#show dhcp server
   DHCP server: ANY (255.255.255.255)
    Leases:   1
    Offers:   1      Requests: 1     Acks : 1     Naks: 0
    Declines: 0      Releases: 0     Query: 0     Bad: 0
    DNS0:   172.16.1.253,   DNS1:  172.16.1.254
    NBNS0:  172.16.1.254,   NBNS1: 0.0.0.0
    Subnet: 255.255.255.0   DNS Domain: howtonetwork.net
The show dhcp lease command provides additional configuration details, which include the assigned IP address, subnet mask, default gateway, and lease duration, among other things. Following
is a sample output of the information that is printed by this command:
R1#show dhcp lease
Temp IP addr: 150.1.1.1 for peer on Interface: FastEthernet0/0
Temp sub net mask: 255.255.255.0
DHCP Lease server: 150.1.1.2, state: 3 Bound
DHCP transaction id: 191F
Lease: 691200 secs, Renewal: 345600 secs, Rebind: 604800 secs
Temp default-gateway addr: 150.1.1.254
Next timer fires after: 3d23h
Retry count: 0
Client-ID: cisco-000c.cea7.f3a0-Fa0/0
Client-ID hex dump: 636973636F2D303030632E636561372E
663361302D4661302F30
Hostname: R1
NOTE: You can also use the show ip interface <name> command to determine whether
the interface has derived its IP address from a DHCP server, as illustrated in the output below:
R1#show ip interface FastEthernet0/0
FastEthernet0/0 is up, line protocol is up
Internet address is 150.1.1.1/24
Broadcast address is 255.255.255.255
Address determined by DHCP
MTU is 1500 bytes
Helper address is not set
Directed broadcast forwarding is disabled
Outgoing access list is not set
Inbound access list is not set
Proxy ARP is enabled
Local Proxy ARP is disabled
...
[Truncated Output]
Configuring A Cisco IOS Router or Switch as a DHCP Server
While quite straightforward, the configuration of the Cisco IOS DHCP server function requires
more steps than when configuring a router or a switch as a DHCP client. The following sequence of
steps is required when configuring a router or switch as a Cisco IOS DHCP server:
• Exclude the IP addresses that you do not want the Cisco IOS DHCP server to assign to clients
using the ip dhcp excluded-address <starting address> <ending address> global
configuration command. By default, Cisco IOS DHCP server functionality assumes that all IP
addresses specified in the pool are available for assigning and will begin assigning addresses from
the bottom IP address to the top IP address (e.g., from .1 to .254). This order cannot be changed.
NOTE: By default, the Cisco IOS DHCP server will ping a pool IP address twice before it will
assign it to a client. If the ping is unanswered, the DHCP server will assign the address to a client because it assumes that it is available. However, while this does minimize the probability
of duplicate addresses being assigned to a client, keep in mind that some devices (e.g., servers)
residing on the subnet may have a firewall running that blocks ping packets. Therefore, it is
quite possible that a client could be assigned an address already manually assigned to another
such device because the Cisco IOS DHCP server did not receive a response from the device.
It is therefore recommended that all statically assigned addresses are excluded from the pool.
• Configure the DHCP pool using the ip dhcp pool <name> global configuration command.
Each individual DHCP pool must have a unique name. The device then transitions to DHCP
pool configuration mode.
• In DHCP pool configuration mode, next configure the network number and mask of the
DHCP address pool using the network <network> <mask> or network <network>
/<prefix-length> DHCP pool configuration command. Both options are acceptable
and both perform the same function.
• In DHCP pool configuration mode, specify the IP address of the default gateway using the
default-router <address 1…address 8> DHCP pool configuration command. You can
specify up to eight different addresses in a single configuration line.
Following this core configuration, you can configure additional parameters, such as DNS servers,
WINS servers, domain name, and lease duration, for example. In DHCP pool configuration mode,
you can specify DNS servers for the pool using the dns-server <address 1…address 8> DHCP
pool configuration command. You can specify up to eight different addresses in a single configuration line. The WINS server information can be specified by issuing the netbios-name-server
<address 1…address 8> DHCP pool configuration command. Again, you can specify up to eight
different addresses in a single configuration line. The domain name for the client can be specified
using the domain-name <name> DHCP pool configuration command. Finally, you can change the
default one-day lease duration used by Cisco IOS DHCP servers via the lease <days [hours]
[minutes]|infinite> DHCP pool configuration command. If the infinite keyword is specified, the lease for that pool will never expire.
The following configuration example illustrates how to configure two DHCP pools on the same
router. The configuration parameters for the first pool are illustrated below:
Excluded Address(es)    10.1.1.1 – 10.1.1.9
Pool Name               POOL-A
Subnet/Mask             10.1.1.0/24
Gateway(s)              10.1.1.1
DNS Server(s)           172.16.1.252, 172.16.1.253, 172.16.1.254
WINS Server(s)          172.16.1.253, 172.16.1.254
Lease                   8 days
The configuration parameters for the second pool are illustrated below:
Excluded Address(es)    10.2.2.1
Pool Name               POOL-B
Subnet/Mask             10.2.2.0/29
Gateway(s)              10.2.2.1
DNS Server(s)           172.16.1.254
WINS Server(s)          N/A
Lease                   8 hours
The configuration for these two pools is implemented on the router as follows:
R1(config)#ip dhcp excluded-address 10.1.1.1 10.1.1.9
R1(config)#ip dhcp excluded-address 10.2.2.1
R1(config)#ip dhcp pool POOL-A
R1(dhcp-config)#network 10.1.1.0 /24
R1(dhcp-config)#default-router 10.1.1.1
R1(dhcp-config)#dns-server 172.16.1.252 172.16.1.253 172.16.1.254
R1(dhcp-config)#netbios-name-server 172.16.1.253 172.16.1.254
R1(dhcp-config)#lease 8 0 0
R1(dhcp-config)#exit
R1(config)#ip dhcp pool POOL-B
R1(dhcp-config)#network 10.2.2.0 255.255.255.248
R1(dhcp-config)#default-router 10.2.2.1
R1(dhcp-config)#dns-server 172.16.1.254
R1(dhcp-config)#lease 0 8 0
R1(dhcp-config)#exit
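To check what such a configuration leaves available for clients, the following Python sketch lists the host addresses in a pool after the excluded ranges are removed. It is a planning aid only and says nothing about how Cisco IOS tracks addresses internally; the function name assignable and the tuple format used for exclusions are assumptions made for this example.

```python
import ipaddress


def assignable(network: str, excluded: list) -> list:
    """Return the host addresses in network that are not excluded.

    excluded is a list of (first, last) address pairs, mirroring the
    arguments of the ip dhcp excluded-address command.
    """
    blocked = set()
    for first, last in excluded:
        low = int(ipaddress.ip_address(first))
        high = int(ipaddress.ip_address(last))
        blocked.update(range(low, high + 1))
    net = ipaddress.ip_network(network)
    return [host for host in net.hosts() if int(host) not in blocked]


pool_a = assignable("10.1.1.0/24", [("10.1.1.1", "10.1.1.9")])
pool_b = assignable("10.2.2.0/29", [("10.2.2.1", "10.2.2.1")])
print(len(pool_a), pool_a[0])   # 245 10.1.1.10
print(len(pool_b), pool_b[0])   # 5 10.2.2.2
```

With the exclusions above, POOL-A can lease 245 addresses starting at 10.1.1.10, and POOL-B can lease only 5, so a small pool such as POOL-B is quickly exhausted if more hosts are added to the subnet.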
Following this configuration, you can also use the show ip dhcp pool command to view the configured DHCP pool parameters, as well as pool address allocation, as follows:
R1#show ip dhcp pool
Pool POOL-A :
 Utilization mark (high/low)    : 100 / 0
 Subnet size (first/next)       : 0 / 0
 Total addresses                : 254
 Leased addresses               : 2
 Pending event                  : none
 1 subnet is currently in the pool :
 Current index        IP address range                    Leased addresses
 10.1.1.1             10.1.1.1         - 10.1.1.254        2
Pool POOL-B :
 Utilization mark (high/low)    : 100 / 0
 Subnet size (first/next)       : 0 / 0
 Total addresses                : 6
 Leased addresses               : 0
 Pending event                  : none
 1 subnet is currently in the pool :
 Current index        IP address range                    Leased addresses
 10.2.2.1             10.2.2.1         - 10.2.2.6          0
Additionally, you can use the show ip dhcp binding command to view the DHCP binding database on the local device. This command prints information that includes the DHCP client IP and
hardware (MAC) addresses, as well as the lease expiration time and date, as illustrated in the following output:
R1#show ip dhcp binding
Bindings from all pools not associated with VRF:
IP address          Client-ID/              Lease expiration        Type
                    Hardware address/
                    User name
10.1.1.10           0063.6973.636f.2d30.    Oct 28 2010 02:03 AM    Automatic
                    3030.632e.6365.6137.
                    2e66.3361.302d.4661.
                    302f.30
10.1.1.11           0100.24e8.f57e.a2       Oct 28 2010 02:12 AM    Automatic
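The Client-ID column in the binding table can be hard to read at a glance. The Python sketch below decodes the two forms that appear in the output above, on the assumption that a leading 01 byte indicates an Ethernet hardware address (the hardware type defined for the DHCP client identifier option) and that a leading 00 byte is followed by an ASCII string, as used by Cisco IOS DHCP clients. The function name decode_client_id is hypothetical.

```python
def decode_client_id(dotted_hex: str) -> str:
    """Decode a Client-ID as printed by show ip dhcp binding."""
    raw = bytes.fromhex(dotted_hex.replace(".", ""))
    if raw[0] == 0x01 and len(raw) == 7:
        # Type 01 followed by a six-byte Ethernet MAC address
        mac = raw[1:].hex()
        return "MAC " + ".".join(mac[i:i + 4] for i in range(0, 12, 4))
    if raw[0] == 0x00:
        # Type 00 followed by an ASCII identifier string
        return "text " + raw[1:].decode("ascii")
    return "unknown " + raw.hex()


print(decode_client_id("0100.24e8.f57e.a2"))
print(decode_client_id(
    "0063.6973.636f.2d30.3030.632e.6365.6137.2e66.3361.302d.4661.302f.30"))
```

The first binding decodes to the MAC address 0024.e8f5.7ea2, and the second decodes to cisco-000c.cea7.f3a0-Fa0/0, which is the Client-ID shown in the earlier show dhcp lease output on the router acting as a DHCP client.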
Importing DHCP Options
In some cases, a router or switch may be configured as both a client and a server. This is common
when the device has a Broadband connection (e.g., DSL or cable) and is also providing addressing
information to hosts connected to the LAN. The Cisco IOS DHCP server import and autoconfiguration feature is enabled by issuing the import all DHCP pool configuration command. When
this command is issued, the pool under which it is configured will import the DHCP option parameters into the DHCP server database. These options include parameters such as the DNS and
WINS server information, as well as the domain name.
The following configuration example illustrates how to configure a router that is acting as both a
DHCP client and server to import DHCP option parameters into the DHCP pool:
R1(config)#ip dhcp excluded-address 10.3.3.1 10.3.3.5
R1(config)#ip dhcp pool POOL-C
R1(dhcp-config)#network 10.3.3.0 255.255.255.0
R1(dhcp-config)#default-router 10.3.3.1
R1(dhcp-config)#import all
R1(dhcp-config)#exit
R1(config)#interface FastEthernet0/0
R1(config-if)#description ‘Connected To Internal LAN’
R1(config-if)#ip address 10.3.3.1 255.255.255.0
R1(config-if)#exit
R1(config)#interface FastEthernet0/1
R1(config-if)#description ‘Connected To The ISP’
R1(config-if)#ip address dhcp
R1(config-if)#exit
This configuration can be validated using the show ip dhcp import command as follows:
R1#show ip dhcp import
Address Pool Name: POOL-C
Domain Name Server(s): 172.16.1.253 172.16.1.254
NetBIOS Name Server(s): 172.16.1.252
Domain Name Option: howtonetwork.net
These imported parameters are then passed on to clients, together with the addressing information assigned from
the configured local pool named POOL-C.
Configuring A Cisco IOS Router or Switch as a DHCP Relay Agent
Cisco IOS software routers and switches can be configured as DHCP relay agents, allowing the
hosts connected to the local LANs they serve to acquire addressing information from remote
DHCP servers. This functionality is enabled via the ip helper-address <address> interface
configuration command under the inside or internal interface, which resides on the same subnet
as the hosts on the local network. In other words, this command should be configured under the
interface that will be receiving the Broadcasts from the host. You can issue this command more than once to
specify multiple server addresses for high availability. In the event that multiple servers are specified, Cisco routers forward the DHCPDISCOVER message to all the helper addresses configured
under the interface. These messages are Unicast to the servers.
By default, the ip helper-address will forward the following UDP Broadcasts:
• Trivial File Transfer Protocol (TFTP) (port 69)
• DNS (port 53)
• Time service (port 37)
• NetBIOS name server (port 137)
• NetBIOS datagram server (port 138)
• Boot Protocol (DHCP/BOOTP) client and server datagrams (ports 67 and 68)
• Terminal Access Controller Access Control System (TACACS) service (port 49)
• IEN-116 name service (port 42)
However, the router can be configured to forward any UDP Broadcast based on UDP port
number. While this is supported, it is not recommended, as forwarding Broadcasts from one subnet to the Broadcast address of another subnet increases Broadcast
flooding, which can have an adverse impact on network and device performance.
While the ip helper-address command will forward the default list of UDP Broadcasts listed in
the previous section, you can also use the ip forward-protocol global configuration command
to modify the UDP Broadcasts that the router or switch will forward. This command can be used to
remove certain Broadcasts or even include others that are not forwarded, by default. For example,
assume you wanted to forward BOOTP/DHCP Broadcasts only and no others to the specified servers. In this case, configure the device as follows:
R1(config)#no ip forward-protocol udp 69
R1(config)#no ip forward-protocol udp 53
R1(config)#no ip forward-protocol udp 37
R1(config)#no ip forward-protocol udp 137
R1(config)#no ip forward-protocol udp 138
R1(config)#no ip forward-protocol udp 49
R1(config)#no ip forward-protocol udp 42
R1(config)#interface FastEthernet0/0
R1(config-if)#ip helper-address 172.17.1.254
366
C H A P T E R 9: T RO U B L ES H O OT I N G C I S CO I O S D H C P A N D N AT
The configuration above prevents the router from forwarding all other default Broadcasts, except
for the Boot Protocol (DHCP/BOOTP) client and server datagrams, which use UDP ports 67 and
68. The ip forward-protocol command can also be used to forward other Broadcasts in addition to the default ports. For example, to configure the router to forward UDP port
1812 in addition to the default ports, you would issue the following configuration:
R1(config)#ip forward-protocol udp 1812
R1(config)#interface FastEthernet0/0
R1(config-if)#ip helper-address 172.17.1.254
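The effect of a series of ip forward-protocol commands can be reasoned about as simple set arithmetic on the default port list. The Python sketch below models this; it only understands the numeric udp <port> form used in this chapter, and the function name forwarded_ports is hypothetical.

```python
# Default UDP ports forwarded once ip helper-address is configured
DEFAULT_FORWARDED_UDP = {37, 42, 49, 53, 67, 68, 69, 137, 138}


def forwarded_ports(commands: list) -> set:
    """Return the UDP ports forwarded after applying the given commands."""
    ports = set(DEFAULT_FORWARDED_UDP)
    for line in commands:
        words = line.split()
        negate = words[0] == "no"
        if negate:
            words = words[1:]
        if words[:3] != ["ip", "forward-protocol", "udp"]:
            continue  # ignore anything this sketch does not model
        port = int(words[3])
        if negate:
            ports.discard(port)
        else:
            ports.add(port)
    return ports


dhcp_only = [f"no ip forward-protocol udp {p}"
             for p in (69, 53, 37, 137, 138, 49, 42)]
print(sorted(forwarded_ports(dhcp_only)))                            # [67, 68]
print(1812 in forwarded_ports(["ip forward-protocol udp 1812"]))     # True
```

Listing the resulting set makes it easy to spot a default port that was overlooked when the intention is to forward DHCP/BOOTP Broadcasts only.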
You can verify the helper addresses specified under the interface by either checking the device configuration or using the show ip interface <name> command as follows:
R1#show ip interface FastEthernet0/0
FastEthernet0/0 is up, line protocol is up
Internet address is 150.1.1.3/24
Broadcast address is 255.255.255.255
Address determined by DHCP
MTU is 1500 bytes
Helper addresses are 172.16.1.254
172.17.1.254
172.18.1.254
Directed broadcast forwarding is disabled
Outgoing access list is not set
...
[Truncated Output]
Additionally, you can use the show ip helper-address <interface> command to view all configured helper addresses under a specific interface, or under all interfaces on the device, if you do
not include the interface argument as illustrated in the following output:
R1#show ip helper-address
Interface          Helper-Address    VPN   VRG Name   VRG State
FastEthernet0/0    172.16.1.254      0     None       Unknown
                   172.17.1.254      0     None       Unknown
                   172.18.1.254      0     None       Unknown
FastEthernet0/1    172.20.1.254      0     None       Unknown
                   172.21.1.254      0     None       Unknown
                   172.22.1.254      0     None       Unknown
TROUBLESHOOTING DHCP
DHCP problems can have numerous causes. As with most issues,
the most common cause is device misconfiguration. However, problems can
also be caused by any of the following:
• NIC compatibility issue or DHCP feature issue
• Faulty NIC or improper NIC driver installation
• Operating System behavior or defect
• Spanning Tree issues
• CDP is disabled on ports connected to IP phones
• DHCP server/DHCP relay information option incompatibility
• Cisco IOS DHCP/BOOTP software bugs
Device misconfigurations are the most common reason for DHCP problems. Common errors that
may result in DHCP problems include the following:
• The Cisco IOS DHCP service is disabled
• Using secondary addressing
• Wrong DHCP configuration parameters
By default, the Cisco IOS DHCP service is enabled via the service dhcp global configuration
command. However, it is possible that the service may have been disabled when disabling unused
services if the device was not previously used as a DHCP server. If you have verified the device
configuration and everything appears to be correct, but the Cisco IOS DHCP server is not leasing
addresses, check the configuration to ensure that the DHCP service is enabled.
Secondary addressing is also another common reason for DHCP problems. By default, DHCP has
a limitation in that the Reply packets are sent only if the Request packet is received from the interface configured with the primary IP address. If you have secondary subnets assigned to a router
interface and a Cisco IOS DHCP server, or a Cisco IOS DHCP relay agent is configured to forward
DHCP Broadcasts to a remote server, only addresses from the pool included in the primary network will be assigned to requesting clients. If you need to assign addresses to hosts from different
subnets, you must configure additional interfaces or use subinterfaces instead.
Incorrect DHCP configuration parameters may include incorrect network statements when configuring the DHCP pool. In the event that the network statement does not include any local device
interfaces, no addresses will be assigned, even by the local Cisco IOS DHCP server. Consider the
following basic configuration for example:
R1(config)#interface FastEthernet0/0
R1(config-if)#ip add 192.168.1.1 255.255.255.0
R1(config-if)#exit
R1(config)#ip dhcp pool CLIENT-POOL
R1(dhcp-config)#network 192.168.2.0 255.255.255.0
R1(dhcp-config)#default-router 192.168.1.1
R1(dhcp-config)#lease 8 0 0
R1(dhcp-config)#dns-server 10.0.0.254
R1(dhcp-config)#netbios-name-server 10.0.0.254
R1(dhcp-config)#exit
In the configuration example above, the FastEthernet0/0 subnet and the DHCP network statement
are mismatched. Because of this misconfiguration, the Cisco IOS DHCP server will not be able to
assign addresses to local clients sending DHCP packets on FastEthernet0/0. This can be validated
by debugging the Cisco IOS DHCP server functionality using the debug ip dhcp server events
and debug ip dhcp server packet commands as follows:
R1#debug ip dhcp server events
DHCP server event debugging is on.
R1#debug ip dhcp server packet
DHCP server packet debugging is on.
R1#
R1#
R1#
*Oct 20 16:13:13.828: DHCPD: Sending notification of DISCOVER:
*Oct 20 16:13:13.828:   DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:13.828:   DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:13.828:   DHCPD: circuit id 00000000
*Oct 20 16:13:13.828: DHCPD: DHCPDISCOVER received from client 0100.24e8.f57e.a2 on interface FastEthernet0/0.
*Oct 20 16:13:13.828: DHCPD: Seeing if there is an internally specified pool class:
*Oct 20 16:13:13.828:   DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:13.828:   DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:13.832:   DHCPD: circuit id 00000000
*Oct 20 16:13:13.832: DHCPD: there is no address pool for 192.168.1.1.
*Oct 20 16:13:18.828: DHCPD: Sending notification of DISCOVER:
*Oct 20 16:13:18.828:   DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:18.828:   DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:18.828:   DHCPD: circuit id 00000000
*Oct 20 16:13:18.828: DHCPD: DHCPDISCOVER received from client 0100.24e8.f57e.a2 on interface FastEthernet0/0.
*Oct 20 16:13:18.828: DHCPD: Seeing if there is an internally specified pool class:
*Oct 20 16:13:18.828:   DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:18.828:   DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:18.832:   DHCPD: circuit id 00000000
*Oct 20 16:13:18.832: DHCPD: there is no address pool for 192.168.1.1.
*Oct 20 16:13:26.829: DHCPD: Sending notification of DISCOVER:
*Oct 20 16:13:26.829:   DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:26.829:   DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:26.829:   DHCPD: circuit id 00000000
*Oct 20 16:13:26.829: DHCPD: DHCPDISCOVER received from client 0100.24e8.f57e.a2 on interface FastEthernet0/0.
*Oct 20 16:13:26.829: DHCPD: Seeing if there is an internally specified pool class:
*Oct 20 16:13:26.829:   DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Oct 20 16:13:26.829:   DHCPD: remote id 020a0000c0a8010100000000
*Oct 20 16:13:26.829:   DHCPD: circuit id 00000000
R1#
*Oct 20 16:13:26.829: DHCPD: there is no address pool for 192.168.1.1.
When forwarding DHCP messages to a remote server (i.e., the ip helper-address command has
been issued), the gateway forwards all DHCP messages to the configured helper using the primary
address on the interface. If a pool has not been configured that includes the source address of the
DHCP relay, no addresses will be assigned by the remote DHCP server. For example, if the remote
DHCP server were a Cisco IOS DHCP server, and you enabled DHCP server functionality debugging using the debug ip dhcp server events and debug ip dhcp server packet commands,
then you would see the following messages printed on the console:
R2#debug ip dhcp server events
DHCP server event debugging is on.
R2#debug ip dhcp server packet
DHCP server packet debugging is on.
R2#
R2#
*Mar 3 11:14:18.830: DHCPD: DHCPDISCOVER received from client 0100.24e8.f57e.a2 through relay 192.168.1.1.
*Mar 3 11:14:18.830: DHCPD: Seeing if there is an internally specified pool class:
*Mar 3 11:14:18.830:   DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Mar 3 11:14:18.830:   DHCPD: circuit id 00000000
*Mar 3 11:14:18.830: DHCPD: there is no address pool for 192.168.1.1.
*Mar 3 11:14:23.366: DHCPD: Sending notification of DISCOVER:
*Mar 3 11:14:23.366:   DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Mar 3 11:14:23.366:   DHCPD: circuit id 00000000
*Mar 3 11:14:23.370: DHCPD: DHCPDISCOVER received from client 0100.24e8.f57e.a2 through relay 192.168.1.1.
*Mar 3 11:14:23.370: DHCPD: Seeing if there is an internally specified pool class:
*Mar 3 11:14:23.370:   DHCPD: htype 1 chaddr 0024.e8f5.7ea2
*Mar 3 11:14:23.370:   DHCPD: circuit id 00000000
*Mar 3 11:14:23.370: DHCPD: there is no address pool for 192.168.1.1.
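As a minimal sketch of how the 'no address pool' condition above might be corrected, a pool that includes the relay address (192.168.1.1) could be defined on the remote Cisco IOS DHCP server. The pool name RELAY-LAN-POOL and the /24 mask are assumptions made for illustration only; use the mask that is actually configured on the relay interface. R2 must also have a route back to 192.168.1.1 so that its replies can reach the relay agent.
! Hypothetical pool name and mask, shown for illustration only
R2(config)#ip dhcp excluded-address 192.168.1.1
R2(config)#ip dhcp pool RELAY-LAN-POOL
R2(dhcp-config)#network 192.168.1.0 255.255.255.0
R2(dhcp-config)#default-router 192.168.1.1
R2(dhcp-config)#end
R2#show ip dhcp pool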
The troubleshooting of NIC issues is platform and hardware dependent. For this reason, we will not
delve into any specific details on NIC troubleshooting. However, you can isolate NIC issues by resetting the NIC, re-installing the driver, or verifying whether the NIC is configured to acquire
an IP address automatically, for example. Figure 9-2 below illustrates how you would validate this
configuration when using a Windows-based machine:
Fig. 9-2. Verifying NIC Settings when Using DHCP
You could also perform additional troubleshooting functions, such as assigning a static IP address and
verifying whether the local host has network connectivity. As is the case with NIC
issues, Operating System behaviors or defects are platform dependent and require different troubleshooting commands, depending on the platform.
By default, all ports transition through several Spanning Tree Protocol states before they begin forwarding user data. The default amount of time before the port transitions into the forwarding state
may cause some hosts to fail to receive dynamic addressing via DHCP due to NIC timeouts. The
recommended solution in such cases is to enable the PortFast feature on access ports connected to
hosts that will be getting addressing information via DHCP.
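The following is a minimal sketch of enabling PortFast on a single host-facing access port. The interface FastEthernet0/1 is a hypothetical example; PortFast should only be enabled on ports that connect to end hosts, never on ports that connect to other switches.
! Hypothetical access port connected to a DHCP client
Switch(config)#interface FastEthernet0/1
Switch(config-if)#switchport mode access
Switch(config-if)#spanning-tree portfast
Switch(config-if)#end
Switch#show spanning-tree interface FastEthernet0/1 portfast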
When an IP phone is connected to the port, assuming correct Multi-VLAN Access Port (MVAP)
configuration, if Cisco Discovery Protocol (CDP) is disabled on the switch port, the host connected to the phone LAN port will still be assigned an IP address by the DHCP server, but the IP phone
will be unable to acquire an address from the voice VLAN. If CDP is enabled, the switch is able
to detect the Cisco IP phone and provide it with the correct voice VLAN information, so that the phone sends its DHCP request in the voice VLAN. The DHCP server is then able to allot an IP address from the voice VLAN/subnet pool.
Keep in mind that there are no explicit configuration steps required to bind the DHCP service to
the voice VLAN.
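A hedged sketch of an MVAP port with CDP enabled is shown below. The interface and VLAN numbers (FastEthernet0/2, data VLAN 10, and voice VLAN 20) are assumptions made for illustration; CDP must also be running globally (cdp run), which is the default.
! Hypothetical port and VLAN numbers
Switch(config)#interface FastEthernet0/2
Switch(config-if)#switchport mode access
Switch(config-if)#switchport access vlan 10
Switch(config-if)#switchport voice vlan 20
Switch(config-if)#cdp enable
Switch(config-if)#end
Switch#show cdp neighbors FastEthernet0/2 detail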
When configuring Cisco IOS DHCP relay agent functionality on Catalyst switches, the switch will
include the DHCP relay agent information option (option 82) in packets it sends to the DHCP
server. This option allows the switch to include information about itself when forwarding client-originated DHCP packets to a DHCP server. While a Cisco IOS DHCP server will accept such
packets, the option may be incompatible with other DHCP servers. In such cases, consider disabling this
behavior on the switch using the no ip dhcp snooping information option global configuration command.
Finally, while rare, it is possible to run into Cisco IOS DHCP software bugs. For example, in older
Cisco IOS software images (e.g., 12.1), enabling the Unicast Reverse Path Forwarding (uRPF) feature, which was described in the previous chapter, causes the router to drop packets with a source
address of 0.0.0.0 and a destination address of 255.255.255.255. This software defect is resolved in
current Cisco IOS images. If you have performed the necessary checks (i.e., verified the router or
switch configuration, and host and server settings [if an external DHCP server is being used]) and
you suspect a software bug, then you should contact the Cisco TAC for additional troubleshooting
assistance and verification.
In addition to the debug commands as well as other verifications described in this section, you can
also use the show ip dhcp server statistics command to troubleshoot possible Cisco IOS
DHCP server problems when Cisco IOS DHCP server functionality is enabled on the device. This
command provides statistics on the DHCP messages described in the previous section. Following
is a sample output of the message statistics that is printed by this command.
R1#show ip dhcp server statistics
Memory usage         40343
Address pools        1
Database agents      0
Automatic bindings   1
Manual bindings      0
Expired bindings     0
Malformed messages   0
Secure arp entries   0

Message              Received
BOOTREQUEST          0
DHCPDISCOVER         19
DHCPREQUEST          2
DHCPDECLINE          0
DHCPRELEASE          0
DHCPINFORM           2

Message              Sent
BOOTREPLY            0
DHCPOFFER            1
DHCPACK              4
DHCPNAK              0
UNDERSTANDING NAT
Network Address Translation (NAT) enables hosts on private networks to access resources on the
Internet or other public networks. NAT is an IETF standard that enables a LAN to use one set of
IP addresses for internal traffic, typically private address space as defined in RFC 1918, and another
set of addresses for external traffic, typically publicly registered IP address space. NAT converts the
packet headers for incoming and outgoing traffic and keeps track of each session. NAT offers the
dual functions of security and address conservation, and is typically implemented in remote-access
environments. The key to understanding and, ultimately, troubleshooting NAT problems is having
a solid understanding of NAT terminology. You should be familiar with the following NAT terms:
• The NAT inside interface
• Inside local address
• Inside global address
• The NAT outside interface
• Outside local address
• Outside global address
NOTE: NAT is a core CCNA requirement. You can find additional detailed information on
NAT in the current CCNA study guide, which is available online at www.howtonetwork.net.
In NAT terminology, the inside interface is the border interface of the administrative domain controlled by the organization. This does not necessarily have to be the default gateway used by hosts
that reside within the internal network.
The inside local address is the IP address of a host residing on the inside network. In most cases,
the inside local address is an RFC 1918 address. This address is translated to the inside global address, which is typically an IP address from a publicly assigned or registered pool. It is important
to remember, however, that the inside local address could also be a public address.
The inside global address is the IP address of an internal host as it appears to the outside world.
Once the inside local address has been translated, it will appear as an inside global address to the Internet or any other external network or host.
The outside interface is the boundary for the administrative domain that is not controlled by the
organization. In other words, the outside interface is connected to the external network, which may
be the Internet or any other external network, such as a partner network, for example. Any hosts
residing beyond the outside interface fall outside the local organization’s administration.
The outside local address is the IP address of an outside, or external, host as it appears to inside
hosts. Finally, the outside global address is the IP address assigned to the outside host by its owner; it is an address that is legal and can be used on the Internet.
Both outside local addresses and outside global addresses are typically allocated from a globally
routable address or network space.
To clarify these concepts further, Figure 9-3 below shows the use of the addresses in a session between two hosts. NAT is enabled on the intermediate gateway:
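[Figure 9-3: traffic flows from Host 1 (INSIDE) through the NAT gateway to Host 2 (OUTSIDE). On the inside, the SRC address is the inside local address and the DST address is the outside local address. On the outside, the SRC address is the inside global address and the DST address is the outside global address.]
Fig. 9-3. Understanding NAT Inside and Outside Addresses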
Configuring and Verifying NAT in Cisco IOS Software
As you already know, the configuration and verification of Network Address Translation in Cisco
IOS software is a straightforward task. When configuring NAT, perform the following:
• Designate one or more interfaces as the internal (inside) interface(s) using the ip nat inside interface configuration command.
• Designate an interface as the external (outside) interface using the ip nat outside interface configuration command.
• Configure an Access Control List that will match all traffic that is to be NATed. This can be a standard or an extended named or numbered ACL.
• Optionally, configure a pool of global addresses using the ip nat pool <name> <start-ip> <end-ip> [netmask <mask> | prefix-length <length>] global configuration command. This defines a pool of inside global addresses to which inside local addresses will be translated.
• Configure NAT globally using the ip nat inside source list <ACL> [interface|pool] <name> [overload] global configuration command.
The following example shows how to configure basic NAT in Cisco IOS software:
R1(config)#interface FastEthernet0/0
R1(config-if)#description ‘Connected To The Internal LAN’
R1(config-if)#ip address 10.5.5.1 255.255.255.248
R1(config-if)#ip nat inside
R1(config-if)#exit
R1(config)#interface Serial0/0
R1(config-if)#description ‘Connected To The ISP’
R1(config-if)#ip address 150.1.1.1 255.255.255.248
R1(config-if)#ip nat outside
R1(config-if)#exit
R1(config)#access-list 100 remark ‘Translate Internal Addresses Only’
R1(config)#access-list 100 permit ip 10.5.5.0 0.0.0.7 any
R1(config)#ip nat pool INSIDE-POOL 150.1.1.3 150.1.1.6 prefix-length 24
R1(config)#ip nat inside source list 100 pool INSIDE-POOL
R1(config)#exit
Following this configuration, the show ip nat translations command can be used to verify that
translations are actually taking place on the router as illustrated below:
R1#show ip nat translations
Pro  Inside global     Inside local     Outside local    Outside global
icmp 150.1.1.4:4       10.5.5.1:4       200.1.1.1:4      200.1.1.1:4
icmp 150.1.1.3:1       10.5.5.2:1       200.1.1.1:1      200.1.1.1:1
tcp  150.1.1.5:15594   10.5.5.3:15594   200.1.1.1:23     200.1.1.1:23
TROUBLESHOOTING NAT
Before we delve into NAT troubleshooting, it is important to understand that while NAT provides the advantage of allowing private networks to communicate, it also has numerous limitations,
which include the following:
• Breaking the end-to-end IP model
• The need to maintain connection state
• The inhibition of end-to-end security
• Applications that are not NAT-friendly
• Address space collision
• Ratio of internal and reachable IP addresses
IP (in general) was designed so that only network endpoints (i.e., hosts and servers) handle the
connection. However, when NAT is implemented, one or more intermediate NAT-enabled devices
must rewrite the headers of every packet in each session that is translated. This breaks the end-to-end model
of IP and may cause issues with some applications and protocols.
When NAT is used, NAT-enabled devices must maintain the connection state for each of the translations being performed. Depending on the number of translations, this can consume a significant
amount of device resources, such as memory, and result in poor performance. In addition, because
all packets must be processed, NAT can hinder network and device performance by introducing
latency due to device processing delays. This results in longer round-trip times between source and
destination hosts, which could cause severe performance problems for real-time applications such
as voice and video.
NAT can also cause security headaches in identifying the sources of network breaches when traffic is coming from a NATed location. By hiding the true identity of the owner, NAT can make it
very difficult to identify the true origin or address of an attacker. Another inhibition to end-to-end
security is the incompatibility of NAT with cryptography and encryption. Some security protocols, such as IP Security (IPSec), cannot be used in conjunction with NAT
because NAT changes the source address of packets before they are forwarded to their destination.
This change causes the integrity check to fail because the packet appears to have been tampered with along the way.
Not all applications work in the same manner. This means that some applications, such as
those that use proprietary protocols, are not compatible with NAT. The two most common issues experienced with NAT involve applications that use embedded IP addresses or port
numbers. When NAT changes these port numbers or IP addresses, as is the case with IPSec, for
example, the applications do not function as expected or, in some instances, cease functioning
completely. This means that while NAT can be used to provide privately addressed devices access
to public networks, it cannot be used in all instances.
Address space collision typically occurs when two or more organizations merge. Because RFC 1918
address space is commonly used internally in different organizations, with NAT enabling these privately addressed networks to communicate with the outside world, it is not uncommon for two organizations that merge to find out that they have the same address space. If re-IP addressing the network
is not an option, and it usually is not because of the complexity involved, then ‘double-NAT’ is typically used to allow organizations with overlapping address space to communicate with each other.
However, the downside to this is the additional complexity that is introduced into the environment.
Finally, NAT works well when there are a few internal hosts that must be accessed from external
networks, such as the Internet. However, NAT can become an issue when multiple hosts need to
be accessed from the Internet. Given the greatly depleted IPv4 address space, acquiring a large
number of public addresses is not always possible. This highlights
another limitation of IPv4.
As is the case with DHCP troubleshooting, the most common sources of problems when using
NAT are due to device misconfigurations. Common misconfigurations include the following:
• Misconfigured NAT Access Control Lists
• Asymmetric routing
• Misconfigured NAT address pool
• Router interfaces missing NAT commands
Misconfigured NAT ACLs are a common cause of NAT problems. You can use the show ip nat
statistics command to determine the ACL that is used for translations, followed by the show
ip access-lists command to verify that the ACL is configured correctly. Following is a sample
output of the information printed by the show ip nat statistics command (highlighting the
ACL used for NAT):
R1#show ip nat statistics
Total active translations: 2 (0 static, 2 dynamic; 0 extended)
Outside interfaces:
Serial0/0
Inside interfaces:
FastEthernet0/0
Hits: 182 Misses: 2
CEF Translated packets: 83, CEF Punted packets: 4
Expired translations: 42
Dynamic mappings:
-- Inside Source
[Id: 1] access-list 100 pool INSIDE-POOL refcount 2
pool INSIDE-POOL: netmask 255.255.255.0
start 150.1.1.3 end 150.1.1.6
type generic, total addresses 4, allocated 2 (50%), misses 0
Appl doors: 0
Normal doors: 0
Queued Packets: 0
Asymmetric routing can also cause NAT problems, wherein some protocols work and others do
not. Consider the topology in Figure 9-4 below, for example:
[Figure 9-4: topology diagram. Labels recoverable from the figure:
R1: Lo0 150.1.1.254, Fa0/0 192.1.1.254, Fa0/0 172.1.1.254. Routing table: 200.1.1.0/24 via 192.1.1.1.
Intermediate routers: Fa0/0 192.1.1.1 and Se0/0 200.1.1.2; Fa0/0 172.1.1.1 and Se0/1 201.1.1.2.
R6: Se0/0 200.1.1.1/24, Se0/1 201.1.1.1/24, Fa0/0 10.1.1.1/24. Routing table: 192.1.1.0/24 via 200.1.1.2, 172.1.1.0/24 via 201.1.1.2, 150.1.1.0/24 via 201.1.1.2. Static NAT translation: 200.1.1.5 > 10.1.1.5.
R7: Fa0/0 10.1.1.5/24.]
Fig. 9-4. Asymmetric Routing and NAT
Referencing Figure 9-4, NAT is enabled on R6. The inside interface (FastEthernet0/0) has been assigned the IP address 10.1.1.1/24. The outside interface (Serial0/0) has been assigned the IP address
200.1.1.1/24. NAT is not enabled on the Serial0/1 interface. The routing table of R6 is illustrated in
the diagram. This router uses the path via neighbor 200.1.1.2 to reach the 192.1.1.0/24 subnet. However, R6 prefers the path via neighbor 201.1.1.2 to reach the 172.1.1.0/24 and 150.1.1.0/24 subnets.
R1 uses the path via neighbor 192.1.1.1 to reach the 200.1.1.0/24 subnet. On R6, NAT has been configured to translate the address 200.1.1.5 to internal address 10.1.1.5, which is the address assigned
to the FastEthernet0/0 interface of R7.
Next, R1 initiates a Telnet session to 200.1.1.5 (R7). This session is sourced from the Loopback 0
interface of R1 (150.1.1.254). The TCP SYN packet is sent via neighbor 192.1.1.1 and arrives at the
Serial0/0 interface of R6. R6 translates the packet, which has a destination address of 200.1.1.5, to
internal address 10.1.1.5 and forwards the packet out of its LAN interface to R7. R7 sends a SYN
ACK response to 150.1.1.254. This packet hits R6, which sends it out of Serial0/1, based on the
routing table entries.
Because NAT is not enabled on Serial0/1, the source address of 10.1.1.5 is not changed. The packet
is received by R1 with a source address of 10.1.1.5 and a destination address of 150.1.1.254. R1 is not
aware of any TCP sessions to this source address and issues an RST to 200.1.1.5. The TCP session
is terminated before it establishes. While this has been described as a NAT issue, the actual recommended solution would be to correct the asymmetric routing issue because NAT is operating in
the manner it should be. Once the asymmetric routing issue has been resolved, R1 will be able to
establish a Telnet session to R7.
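As a hedged sketch of this scenario, the static translation on R6 and one possible way of making the return path symmetric could look like the following. The static route is an assumption made for illustration: the figure does not show how R6 learned its routes, and the example assumes that the existing route to 150.1.1.0/24 via 201.1.1.2 was learned dynamically, so that a static route would be preferred over it. In a production network, adjusting routing protocol metrics may be the more appropriate fix.
! Static translation described in the text
R6(config)#ip nat inside source static 10.1.1.5 200.1.1.5
! Assumed fix: return traffic to 150.1.1.0/24 leaves via the NAT outside interface (Serial0/0)
R6(config)#ip route 150.1.1.0 255.255.255.0 200.1.1.2
R6(config)#end
R6#show ip route 150.1.1.254
R6#show ip nat translations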
Another common misconfiguration is an inadequate number of addresses in the NAT pool. When
configuring dynamic NAT without overloading, each internal host that is being translated requires its own address from the public pool. If the pool is too small, some host addresses will not be translated, as the
public pool will run out of addresses. In situations where the number of internal addresses exceeds
the number of pool addresses, you should consider implementing Port Address Translation (PAT)
instead. PAT allows multiple private IP addresses to be translated to a single public IP address by using different
ports. This functionality is enabled by appending the overload keyword to the ip nat inside
source list <ACL> [interface|pool] <name> [overload] global configuration command
in Cisco IOS software.
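The following is a minimal sketch of converting the earlier dynamic NAT example on R1 to PAT. It reuses ACL 100 and INSIDE-POOL from that example; clearing the translation table first is included because Cisco IOS software will typically refuse to remove a dynamic mapping that is still in use.
R1#clear ip nat translation *
R1#configure terminal
R1(config)#no ip nat inside source list 100 pool INSIDE-POOL
R1(config)#ip nat inside source list 100 pool INSIDE-POOL overload
R1(config)#end
R1#show ip nat translations
As an alternative, the address of the outside interface itself can be overloaded using the ip nat inside source list 100 interface Serial0/0 overload command, in which case no pool is required.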
Finally, another common misconfiguration that also causes NAT problems is incorrectly designating interfaces as inside or outside, or even forgetting to designate the correct device interfaces as
inside or outside. This is typically an issue on devices with multiple interfaces. Verify that the correct NAT configuration has been implemented by parsing the device configuration, or by using the
show ip nat statistics command, as illustrated below:
R1#show ip nat statistics
Total active translations: 2 (0 static, 2 dynamic; 0 extended)
Outside interfaces:
Serial0/0
Inside interfaces:
FastEthernet0/0
Hits: 182 Misses: 2
CEF Translated packets: 83, CEF Punted packets: 4
Expired translations: 42
Dynamic mappings:
-- Inside Source
[Id: 1] access-list 100 pool INSIDE-POOL refcount 2
pool INSIDE-POOL: netmask 255.255.255.0
start 150.1.1.3 end 150.1.1.6
type generic, total addresses 4, allocated 2 (50%), misses 0
Appl doors: 0
Normal doors: 0
Queued Packets: 0
In addition to device misconfigurations, the following problems can also result in NAT issues:
• Layer 1 and Layer 2 issues
• Layer 3 issues
• Device resource depletion
• Incompatible applications
Layer 1 and Layer 2 issues can cause issues such as intermittent connectivity, even when NAT is
configured correctly on the device. Verify that everything at these layers is operating correctly using
the appropriate commands.
As was described earlier in this section, Layer 3 issues, such as asymmetric routing, can and often
do cause problems when NAT is implemented. In environments where multiple paths between
source and destination exist, verify that routing is symmetric; otherwise, NAT will not function as
expected. In addition to routing, also check for traffic filtering at Layer 3. This, too, can break NAT.
Verify both local and transit device filtering using the appropriate suite of commands.
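As a hedged illustration, the checks described above could begin with commands such as the following on the NAT device. The interface name and destination address are taken from the earlier R1 example and are assumptions in any other environment; the same checks should be repeated on each transit device in the path.
! Layer 1 and Layer 2: interface status and error counters
R1#show ip interface brief
R1#show interfaces Serial0/0
! Layer 3: forward path to the destination (repeat on the far end for the return path)
R1#show ip route 200.1.1.1
! Filtering: configured ACLs and the ACLs applied to the interface
R1#show ip access-lists
R1#show ip interface Serial0/0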
In some cases, the size of the NAT table can significantly increase, which consumes many resources
(e.g., memory) available on the device. When this happens, the %NAT: System busy. Try later
error message is printed on the console when a show command related to NAT or a show running-config or write memory command is executed. You can avoid this issue and protect the
device using the ip nat translation max-entries <number> global configuration command,
which specifies the maximum number of NAT entries that are permitted in the NAT table of the
router or switch.
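A minimal sketch of limiting the size of the NAT table is shown below. The value of 1000 entries is purely an illustrative assumption; an appropriate limit depends on the memory available on the platform and the expected number of concurrent translations.
! Hypothetical limit, shown for illustration only
R1(config)#ip nat translation max-entries 1000
R1(config)#end
R1#show ip nat statistics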
Finally, as was stated earlier in this chapter, not all applications are NAT-friendly. In some cases, enabling NAT will break some applications and protocols (e.g., IPSec). It is important to understand
the applications and protocols used in the network prior to implementing NAT; otherwise, you
could spend endless hours troubleshooting something that will never work with Network Address
Translation in the first place. In addition to IPSec, additional protocols that can also be affected by
NAT, because they use embedded IP addresses, include FTP, Internet Relay Chat (IRC), Simple
Network Management Protocol (SNMP), H.323, Lightweight Directory Access Protocol (LDAP),
and Session Initiation Protocol (SIP).
NOTE: Cisco IOS software supports a feature called NAT Transparency, which addresses the
vast majority of incompatibilities between NAT and IPSec. However, it should be noted that it
does not resolve all possible incompatibilities. NAT Transparency is beyond the scope of the
TSHOOT certification exam and will not be described in any additional detail in this chapter
or in this guide.
In conclusion, given that NAT is a resource-hungry application, you should consider using the
show commands described in this chapter when troubleshooting NAT. This includes using the
show running-config command to verify device configurations, as well as the show ip nat
translations and show ip nat statistics commands to verify NAT operation. However, there
may be situations where you have used these commands and still cannot resolve the NAT issue. In
such cases, you can (with caution) use debugging commands to aid your troubleshooting efforts.
Cisco IOS software supports the following options when debugging Network Address Translation.
It should be noted that the options displayed below are based on Cisco IOS software version 12.4.
Options will differ with other versions:
R1#debug ip nat ?
  <1-99>    Access list
  detailed  NAT detailed events
  fragment  NAT fragment events
  generic   NAT generic ALG handler events
  h323      NAT H.323 events
  ipsec     NAT IPSec events
  nvi       NVI events
  port      NAT PORT events
  pptp      NAT PPTP events
  route     NAT Static route events
  sip       NAT SIP events
  skinny    NAT skinny events
  vrf       NAT VRF events
  wlan-nat  WLAN NAT events
  <cr>
It should be noted that the majority of these options are beyond the scope of the TSHOOT certification exam. Commonly used debug commands include the debug ip nat <ACL> command,
which restricts output to the address(es) permitted in the specified standard ACL; the debug ip
nat command, which displays information about each packet the device translates; and the debug
ip nat detailed command, which prints detailed information about each translated packet, including additional information such as protocol type and port numbers. The following is a sample
output of the detailed information that is printed by the debug ip nat detailed command:
R1#debug ip nat detailed
IP NAT detailed debugging is on
R1#
R1#
*Oct 21 13:20:27.498: NAT*: i: icmp (10.5.5.2, 1) -> (200.1.1.1, 1) [12314]
*Oct 21 13:20:27.498: NAT*: i: icmp (10.5.5.2, 1) -> (200.1.1.1, 1) [12314]
*Oct 21 13:20:27.498: NAT*: s=10.5.5.2->150.1.1.3, d=200.1.1.1 [12314]
*Oct 21 13:20:27.498: NAT*: o: icmp (200.1.1.1, 1) -> (150.1.1.3, 1) [12314]
*Oct 21 13:20:27.498: NAT*: s=200.1.1.1, d=150.1.1.3->10.5.5.2 [12314]
*Oct 21 13:20:28.504: NAT*: i: icmp (10.5.5.2, 1) -> (200.1.1.1, 1) [12327]
*Oct 21 13:20:28.508: NAT*: s=10.5.5.2->150.1.1.3, d=200.1.1.1 [12327]
...
[Truncated Output]
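On a busy device, it is generally safer to restrict NAT debugging to a single host. The following is a minimal sketch that uses a standard ACL for this purpose; ACL number 10 and host 10.5.5.2 are assumptions made for illustration. Debugging should be disabled as soon as the required information has been captured.
! Hypothetical ACL number and host address
R1(config)#access-list 10 permit host 10.5.5.2
R1(config)#end
R1#debug ip nat 10
R1#undebug all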
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter.
Understanding DHCP
• DHCP is used to dynamically assign hosts with IP addressing information
• DHCP can provide IP address, mask, default gateway, and DNS servers
• DHCP uses UDP ports 67 (server) and 68 (client)
• Cisco IOS routers and some switches can be configured as DHCP clients and servers
• DHCP is a client / server protocol
• DHCP clients transition through a series of states upon initialization
• DHCP clients transition through the following states:
1. Initializing
2. Selecting
3. Requesting
4. Bound
5. Renewing
6. Rebinding
• DHCP clients send out a DHCPDISCOVER message during the initializing phase
• DHCP servers respond with a DHCPOFFER message
• DHCP clients transition to the selecting state after the DHCPOFFER is received
• DHCP clients then send a DHCPREQUEST and transition to the requesting phase
• DHCP servers respond with a DHCPACK message
• After the DHCPACK is received, the client transitions to the bound state
• Additional DHCP messages that can be sent by the client or server include the following:
1. DHCPNAK or DHCPNACK
2. DHCPDECLINE
3. DHCPINFORM
4. DHCPRELEASE
• Cisco IOS devices can be configured as DHCP / BOOTP Relay Agents
• DHCP / BOOTP Relay Agents forward DHCP messages to servers
• Messages between Relay Agents and servers are Unicast between the two
• By default, the ip helper-address will forward the following UDP Broadcasts:
1. Trivial File Transfer Protocol (TFTP) (port 69)
2. DNS (port 53)
3. Time service (port 37)
4. NetBIOS name server (port 137)
5. NetBIOS datagram server (port 138)
6. Boot Protocol (DHCP/BOOTP) client and server datagrams (ports 67 and 68)
7. Terminal Access Controller Access Control System (TACACS) service (port 49)
8. IEN-116 name service (port 42)
Troubleshooting DHCP
• The most common cause of problems with DHCP is device misconfigurations
• Additional issues that can cause DHCP problems include the following:
1. NIC compatibility issue or DHCP feature issue
2. Faulty NIC or improper NIC driver installation
3. Operating System behavior or defect
4. Spanning Tree issues
5. CDP Is Disabled On Ports Connected to IP Phones
6. DHCP Server DHCP Relay Information Option Incompatibility
7. Cisco IOS DHCP/BOOTP Software Bugs
Understanding NAT
• NAT enables hosts on private networks to access resources on the Internet / public networks
• NAT is an IETF standard
• NAT converts packet headers for incoming / outgoing traffic and keeps track of each session
• The inside interface is the border interface of the organization's administrative domain
• The inside local address is the IP address of a host residing on the inside network
• The inside global address is the IP address of an internal host as it appears to the outside
• The outside interface is the boundary for the administrative domain that is not controlled by the organization
• The outside local address is the address of an outside host as it appears to inside hosts
• The outside global address is an address that is legal and can be used on the Internet
Troubleshooting NAT
• NAT has several limitations, which include the following:
1. Breaking the end-to-end IP model
2. The need to maintain connection state
3. The inhibition of end-to-end security
4. Applications that are not NAT-friendly
5. Address space collision
6. Ratio of internal and reachable IP addresses
• Most NAT problems can be attributed to general device misconfigurations
• Common misconfigurations include the following:
1. Misconfigured NAT Access Control Lists
2. Asymmetric Routing
3. Misconfigured NAT Address Pool
4. Router Interfaces Missing NAT Commands
• In addition to misconfigurations, the following problems can also result in NAT issues:
1. Layer 1 and Layer 2 Issues
2. Layer 3 Issues
3. Device Resource Depletion
4. Incompatible Applications
CHAPTER 10
Troubleshooting IPv6
Routing & Interoperability
Up until this point, all practical troubleshooting topics covered in this guide have been based on
the current IPv4 standard. In this chapter, we will look at the same basic principles; however,
the routed protocol in this case will be IP version 6 (IPv6). The TSHOOT certification exam objectives that are covered in this chapter are as follows:
• Troubleshoot IPv6 routing
• Troubleshoot IPv6 and IPv4 interoperability
The TSHOOT certification exam requires that you not only have a solid understanding of IPv6
routing, routing protocols, and IPv6 integration with IPv4 networks but also understand how to
support IPv6 internetworks. This includes troubleshooting IPv6 routing, routing protocols, and
the mechanisms used to integrate IPv6 and IPv4 networks. For the most part, IPv6 routing and
routing protocol troubleshooting can be performed using the methods discussed for IPv4 routing
protocols.
Naturally, however, there are some differences between IPv6 and IPv4 routing protocols, with the
majority of those being how they are implemented in the CLI. For this reason, this chapter will first
discuss basic IPv6 routing and routing protocol configuration, as well as IPv4 and IPv6 integration
mechanisms, prior to discussing how to troubleshoot IPv6 internetwork problems. This chapter
will be divided into the following sections:
• IP version 6 Protocol Overview and Fundamentals
• Understanding and Troubleshooting EIGRPv6
• Understanding and Troubleshooting RIPng
• Understanding and Troubleshooting OSPFv3
• Troubleshooting IPv6 Route Redistribution
• IPv4 and IPv6 Interoperability
• Troubleshooting IPv4 and IPv6 Interoperability
NOTE: It should be noted that this chapter will cover basic IPv6 fundamentals and principles.
Additional detailed information on IPv6 can be found in the ROUTE study guide, which is currently available online. Please refer to that guide for additional information.
IP VERSION 6 PROTOCOL OVERVIEW AND FUNDAMENTALS
Version 6 of the Internet Protocol provides additional capabilities over the current version 4. In
many ways, version 6 provides several enhancements over the current standard. These include the
following:
• The simplified IPv6 packet header
• Larger address space
• IPv6 addressing hierarchy
• IPv6 extensibility
• IPv6 Broadcast elimination
• Stateless autoconfiguration
• Integrated mobility
• Integrated enhanced security
The header fields in an IPv4 packet are very detailed and complete. However, not all fields in the
IPv4 packet header are used or required – for example, the Type of Service field. Other fields,
such as the Header Checksum, are no longer a necessity because transmission link quality has greatly improved over the years and error checking is already performed at other layers. In contrast, the header fields of an IPv6 packet are much
simpler and contain the bare minimum of information required to route the packet. This allows for
greater routing efficiency with IPv6 than is afforded by IPv4.
IPv6 addresses are 128 bits in length. This extended address length provides 2^128 (approximately 3.4 x 10^38) addresses, compared with the roughly 4.3 billion addresses available in IPv4. This sheer amount of address space eliminates the need to perform Network Address
Translation (NAT) in IPv6 because a global address can be assigned to each individual host. Because a global IP address can be assigned to each individual device (e.g., computers, laptops, and
phones), the Internet reverts to a true end-to-end model when using IPv6.
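The scale of the two address spaces follows directly from the address lengths. The following is a minimal Python sketch of that arithmetic; the variable names are illustrative only and are not taken from any Cisco tool:

```python
# Address-space sizes implied by the 32-bit and 128-bit address lengths.
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_total = 2 ** IPV4_BITS   # every possible IPv4 address
ipv6_total = 2 ** IPV6_BITS   # every possible IPv6 address

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:.2e} addresses")
print(f"IPv6 addresses per IPv4 address: {ipv6_total // ipv4_total:.2e}")
```

Not all of this space is usable for hosts, because large ranges are reserved for special purposes, but the difference in scale is what makes a global address per device practical.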
Because of the much larger address space provided by IPv6, multiple levels of hierarchy can be used
within the IPv6 address space. This allows providers and other organizations to use this hierarchy to
better manage the IPv6 address space based on bit-boundaries. The use of an addressing hierarchy
allows route summarization in IPv6 to be performed in a more organized manner than is currently
performed using the IPv4 address space.
Unlike IPv4, IPv6 has a fixed-size header, and additional extension headers are included to
support new features. These additional headers are outside the standard IPv6 header and are referenced in such a way that all individual internetwork devices can skip the extension if they do not
support it. This reduces the processing overhead of routers routing IPv6 packets.
In IPv6, Address Resolution Protocol (ARP) Broadcasts are replaced by Multicast packets on the
local network segment. This prevents devices that do not need to receive these packets from receiving them, and avoids the problems that Broadcasts cause (e.g., wasting resources and network
performance degradation).
Both IPv4 and IPv6 support stateful autoconfiguration, which allows network hosts to receive their
addressing information from a network server (i.e., via DHCP). In addition to supporting stateful
autoconfiguration, IPv6 also supports stateless autoconfiguration. Stateless autoconfiguration allows hosts to configure their Unicast IPv6 addresses by themselves based on prefix advertisements
from routers on the local network segment.
While Mobile IP is available for both IPv4 and IPv6, it is built into IPv6, whereas it is an added
function in IPv4. IPv6 mobility allows IPv6-capable devices such as PDAs, cell phones, and wireless
laptops to roam between the IPv6 networks of wireless or cellular providers by using the Mobile IP
protocol. This allows any IPv6 host to use Mobile IP as needed, while only IPv4 hosts that have this
added functionality can use Mobile IP.
IPv6 uses the inbuilt security mechanisms afforded by the IP Security (IPSec) protocol. The key
difference between IPSec in IPv4 and IPv6 is that it is optional in IPv4 but is mandatory in IPv6. As
defined in RFC 2460, IPv6 includes the use of the Authentication Header (AH) and Encapsulating
Security Payload (ESP) extension headers in a complete implementation.
IPv6 Address Representation
The three ways in which IPv6 addresses can be represented are as follows:
1. The preferred or complete address representation or form
2. The compressed representation
3. IPv6 addresses with an embedded IPv4 address
While the preferred form or representation is the most commonly used method for representing
the 128-bit IPv6 address in text format, it is also important to be familiar with the other two methods of IPv6 address representation. These methods are described in the following sections.
The preferred representation for an IPv6 address is the longest format, and is also referred to as the
complete form of an IPv6 address. This format represents all 32 Hexadecimal characters that are
used to form an IPv6 address. This is performed by writing the address as a series of eight 16-bit
Hexadecimal fields, separated by a colon (e.g., 3FFF:1234:ABCD:5678:020C:CEFF:FEA7:F3A0).
The compressed representation allows IPv6 addresses to be shortened in one of two ways. The
first method allows a double colon (::) to be used in place of one or more successive 16-bit fields
that are comprised entirely of zeros, including fields at the beginning or end of the address.
When using this method, it is important to remember that the double colon can be used only once
in an IPv6 address. The second method allows the leading zeros within any individual 16-bit field
to be omitted (e.g., 020C can be written as 20C). An example of a compressed IPv6 address would be 3FFE::1/64.
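The relationship between the preferred and compressed forms can be checked with Python's standard ipaddress module. This is only an illustrative sketch; the helper name is_valid_ipv6 is an assumption made for this example. Note that Python prints Hexadecimal characters in lowercase:

```python
import ipaddress

# The preferred-form example address from the text above.
preferred = "3FFF:1234:ABCD:5678:020C:CEFF:FEA7:F3A0"
addr = ipaddress.IPv6Address(preferred)
print(addr.exploded)     # preferred form: all 32 hexadecimal characters
print(addr.compressed)   # leading zeros dropped: 020c becomes 20c

# The compressed example from the text, expanded back to the preferred form.
short = ipaddress.IPv6Address("3FFE::1")
print(short.exploded)


def is_valid_ipv6(text: str) -> bool:
    """Return True if text parses as an IPv6 address (hypothetical helper)."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ipaddress.AddressValueError:
        return False


print(is_valid_ipv6("3FFE::1"))      # a single double colon is allowed
print(is_valid_ipv6("3FFE::1::2"))   # a second double colon is rejected
```

The last check illustrates why the double colon may appear only once: with two of them, there is no way to tell how many zero fields each one stands for.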
The third representation of an IPv6 address is to use an embedded IPv4 address within the IPv6
address. When an IPv6 address is embedded with an IPv4 address, the first part of the IPv6 address uses the Hexadecimal notation and the remainder of the address is in the traditional dotted-decimal notation used by IPv4 addresses. However, it is permissible to convert the 32-bit dotted-decimal IPv4 address into Hexadecimal notation and embed that into the IPv6 address instead.
The IPv6 address with an embedded IPv4 address is comprised of six fields of 16-bit Hexadecimal
characters and four fields of 8-bit decimal characters. The two kinds of IPv6 addresses that contain
an embedded IPv4 address are as follows:
1. IPv4-compatible IPv6 addresses
2. IPv4-mapped IPv6 addresses
IPv4-compatible IPv6 addresses have the first 96 bits set to 0 and are then followed by the 32-bit
IPv4 address. An example of an IPv4-compatible IPv6 address would be
0000:0000:0000:0000:0000:0000:172.16.255.1. This same address can then be compressed as 0:0:0:0:0:0:172.16.255.1/128,
or simply as ::172.16.255.1/128. Additionally, it is important to remember that the decimal IPv4
address could be converted to Hexadecimal notation and used to create the IPv4-compatible IPv6
address 0:0:0:0:0:0:AC10:FF01/128, or simply ::AC10:FF01/128.
IPv4-mapped IPv6 addresses have the first 80 bits set to 0, the next 16 bits set to a value of all 1s
(which is FFFF in Hexadecimal notation), and are then followed by the IPv4 dotted-decimal address. An example of an IPv4-mapped IPv6 address would be
0000:0000:0000:0000:0000:FFFF:172.16.255.1/128. Because it is perfectly legal to represent the address in the compressed form, the same
address could also be written as either 0:0:0:0:0:FFFF:172.16.255.1/128 or ::FFFF:172.16.255.1/128.
Additionally, the IPv4 address also could be converted to Hexadecimal notation, producing the
IPv4-mapped IPv6 address 0:0:0:0:0:FFFF:AC10:FF01/128 or simply ::FFFF:AC10:FF01/128.
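The dotted-decimal to Hexadecimal conversion used in these examples can be verified with a short Python sketch. The variable names are illustrative; only the standard ipaddress module is used:

```python
import ipaddress

v4 = ipaddress.IPv4Address("172.16.255.1")

# Convert the 32-bit IPv4 address into two 16-bit hexadecimal fields.
v4_hex = f"{int(v4):08X}"
hex_fields = f"{v4_hex[:4]}:{v4_hex[4:]}"
print(hex_fields)

# IPv4-compatible form: 96 zero bits followed by the IPv4 address.
compatible = ipaddress.IPv6Address(f"::{v4}")

# IPv4-mapped form: 80 zero bits, 16 one bits (FFFF), then the IPv4 address.
mapped = ipaddress.IPv6Address(f"::FFFF:{v4}")

# Both notations (dotted-decimal and hexadecimal) describe the same 128 bits.
print(compatible == ipaddress.IPv6Address(f"::{hex_fields}"))
print(mapped == ipaddress.IPv6Address(f"::FFFF:{hex_fields}"))
print(mapped.ipv4_mapped)   # recovers the embedded IPv4 address
```

Each decimal octet maps to two Hexadecimal characters (172 = AC, 16 = 10, 255 = FF, 1 = 01), which is why the embedded address occupies exactly two 16-bit fields.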
The Different IPv6 Address Types
Unlike IPv4, IPv6 does not use Broadcast addresses. Instead, IPv6 supports and uses only the Unicast, Multicast, and Anycast address classes or types also used by IPv4. IPv6 addresses can be classified as any one of the following:
• Link-Local addresses
• Site-Local addresses
• Aggregate global Unicast addresses
• Multicast addresses
• Anycast addresses
• Loopback addresses
• Unspecified addresses
IPv6 Link-Local addresses can be used only on the local link (i.e., a shared segment between devices) and are automatically assigned to each interface when IPv6 is enabled on that interface.
These addresses are assigned from the Link-Local prefix FE80::/10. To complete the address, bits 11
through 64 are set to 0 and the interface Extended Unique Identifier 64 (EUI-64) is appended to the
Link-Local address as the low-order 64 bits. The EUI-64 is comprised of the 24-bit manufacturer
ID assigned by the IEEE and the 40-bit value assigned by that manufacturer.
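On Ethernet interfaces, the interface identifier is typically derived from the 48-bit MAC address using the modified EUI-64 procedure: the seventh bit of the first octet (the universal/local bit) is inverted and FFFE is inserted between the two halves of the MAC address. The following Python sketch illustrates this. The function name and the MAC address 000d.289e.f940 are assumptions made for illustration; the MAC address is inferred from a Link-Local address that appears in later sample output and is not stated anywhere in this guide:

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Build an FE80::/10 Link-Local address from a 48-bit MAC address."""
    hex_digits = "".join(c for c in mac if c in "0123456789abcdefABCDEF")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    octets = bytearray.fromhex(hex_digits)
    octets[0] ^= 0x02                                   # invert the universal/local bit
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)           # FE80 followed by 48 zero bits
    return ipaddress.IPv6Address(prefix + eui64)


# Hypothetical MAC address (an assumption, see the note above).
address = link_local_from_mac("000d.289e.f940")
print(address)
print(address.is_link_local)
```

When a Link-Local address does not match the interface MAC address in this way, it has usually been configured manually rather than generated automatically.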
Site-Local addresses are Unicast addresses that are used only within a site. Unlike Link-Local addresses,
Site-Local addresses must be configured manually on network devices. These addresses are the IPv6
equivalent of the private IPv4 address space defined in RFC 1918 and can be used by organizations
that do not have globally routable IPv6 address space. While still supported in Cisco IOS software,
Site-Local addresses were deprecated by RFC 3879 and have been replaced by the Unique-Local Addresses (ULAs) described in RFC 4193. ULAs
serve the same function as Site-Local addresses and are likewise not routable on the global IPv6 Internet.
Unique-Local Addresses are assigned from the FC00::/7 IPv6 address block, which is then also further
divided into two /8 address groups, referred to as the assigned and random groups. These two
groups are the FC00::/8 and FD00::/8 IPv6 address blocks. The FC00::/8 block is reserved for /48
blocks that would be assigned by an allocation authority, while /48 blocks in FD00::/8 are formed by appending a randomly
generated 40-bit string (the Global ID) to the FD prefix.
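The construction of a locally assigned ULA prefix can be sketched in Python as follows. This is a simplification: RFC 4193 suggests a specific hash-based method for producing the 40 random bits, whereas this example simply uses the standard secrets module, and the function name is an assumption:

```python
import ipaddress
import secrets


def random_ula_prefix() -> ipaddress.IPv6Network:
    """Return a /48 prefix made of FD (8 bits) plus a random 40-bit Global ID."""
    global_id = secrets.randbits(40)
    # Place FD in the top 8 bits and the Global ID in the next 40 bits.
    prefix_int = (0xFD << 120) | (global_id << 80)
    return ipaddress.IPv6Network((prefix_int, 48))


prefix = random_ula_prefix()
print(prefix)   # different on every run
```

The remaining 16 bits before the 64-bit interface identifier are left for the site to number its own subnets.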
Aggregate global Unicast addresses are the IPv6 addresses used for generic IPv6 traffic, as well as
for the IPv6 Internet. These are similar to the public addresses used in IPv4. From a network addressing point of view, each IPv6 global Unicast address is comprised of three main sections: the
prefix received from the provider (48 bits in length), the site prefix (16 bits in length), and the host
portion (64 bits in length). Together, these make up the 128-bit address used in IPv6. Aggregate global Unicast
addresses for IPv6 are assigned by IANA and fall within the IPv6 prefix 2000::/3. This allows for a
range of aggregate global Unicast addresses, from 2000 to 3FFF.
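Both the 2000 to 3FFF range and the 48/16/64-bit split can be confirmed with a short Python sketch. The function name and the sample address are illustrative assumptions:

```python
import ipaddress

GLOBAL_UNICAST = ipaddress.IPv6Network("2000::/3")

first = GLOBAL_UNICAST[0]    # lowest address in the block
last = GLOBAL_UNICAST[-1]    # highest address in the block
print(first.exploded)
print(last.exploded)


def split_global_unicast(text: str) -> tuple:
    """Split an address into provider prefix, site prefix, and host portion."""
    value = int(ipaddress.IPv6Address(text))
    provider = value >> 80              # high-order 48 bits
    site = (value >> 64) & 0xFFFF       # next 16 bits
    host = value & (2 ** 64 - 1)        # low-order 64 bits
    return provider, site, host


provider, site, host = split_global_unicast("2001:1a2b:1111:d7e5::1")
print(f"{provider:012x} {site:04x} {host:016x}")
```

Because the first three bits are fixed at 001, the first Hexadecimal character of a global Unicast address is always 2 or 3.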
The Multicast addresses used in IPv6 are derived from the FF00::/8 IPv6 prefix. In IPv6, Multicast
operates in a different manner than that of Multicast in IPv4. There are two defined types of IPv6
Multicast addresses: permanent and temporary. Permanent IPv6 Multicast addresses are assigned by
IANA, while the temporary IPv6 Multicast addresses can be used in pre-deployment Multicast testing.
In IPv6, Anycast addresses use global Unicast, Site-Local, or even Link-Local addresses. However, there is an Anycast address reserved for special use. This special address is referred to as
the Subnet-Router Anycast Address. The Subnet-Router Anycast Address is formed with the subnet’s 64-bit Unicast prefix, with the remaining 64 bits set to zero, for example:
2001:1a2b:1111:d7e5:0000:0000:0000:0000.
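Because the Subnet-Router Anycast Address is simply the subnet prefix followed by zeros, it can be derived from any interface address on that subnet. The following Python sketch shows one way to do this; the interface address ending in ::1 is a hypothetical example:

```python
import ipaddress

# Hypothetical interface address on the subnet used in the text above.
interface = ipaddress.IPv6Interface("2001:1a2b:1111:d7e5::1/64")

# Keep the 64-bit prefix and set the remaining 64 bits to zero.
subnet_router_anycast = interface.network.network_address
print(subnet_router_anycast.exploded)
```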
The IPv6 Loopback address is represented as 0000:0000:0000:0000:0000:0000:0000:0001 in the
preferred address format and as ::1 in the compressed format. This means that in the Loopback address, all bits
are set to 0 except for the last bit, which is always set to 1. This address is always assigned automatically when IPv6 is enabled on a device and therefore can never be changed.
In IPv6 addressing, unspecified addresses are simply Unicast addresses that are not assigned to any
interface. These addresses indicate the absence of an IPv6 address and are used for special purposes
that include IPv6 Dynamic Host Configuration Protocol (DHCP) and Duplicate Address Detection (DAD). Unspecified addresses are represented by all 0 values in the IPv6 address and can be
written using the :: prefix. In the preferred format, these addresses are represented as
0000:0000:0000:0000:0000:0000:0000:0000.
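The address types described in this section can be told apart by their prefixes. The following Python sketch is a rough classifier built on the standard ipaddress module; the function name and the returned labels are assumptions made for this example. Anycast addresses are not included because they are taken from the Unicast ranges and cannot be recognized from the address alone:

```python
import ipaddress

UNIQUE_LOCAL = ipaddress.IPv6Network("fc00::/7")
GLOBAL_UNICAST = ipaddress.IPv6Network("2000::/3")


def classify(text: str) -> str:
    """Return a label for the IPv6 address type, based only on its prefix."""
    addr = ipaddress.IPv6Address(text)
    if addr.is_unspecified:
        return "unspecified"
    if addr.is_loopback:
        return "loopback"
    if addr.is_multicast:
        return "multicast"            # FF00::/8
    if addr.is_link_local:
        return "link-local"           # FE80::/10
    if addr.is_site_local:
        return "site-local"           # FEC0::/10 (deprecated)
    if addr in UNIQUE_LOCAL:
        return "unique-local"
    if addr in GLOBAL_UNICAST:
        return "global unicast"
    return "other"


for sample in ("::", "::1", "FF02::A", "FE80::20D:28FF:FE9E:F940",
               "FEC0::1", "FD00::1", "3FFE::1"):
    print(f"{sample:28} {classify(sample)}")
```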
Verifying and Troubleshooting Generic IPv6 Routing
As is the case with IPv4, Cisco IOS software provides several IPv6-specific commands that can be
used to verify and troubleshoot generic IPv6 routing configurations and problems. When running
IPv4, the show ip route command, along with supported keywords, is used to view the contents
of the Routing Information Base (RIB). When using IPv6, the same is performed using the show
ipv6 route command. This command supports the following keywords (in Cisco IOS 12.4):
R1#show ipv6 route ?
  Hostname or X:X:X:X::X  IPv6 name or address
  X:X:X:X::X/<0-128>      IPv6 prefix
  bgp                     BGP routes
  connected               Connected routes
  interface               interface-specific routes
  isis                    IS-IS routes
  local                   Local routes
  ospf                    OSPFv3 routes
  rip                     RIPng routes
  static                  Static routes
  summary                 Summary display
  |                       Output modifiers
  <cr>
While the majority of these supported options require no explanation, as they are described and
illustrated in the ROUTE study guide, there are some options that were not described with which
you should be familiar. The interface keyword can be used to view all routes learned via a particular interface. This command is protocol independent and prints all known routes received via the
specified interface, regardless of routing protocol. The output of this command also includes static
routes pointed out of the specified interface, as well as the connected subnets for that specified
interface. Following is a sample of the information printed by this command:
R1#show ipv6 route interface FastEthernet0/0
IPv6 Routing Table - 7 entries
Codes: C - Connected, L - Local, S - Static, R - RIP, B - BGP
U - Per-user Static route
I1 - ISIS L1, I2 - ISIS L2, IA - ISIS interarea, IS - ISIS summary
O - OSPF intra, OI - OSPF inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
S   ::/0 [1/0]
     via FE80::20D:28FF:FE9E:F940, FastEthernet0/0
OI  2001::2/128 [110/1]
     via FE80::20D:28FF:FE9E:F940, FastEthernet0/0
R   2002::2/128 [120/2]
     via FE80::20D:28FF:FE9E:F940, FastEthernet0/0
C   3FFE::/64 [0/0]
     via ::, FastEthernet0/0
L   3FFE::20F:23FF:FE5E:EC80/128 [0/0]
     via ::, FastEthernet0/0
The show ipv6 route local command prints information on all locally configured aggregate
global Unicast addresses and subnets; however, keep in mind that this does not include assigned
Link-Local addresses. You can view assigned Link-Local addresses using the show ipv6 interface [brief] command instead. Following is a sample of the information that is printed by the
show ipv6 route local command:
R1#show ipv6 route local
IPv6 Routing Table - 7 entries
Codes: C - Connected, L - Local, S - Static, R - RIP, B - BGP
U - Per-user Static route
I1 - ISIS L1, I2 - ISIS L2, IA - ISIS interarea, IS - ISIS summary
O - OSPF intra, OI - OSPF inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
LC 2001::1/128 [0/0]
via ::, Loopback0
LC 2002::1/128 [0/0]
via ::, Loopback1
LC 2003::1/128 [0/0]
via ::, Loopback3
L   3FFE::20D:28FF:FE9E:F940/128 [0/0]
     via ::, FastEthernet0/0
L   FE80::/10 [0/0]
     via ::, Null0
L   FF00::/8 [0/0]
     via ::, Null0
Finally, the show ipv6 route summary command prints a summary of the number of IPv6 routes
known by the local device. This command shows the total number of routing entries, and further
breaks this down by routing protocol (route source) and the number of prefixes. Following is a
sample output of the information that is printed by this command:
R1#show ipv6 route summary
IPv6 Routing Table Summary - 7 entries
3 local, 1 connected, 1 static, 1 RIP, 0 BGP, 0 IS-IS, 1 OSPF
Number of prefixes:
/0: 1, /8: 1, /10: 1, /64: 1, /128: 3
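When comparing this output across several routers, it can help to check that the per-source counts and the per-prefix-length counts both add up to the reported total. The following Python sketch parses the sample output above. The function name and the regular expressions are assumptions based only on this sample, so other IOS releases may format the output differently:

```python
import re

SAMPLE = """\
IPv6 Routing Table Summary - 7 entries
  3 local, 1 connected, 1 static, 1 RIP, 0 BGP, 0 IS-IS, 1 OSPF
  Number of prefixes:
    /0: 1, /8: 1, /10: 1, /64: 1, /128: 3
"""


def parse_route_summary(text: str):
    """Return (total, routes per source, routes per prefix length)."""
    total = int(re.search(r"Summary - (\d+) entries", text).group(1))
    # The per-source line is the one that lists the 'local' count.
    source_line = next(line for line in text.splitlines() if "local" in line)
    by_source = {name: int(count)
                 for count, name in re.findall(r"(\d+) ([A-Za-z-]+)", source_line)}
    by_length = {int(length): int(count)
                 for length, count in re.findall(r"/(\d+): (\d+)", text)}
    return total, by_source, by_length


total, by_source, by_length = parse_route_summary(SAMPLE)
print(total, by_source, by_length)
print(sum(by_source.values()) == total == sum(by_length.values()))
```

In the sample, the three /128 entries in the prefix breakdown correspond to host routes, and the /8 and /10 entries correspond to the FF00::/8 and FE80::/10 local entries shown by the show ipv6 route local command.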
In addition to basic show commands, you should also be familiar with the debugging suite of commands for generic IPv6 troubleshooting. Cisco IOS software provides a plethora of commands
that can be used to troubleshoot IPv6 problems. IPv6 debugging is enabled using the debug ipv6
privileged EXEC command. This command supports the following options:
R1#debug ipv6 ?
  access-list         IPv6 access list debugging
  cef                 IPv6 CEF information
  cpc                 IPv6 Common Parsing Cache debugging
  dhcp                IPv6 DHCP debugging
  icmp                ICMPv6 debugging
  inspect             Stateful inspection events
  interface           IPv6 interface debugging
  mfib                IP Multicast forwarding information base
  mld                 Multicast Listener Discovery
  mobile              MIPv6 Debugging
  mrib                Multicast Route DB
  nat                 NAT-PT events
  nd                  IPv6 Neighbor Discovery debugging
  ospf                OSPF information
  packet              IPv6 packet debugging
  pim                 Protocol Independent Multicast
  policy              IPv6 policy-based routing debugging
  pool                IPv6 prefix pool debugging
  port-mapping        IPv6 PAM events
  rip                 RIP Routing Protocol debugging
  routing             IPv6 routing table debugging
  virtual-reassembly  IPv6 Virtual Fragment Reassembly (VFR) debugging
NOTE: While the majority of these options are beyond the scope of the TSHOOT certification
exam, the following section describes some of the options with which you should be familiar.
The debug ipv6 packet <access-list|detail> command enables IPv6 debugging for IPv6
packets. As is the case with the debug ip packet command used when troubleshooting IPv4,
you can restrict the output by specifying an ACL, or you can specify that detailed information be
included, by appending the detail keyword. The following example illustrates how to configure an
IPv6 ACL that permits only ICMPv6 packets and enable detailed IPv6 packet debugging, restricting the output to only ICMPv6 packets:
R1(config)#ipv6 access-list TSHOOT-ICMP-ACL
R1(config-ipv6-acl)#permit icmp any any
R1(config-ipv6-acl)#exit
R1(config)#exit
R1#
R1#debug ipv6 packet access-list TSHOOT-ICMP-ACL detail
IPv6 unicast packet debugging is on (detailed) for access list TSHOOT-ICMP-ACL
R1#
*Mar 23 06:41:23.283: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:27.091: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.091:       dest 3FFE::20F:23FF:FE5E:EC80
*Mar 23 06:41:27.091:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, forward to ulp
*Mar 23 06:41:27.095: IPV6: source 3FFE::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:27.095:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.095:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, originating
*Mar 23 06:41:27.095: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:27.099: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.099:       dest 3FFE::20F:23FF:FE5E:EC80
*Mar 23 06:41:27.099:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, forward to ulp
*Mar 23 06:41:27.099: IPV6: source 3FFE::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:27.099:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.099:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, originating
*Mar 23 06:41:27.103: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:27.103: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.103:       dest 3FFE::20F:23FF:FE5E:EC80
*Mar 23 06:41:27.107:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, forward to ulp
*Mar 23 06:41:27.107: IPV6: source 3FFE::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:27.107:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.107:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, originating
*Mar 23 06:41:27.107: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:27.111: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.111:       dest 3FFE::20F:23FF:FE5E:EC80
*Mar 23 06:41:27.111:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, forward to ulp
*Mar 23 06:41:27.111: IPV6: source 3FFE::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:27.111:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.111:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, originating
*Mar 23 06:41:27.115: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:27.115: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.115:       dest 3FFE::20F:23FF:FE5E:EC80
*Mar 23 06:41:27.119:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, forward to ulp
*Mar 23 06:41:27.119: IPV6: source 3FFE::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:27.119:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:27.119:       traffic class 0, flow 0x0, len 100+14, prot 58, hops 64, originating
*Mar 23 06:41:27.119: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:28.671: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:32.095: IPV6: source FE80::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:32.095:       dest 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:32.095:       traffic class 224, flow 0x0, len 72+8, prot 58, hops 255, originating
*Mar 23 06:41:32.095: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:32.095: IPV6: source 3FFE::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:32.099:       dest FE80::20F:23FF:FE5E:EC80
*Mar 23 06:41:32.099:       traffic class 224, flow 0x0, len 64+14, prot 58, hops 255, forward to ulp
*Mar 23 06:41:33.283: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:37.095: IPV6: source FE80::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:37.095:       dest FE80::20F:23FF:FE5E:EC80
*Mar 23 06:41:37.095:       traffic class 224, flow 0x0, len 72+14, prot 58, hops 255, forward to ulp
*Mar 23 06:41:37.095: IPV6: source FE80::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:37.095:       dest FE80::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:37.095:       traffic class 224, flow 0x0, len 64+16, prot 58, hops 255, originating
*Mar 23 06:41:37.099: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:42.099: IPV6: source FE80::20F:23FF:FE5E:EC80 (local)
*Mar 23 06:41:42.099:       dest FE80::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:42.099:       traffic class 224, flow 0x0, len 72+8, prot 58, hops 255, originating
*Mar 23 06:41:42.099: IPv6: Sending on FastEthernet0/0
*Mar 23 06:41:42.099: IPV6: source FE80::20D:28FF:FE9E:F940 (FastEthernet0/0)
*Mar 23 06:41:42.103:       dest FE80::20F:23FF:FE5E:EC80
*Mar 23 06:41:42.103:       traffic class 224, flow 0x0, len 64+14, prot 58, hops 255, forward to ulp
NOTE: We can determine that these are ICMPv6 packets because of the specified protocol
number of 58, which is the protocol number used by ICMPv6 as specified in RFC 2463 (since superseded by RFC 4443).
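When working through a long capture of this output, the fields of interest can be pulled out of each traffic class line programmatically. The following Python sketch assumes that the line has been captured unwrapped on a single line, as in the sample string below. The regular expression and the function name are assumptions based only on the output shown in this section:

```python
import re

# Well-known IP protocol numbers.
PROTOCOL_NAMES = {6: "TCP", 17: "UDP", 58: "ICMPv6"}

FIELD_RE = re.compile(
    r"traffic class (\d+), flow (0x[0-9A-Fa-f]+), len (\d+)\+(\d+), "
    r"prot (\d+), hops (\d+), (.+)"
)

SAMPLE_LINE = ("*Mar 23 06:41:27.091:       traffic class 0, flow 0x0, "
               "len 100+14, prot 58, hops 64, forward to ulp")


def parse_packet_line(line: str) -> dict:
    """Extract the protocol, hop limit, and action from one debug line."""
    match = FIELD_RE.search(line)
    if match is None:
        raise ValueError("not a 'traffic class' debug line")
    protocol = int(match.group(5))
    return {
        "traffic_class": int(match.group(1)),
        "protocol": protocol,
        "protocol_name": PROTOCOL_NAMES.get(protocol, "unknown"),
        "hops": int(match.group(6)),
        "action": match.group(7).strip(),
    }


print(parse_packet_line(SAMPLE_LINE))
```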
Another useful troubleshooting command is the debug ipv6 routing command, which can be
used to display information on IPv6 RIB and route cache updates. Again, this command is protocol
independent and prints information for all routing protocols. Following is a sample of the information that is printed by this command:
R1#debug ipv6 routing
IPv6 routing table events debugging is on
R1#
R1#clear ipv6 route *
R1#
R1#
*Mar 23 06:59:49.219: IPv6RT0: ospf 1, Delete 2001::2/128 from table
*Mar 23 06:59:49.219: IPv6RT0: rip TSHOOT, Delete 2002::2/128 from table
*Mar 23 06:59:49.219: IPv6RT0: rip TSHOOT, Delete 2003::2/128 from table
*Mar 23 06:59:49.219: IPv6RT0: ospf 1, Route add 2001::2/128 [new]
*Mar 23 06:59:49.219: IPv6RT0: ospf 1, Add 2001::2/128 to table
*Mar 23 06:59:49.223: IPv6RT0: ospf 1, Adding next-hop FE80::20D:28FF:FE9E:F940 over FastEthernet0/0 for 2001::2/128, [110/1]
*Mar 23 06:59:49.223: IPv6RT0: ospf 1, Reuse backup for 3FFE::/64, distance 110
*Mar 23 06:59:49.223: IPv6RT0: rip TSHOOT, Route add 2002::2/128 [new]
*Mar 23 06:59:49.223: IPv6RT0: rip TSHOOT, Add 2002::2/128 to table
*Mar 23 06:59:49.223: IPv6RT0: rip TSHOOT, Adding next-hop FE80::20D:28FF:FE9E:F940 over FastEthernet0/0 for 2002::2/128, [120/2]
*Mar 23 06:59:49.223: IPv6RT0: rip TSHOOT, Route add 2003::2/128 [new]
*Mar 23 06:59:49.223: IPv6RT0: rip TSHOOT, Add 2003::2/128 to table
*Mar 23 06:59:49.227: IPv6RT0: rip TSHOOT, Adding next-hop FE80::20D:28FF:FE9E:F940 over FastEthernet0/0 for 2003::2/128, [120/2]
*Mar 23 06:59:49.227: IPv6RT0: rip TSHOOT, Reuse backup for 3FFE::/64, distance 120
*Mar 23 06:59:49.227: IPv6RT0: Event: 2001::2/128, Del, owner ospf, previous None
*Mar 23 06:59:49.227: IPv6RT0: Event: 2002::2/128, Del, owner rip, previous None
*Mar 23 06:59:49.227: IPv6RT0: Event: 2003::2/128, Del, owner rip, previous None
*Mar 23 06:59:49.231: IPv6RT0: Event: 2001::2/128, Add, owner ospf, previous None
*Mar 23 06:59:49.231: IPv6RT0: Event: 2002::2/128, Add, owner rip, previous None
*Mar 23 06:59:49.231: IPv6RT0: Event: 2003::2/128, Add, owner rip, previous None
*Mar 23 07:00:02.251: IPv6RT0: rip TSHOOT, Delete 2003::2/128 from table
*Mar 23 07:00:02.251: IPv6RT0: rip TSHOOT, Delete backup for 3FFE::/64
*Mar 23 07:00:02.251: IPv6RT0: Event: 2003::2/128, Del, owner rip, previous None
From the debug output above, we can determine that the local router is running OSPFv3 using
process ID 1. The router is also running RIPng, using an instance named TSHOOT. Both OSPFv3
and RIPng are using the default administrative distances of 110 and 120, respectively. We can also
determine that the local router is receiving the 2001::2/128 prefix from another OSPFv3 router with
the Link-Local address FE80::20D:28FF:FE9E:F940. The router is also receiving the 2002::2/128 and
2003::2/128 prefixes via RIPng from another RIPng-enabled router with the Link-Local address
FE80::20D:28FF:FE9E:F940 (i.e., the same router also running OSPFv3). While all three prefixes
are originally inserted into the routing table, we can see that the 2003::2/128 prefix was eventually
removed. You can use this information to determine whether this is what should be happening (e.g.,
maybe it complies with a configured RIPng filter), or troubleshoot the cause if this is not expected
behavior (e.g., maybe the prefix is flapping).
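The same reasoning can be applied to a saved copy of the debug output. The following Python sketch replays the Event lines from the sample above and reports the last action seen for each prefix, which makes a prefix that ends up deleted easy to spot. The regular expression and the function name are assumptions based only on this sample output:

```python
import re

SAMPLE = """\
*Mar 23 06:59:49.227: IPv6RT0: Event: 2001::2/128, Del, owner ospf, previous None
*Mar 23 06:59:49.227: IPv6RT0: Event: 2002::2/128, Del, owner rip, previous None
*Mar 23 06:59:49.227: IPv6RT0: Event: 2003::2/128, Del, owner rip, previous None
*Mar 23 06:59:49.231: IPv6RT0: Event: 2001::2/128, Add, owner ospf, previous None
*Mar 23 06:59:49.231: IPv6RT0: Event: 2002::2/128, Add, owner rip, previous None
*Mar 23 06:59:49.231: IPv6RT0: Event: 2003::2/128, Add, owner rip, previous None
*Mar 23 07:00:02.251: IPv6RT0: Event: 2003::2/128, Del, owner rip, previous None
"""

EVENT_RE = re.compile(r"IPv6RT0: Event: (\S+), (Add|Del), owner (\w+)")


def final_route_state(log: str) -> dict:
    """Map each prefix to (owner, last action, number of events seen)."""
    state = {}
    for prefix, action, owner in EVENT_RE.findall(log):
        events = state[prefix][2] + 1 if prefix in state else 1
        state[prefix] = (owner, action, events)
    return state


for prefix, (owner, action, events) in final_route_state(SAMPLE).items():
    print(f"{prefix:14} owner={owner:5} last={action} events={events}")
```

A prefix with a high event count over a short period is a candidate for flapping, while a prefix whose last action is Del after a single cycle may simply have been filtered or withdrawn.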
Verifying and Troubleshooting Basic IPv6 Protocol Operation
The Neighbor Discovery Protocol (NDP) is a core element of IPv6. NDP operates at the Link Layer
and is responsible for discovery of other nodes on the link, determining the Link Layer addresses
of other nodes, finding available routers, and maintaining reachability information about the paths
to other active neighbor nodes. NDP performs functions for IPv6 that are similar to those that ARP,
ICMP Router Discovery, and ICMP Redirect perform for IPv4. NDP defines five types of
ICMPv6 packets, which are listed and described in Table 10-1 below:
Table 10-1. ICMPv6 NDP Message Types
ICMPv6 Type   Message Type Description and IPv6 Usage
133           Used for Router Solicitation (RS) messages
134           Used for Router Advertisement (RA) messages
135           Used for Neighbor Solicitation (NS) messages
136           Used for Neighbor Advertisement (NA) messages
137           Used for Router Redirect messages
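When reading ICMPv6 debug output or a packet capture, the type values in Table 10-1 can be kept in a simple lookup table. The following Python sketch is illustrative only; the dictionary and function names are assumptions:

```python
# ICMPv6 type values used by NDP, as listed in Table 10-1.
NDP_MESSAGE_TYPES = {
    133: "Router Solicitation (RS)",
    134: "Router Advertisement (RA)",
    135: "Neighbor Solicitation (NS)",
    136: "Neighbor Advertisement (NA)",
    137: "Router Redirect",
}


def describe_icmpv6_type(icmp_type: int) -> str:
    """Return the NDP message name, or a note that the type is not NDP."""
    return NDP_MESSAGE_TYPES.get(icmp_type, "not an NDP message")


print(describe_icmpv6_type(135))
print(describe_icmpv6_type(128))   # 128 is an ICMPv6 Echo Request, not NDP
```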
While you are not expected to perform any advanced NDP troubleshooting, you can leverage NDP
to troubleshoot basic misconfigurations between devices on the local segment. For example, you
could use the show ipv6 routers [conflicts] command to determine the routers that reside on
the local link (same multi-access segment) and verify that their NDP configuration parameters are
the same as the local router as illustrated in the following output:
R1#show ipv6 routers
Router FE80::20F:23FF:FE5E:EC80 on FastEthernet0/0, last update 2 min
Hops 64, Lifetime 1800 sec, AddrFlag=0, OtherFlag=0, MTU=1500
HomeAgentFlag=0, Preference=Medium
Reachable time 0 msec, Retransmit time 0 msec
Prefix 3FFE::/64 onlink autoconfig
Valid lifetime 2592000, preferred lifetime 604800
Router FE80::213:7FFF:FEAF:3E00 on FastEthernet0/0, last update 0 min
Hops 64, Lifetime 1800 sec, AddrFlag=0, OtherFlag=0, MTU=1500
HomeAgentFlag=0, Preference=Medium
Reachable time 0 msec, Retransmit time 0 msec
Prefix 3FFE::/64 onlink autoconfig
Valid lifetime 2592000, preferred lifetime 604800
Router FE80::20D:28FF:FE9E:F940 on FastEthernet0/0, last update 0 min
Hops 64, Lifetime 1800 sec, AddrFlag=0, OtherFlag=0, MTU=1500
HomeAgentFlag=0, Preference=Medium
Reachable time 0 msec, Retransmit time 0 msec
Prefix 3FFE::/64 onlink autoconfig
Valid lifetime 2592000, preferred lifetime 604800
From the output above, we can determine that there are three additional IPv6 routers on the local segment. All three routers have the same prefix and additional IPv6 configuration parameters.
Using this command, you could troubleshoot misconfigured device issues by appending the conflicts keyword, which prints information on received Router Advertisements that differ from
the advertisements configured for any of the local interfaces, as illustrated in the following output:
R1#show ipv6 routers conflicts
Router FE80::20F:23FF:FE5E:EC80 on FastEthernet0/0, last update 1 min, CONFLICT
Hops 64, Lifetime 1800 sec, AddrFlag=0, OtherFlag=0, MTU=1500
HomeAgentFlag=0, Preference=Medium
Reachable time 0 msec, Retransmit time 0 msec
Prefix 3FFE::/64 onlink autoconfig
Valid lifetime -1, preferred lifetime -1
Router FE80::213:7FFF:FEAF:3E00 on FastEthernet0/0, last update 0 min, CONFLICT
Hops 64, Lifetime 1800 sec, AddrFlag=0, OtherFlag=0, MTU=1500
HomeAgentFlag=0, Preference=Medium
Reachable time 0 msec, Retransmit time 0 msec
Prefix 3FFE::/64 onlink autoconfig
Valid lifetime -1, preferred lifetime -1
Another useful command when troubleshooting basic IPv6 functionality is the show ipv6 traffic command. This command prints information on sent and received IPv6 packets, which can
be used to troubleshoot anything from Layer 1/Layer 2 problems to device misconfigurations, depending on the information printed in the output. Following is a sample output of the information
that is printed by this command:
R1#show ipv6 traffic
IPv6 statistics:
Rcvd: 1710 total, 1710 local destination
0 source-routed, 0 truncated
0 format errors, 0 hop count exceeded
0 bad header, 0 unknown option, 0 bad source
0 unknown protocol, 0 not a router
0 fragments, 0 total reassembled
0 reassembly timeouts, 0 reassembly failures
0 unicast RPF drop, 0 suppressed RPF drop
Sent: 1466 generated, 0 forwarded
0 fragmented into 0 fragments, 0 failed
0 encapsulation failed, 0 no route, 0 too big
Mcast: 1655 received, 1412 sent
ICMP statistics:
Rcvd: 168 input, 0 checksum errors, 0 too short
0 unknown info type, 0 unknown error type
unreach: 0 routing, 0 admin, 0 neighbor, 0 address, 0 port
parameter: 0 error, 0 header, 0 option
0 hop count expired, 0 reassembly timeout,0 too big
5 echo request, 15 echo reply
0 group query, 0 group report, 0 group reduce
0 router solicit, 123 router advert, 0 redirects
11 neighbor solicit, 14 neighbor advert
Sent: 107 output, 0 rate-limited
unreach: 0 routing, 0 admin, 0 neighbor, 0 address, 0 port
parameter: 0 error, 0 header, 0 option
0 hop count expired, 0 reassembly timeout,0 too big
15 echo request, 5 echo reply
0 group query, 0 group report, 0 group reduce
0 router solicit, 56 router advert, 0 redirects
16 neighbor solicit, 15 neighbor advert
UDP statistics:
Rcvd: 571 input, 0 checksum errors, 0 length errors
0 no port, 0 dropped
Sent: 346 output
TCP statistics:
Rcvd: 0 input, 0 checksum errors
Sent: 0 output, 0 retransmitted
When parsing the output above, a large number of errors may point to Layer 1 issues. In such cases,
troubleshoot that layer using the appropriate commands and methodology, such as component
swapping, until the error counters stop incrementing. As another example, if the local device is
sending NDP messages but is not receiving any, and you know that at least one other device is
connected to the local segment or link, verify connectivity between the two devices and check for
intermediate device misconfigurations, such as incorrect VLAN assignments on a switch. As can be
seen, the information printed by this command can help to identify and isolate a wide range of
issues that may cause IPv6 problems.
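The error check described above can be sketched in Python. This is a minimal sketch, assuming the command output has been captured as text; the function name and the list of counters to watch are hypothetical choices for illustration, not part of Cisco IOS.

```python
import re

# Counters treated as signs of trouble when non-zero (a hypothetical watch list).
ERROR_COUNTERS = (
    "format errors", "bad header", "truncated", "checksum errors",
    "encapsulation failed", "no route", "reassembly failures",
)

def nonzero_error_counters(output: str) -> dict[str, int]:
    """Sum each watched counter across the output and keep the non-zero ones."""
    totals: dict[str, int] = {}
    for name in ERROR_COUNTERS:
        # Each counter is printed as "<number> <counter name>".
        for value in re.findall(rf"(\d+) {re.escape(name)}\b", output):
            totals[name] = totals.get(name, 0) + int(value)
    return {name: total for name, total in totals.items() if total > 0}

# Excerpt of the sample output shown above.
sample = """\
Rcvd: 1710 total, 1710 local destination
0 format errors, 0 hop count exceeded
0 bad header, 0 unknown option, 0 bad source
Sent: 1466 generated, 0 forwarded
0 encapsulation failed, 0 no route, 0 too big
"""

print(nonzero_error_counters(sample))
```

Comparing two captures taken a few minutes apart shows whether a counter is still incrementing, which is usually more telling than a single snapshot.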
UNDERSTANDING AND TROUBLESHOOTING EIGRPV6
EIGRPv6 retains the same core functions as EIGRPv4. For example, both versions use DUAL to
ensure loop-free paths, and both use Multicast packets to send updates, although EIGRPv6 uses the
IPv6 Multicast address FF02::A instead of the 224.0.0.10 group address used by EIGRPv4. Despite
these shared fundamentals, there are some differences between the two versions. Table 10-2 below
lists the differences between EIGRPv4 and EIGRPv6, which are more commonly referred to as
EIGRP for IPv4 and EIGRP for IPv6.
Table 10-2. EIGRPv4 and EIGRPv6 Differences
| Protocol Characteristic | EIGRP for IPv4 | EIGRP for IPv6 |
|---|---|---|
| Automatic Summarization | Yes | Not Applicable |
| Authentication or Security | MD5 | Built into IPv6 |
| Common Subnet for Peers | Yes | No |
| Advertisement Contents | Subnet/Mask | Prefix/Length |
| Packet Encapsulation | IPv4 | IPv6 |
Given the similarity of EIGRPv4 and EIGRPv6, the troubleshooting approaches for both of these
protocols are also very similar. For example, EIGRPv6 still requires that the same autonomous system number be used in order for a neighbor relationship to be established. In addition, other EIGRP
parameters, such as authentication configuration and K values, should also be the same; otherwise,
EIGRPv6 will not establish a neighbor relationship with peer routers.
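The matching requirements above can be expressed as a simple comparison. The following Python sketch is illustrative only: the EigrpParams class and its fields are hypothetical stand-ins for values you would read from each router's configuration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EigrpParams:
    """Hypothetical summary of the settings that must match between peers."""
    as_number: int
    k_values: tuple[int, int, int, int, int]   # K1 through K5
    auth_key: Optional[str] = None             # None means no authentication

def adjacency_blockers(local: EigrpParams, peer: EigrpParams) -> list[str]:
    """List the mismatches that would prevent a neighbor relationship."""
    problems = []
    if local.as_number != peer.as_number:
        problems.append("autonomous system number mismatch")
    if local.k_values != peer.k_values:
        problems.append("K value mismatch")
    if local.auth_key != peer.auth_key:
        problems.append("authentication mismatch")
    return problems

r1 = EigrpParams(as_number=1, k_values=(1, 0, 1, 0, 0))
r2 = EigrpParams(as_number=2, k_values=(1, 0, 1, 0, 0))
print(adjacency_blockers(r1, r2))
```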
However, there are some subtle differences in how these two protocols operate and are configured,
and you should be familiar with them. The first is that while no explicit command is needed to start
the EIGRPv4 routing process once it has been configured, the EIGRPv6 routing process defaults to a
shutdown state in Cisco IOS software. This default state is displayed when
you issue the show ipv6 eigrp neighbors command as illustrated below:
R1#show ipv6 eigrp neighbors
IPv6-EIGRP neighbors for process 1
% EIGRP 1 is in SHUTDOWN
Alternatively, the default state is also displayed when you parse through the configuration as illustrated in the following output:
R1#show running-config | section eigrp
ipv6 eigrp 1
ipv6 router eigrp 1
shutdown
When configuring EIGRPv6, you must issue the no shutdown router configuration command to
enable the routing process. Keep in mind that after the no shutdown command has been issued, it is
not shown in the configuration.
Another common problem is forgetting to specify a router ID. EIGRPv6 still uses a 32-bit router ID,
so if the router has no interfaces configured with an IPv4 address, you must specify the router ID
manually using the router-id <ipv4-address> router configuration command. If EIGRPv6 is enabled
on a router that has no interface in the up state with an IPv4 address assigned, the routing protocol
is not enabled and the following error message is printed on the console when you issue the
show ipv6 eigrp neighbors command:
R1#show ipv6 eigrp neighbors
IPv6-EIGRP neighbors for process 1
% No router ID for EIGRP 1
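Both startup problems described above announce themselves in the show ipv6 eigrp neighbors output, so a captured copy of that output can be checked mechanically. The Python sketch below assumes the message wording shown in the samples above; the function name is hypothetical.

```python
import re

def diagnose_eigrpv6(neighbors_output: str) -> str:
    """Map the two startup messages shown above to a suggested next step."""
    if re.search(r"% EIGRP \d+ is in SHUTDOWN", neighbors_output):
        return "process is shut down: issue no shutdown under ipv6 router eigrp"
    if re.search(r"% No router ID for EIGRP \d+", neighbors_output):
        return "no router ID: configure one with the router-id command"
    return "no startup problem reported"

print(diagnose_eigrpv6("IPv6-EIGRP neighbors for process 1\n% EIGRP 1 is in SHUTDOWN"))
print(diagnose_eigrpv6("IPv6-EIGRP neighbors for process 1\n% No router ID for EIGRP 1"))
```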
While basic WAN interface operation is the same for both EIGRPv4 and EIGRPv6, keep in mind
that EIGRPv6 uses Link-Local addresses for peer relationships and as the next-hop IPv6 address
for received updates. Therefore, when enabling EIGRPv6 over NBMA technologies such as ATM
and Frame Relay, you must configure static mappings using the Link-Local addresses, not the global
Unicast addresses. If you specify global Unicast addresses, while the routing protocol neighbor
relationships will be established, you will not have reachability to remote subnets. Following is a
sample configuration of how to configure static Frame Relay mappings using the Link-Local addresses on the routers:
R1(config)#interface Serial1/1
R1(config-if)#frame-relay map ipv6 FE80::205:5EFF:FE6E:5C80 111 broadcast
R1(config-if)#exit
This configuration can be validated using the show frame-relay map command as follows:
R1#show frame-relay map
Serial1/1 (up): ipv6 FE80::205:5EFF:FE6E:5C80 dlci 111(0x6F,0x18F0), static,
broadcast,
CISCO, status defined, active
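As a quick check of the rule above, the mapped addresses can be pulled out of the show frame-relay map output and tested for Link-Local scope. This is a sketch under the assumption that each static mapping is printed in the "ipv6 <address> dlci <number>" form shown above; the function name is hypothetical.

```python
import ipaddress
import re

def non_link_local_mappings(frame_relay_map_output: str) -> list[str]:
    """Return mapped IPv6 addresses that are not Link-Local (FE80::/10)."""
    offenders = []
    for address in re.findall(r"ipv6 ([0-9A-Fa-f:]+) dlci", frame_relay_map_output):
        if not ipaddress.IPv6Address(address).is_link_local:
            offenders.append(address)
    return offenders

good = "Serial1/1 (up): ipv6 FE80::205:5EFF:FE6E:5C80 dlci 111(0x6F,0x18F0), static,"
bad = "Serial1/1 (up): ipv6 2001::2 dlci 112(0x70,0x1C00), static,"

print(non_link_local_mappings(good))
print(non_link_local_mappings(bad))
```

An address flagged here is not necessarily an error by itself, but a peer that has only a global Unicast mapping and no Link-Local mapping matches the problem described above.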
A useful EIGRPv6 command is show ipv6 eigrp traffic, which provides EIGRP packet statistics, such
as the number of Hello packets sent and received. This command can be used
to troubleshoot EIGRP operational issues. For example, if the local router is sending Hellos but is
not receiving any back, the cause may be a Link Layer issue or EIGRP packets being filtered or
blocked. Following is a sample output of this command:
R1#show ipv6 eigrp traffic
IPv6-EIGRP Traffic Statistics for AS 1
Hellos sent/received: 409/392
Updates sent/received: 18/19
Queries sent/received: 0/0
Replies sent/received: 0/0
Acks sent/received: 6/0
SIA-Queries sent/received: 0/0
SIA-Replies sent/received: 0/0
Hello Process ID: 225
PDM Process ID: 224
IPv6 Socket queue: 0/50/1/0 (current/max/highest/drops)
Eigrp input queue: 0/2000/1/0 (current/max/highest/drops)
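The "Hellos sent but none received" check mentioned earlier can be sketched in Python against a captured copy of this output. The function names are hypothetical, and the parser assumes the "name sent/received: x/y" layout shown above.

```python
import re

def parse_eigrp_traffic(output: str) -> dict[str, tuple[int, int]]:
    """Map each packet type to its (sent, received) counters."""
    counters = {}
    pattern = r"^\s*([\w-]+) sent/received: (\d+)/(\d+)"
    for name, sent, received in re.findall(pattern, output, flags=re.MULTILINE):
        counters[name] = (int(sent), int(received))
    return counters

def one_way_hellos(counters: dict[str, tuple[int, int]]) -> bool:
    """True when Hellos are being sent but none have been received."""
    sent, received = counters.get("Hellos", (0, 0))
    return sent > 0 and received == 0

sample = """\
IPv6-EIGRP Traffic Statistics for AS 1
Hellos sent/received: 409/392
Updates sent/received: 18/19
SIA-Queries sent/received: 0/0
"""

counters = parse_eigrp_traffic(sample)
print(counters["Hellos"], one_way_hellos(counters))
```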
As is the case with EIGRPv4, the debug eigrp packets command can be used to view real-time
information on EIGRP packets as illustrated in the following output:
R1#debug eigrp packets
EIGRP Packets debugging is on
(UPDATE, REQUEST, QUERY, REPLY, HELLO, IPXSAP, PROBE, ACK, STUB, SIAQUERY,
SIAREPLY)
R1#
R1#
*Mar 1 04:19:27.771: EIGRP: Sending HELLO on Serial1/1
*Mar 1 04:19:27.771:   AS 1, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
*Mar 1 04:19:28.519: EIGRP: Received HELLO on Serial1/1 nbr FE80::202:FDFF:FE06:6350
*Mar 1 04:19:28.519:   AS 1, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/0
*Mar 1 04:19:32.459: EIGRP: Sending HELLO on Serial1/1
*Mar 1 04:19:32.459:   AS 1, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0
*Mar 1 04:19:33.231: EIGRP: Received HELLO on Serial1/1 nbr FE80::202:FDFF:FE06:6350
*Mar 1 04:19:33.231:   AS 1, Flags 0x0, Seq 0/0 idbQ 0/0 iidbQ un/rely 0/0 peerQ un/rely 0/0
While the debug eigrp command can be used to troubleshoot both EIGRPv4 and EIGRPv6, you
should use the debug ipv6 eigrp suite of commands to view EIGRPv6-specific events. This command supports the following options:
R1#debug ipv6 eigrp ?
  <1-65535>      Autonomous System
  neighbor       EIGRP neighbor debugging
  notifications  EIGRP event notifications
  summary        EIGRP summary route processing
  <cr>
Following is a sample output of the debug ipv6 eigrp commands, which prints real-time information on EIGRPv6 route events:
R1#debug ipv6 eigrp 1
IP-EIGRP Route Events debugging is on
R1#clear ipv6 eigrp 1 neighbors
R1#
*Mar 1 04:20:53.111: %DUAL-5-NBRCHANGE: IPv6-EIGRP(0) 1: Neighbor FE80::202:FDFF:FE06:6350 (Serial1/1) is down: manually cleared
*Mar 1 04:20:56.807: %DUAL-5-NBRCHANGE: IPv6-EIGRP(0) 1: Neighbor FE80::202:FDFF:FE06:6350 (Serial1/1) is up: new adjacency
*Mar 1 04:20:56.827: IPv6-EIGRP(0:1): Processing incoming UPDATE packet
*Mar 1 04:20:56.831: IPv6-EIGRP(0:1): 2001::/64 - do advertise out Serial1/1
*Mar 1 04:20:56.831: IPv6-EIGRP(0:1): Int 2001::/64 metric 20512000 - 20000000 512000
*Mar 1 04:20:56.847: IPv6-EIGRP(0:1): Processing incoming UPDATE packet
*Mar 1 04:20:56.851: IPv6-EIGRP(0:1): Int 2001::/64 M 21024000 - 20000000 1024000 SM 2169856 - 1657856 512000
*Mar 1 04:20:56.851: IPv6-EIGRP(0:1): 2001::/64 routing table not updated
*Mar 1 04:20:56.851: IPv6-EIGRP(0:1): 2001::/64 - do advertise out Serial1/1
*Mar 1 04:20:56.851: IPv6-EIGRP(0:1): Int 2001::/64 metric 20512000 - 20000000 512000
*Mar 1 04:20:56.867: IPv6-EIGRP(0:1): Processing incoming UPDATE packet
*Mar 1 04:20:56.871: IPv6-EIGRP(0:1): Int 2001::/64 M 21024000 - 20000000 1024000 SM 2169856 - 1657856 512000
UNDERSTANDING AND TROUBLESHOOTING RIPNG
For the most part, RIPng is very similar to RIPv2. However, there are some notable differences between
these two routing protocols with which you should be familiar. The similarities and differences are
listed in Table 10-3 below:
Table 10-3. RIPv2 and RIPng Similarities and Differences
| Protocol Characteristic | RIPv2 | RIPng |
|---|---|---|
| Protocol Classification | Distance Vector | Distance Vector |
| Hop Limitation | 15 | 15 |
| Split Horizon | Yes | Yes |
| Poison Reverse | Yes | Yes |
| Transport Layer Protocol | UDP | UDP |
| Multicast Updates | Yes (224.0.0.9) | Yes (FF02::9) |
| Administrative Distance | 120 | 120 |
| Hold-down Timers | Yes | Yes |
| Destination Prefix Length | 32-bit | 128-bit |
| Next-hop Length | 32-bit | 128-bit |
| Next-hop Address | Primary Interface Address | Link-Local Address |
| Transport | IPv4 | IPv6 |
| UDP Port Number | 520 | 521 |
| Authentication | Text and MD5 | Inbuilt into IPv6 (IPSec) |
| Automatic Summarization | Yes (enabled by default) | Not Applicable |
| Can Broadcast Updates | Yes | Not Applicable |
When troubleshooting RIPng, you should keep in mind that RIPng has the same limitations as
RIPv2. For example, routes received with a hop count of 16 (or greater) are considered unreachable
and are not installed into the routing table.
The primary problems experienced when implementing RIPng are due to misconfigurations, as the
configuration syntax for RIPv2 and RIPng is significantly different in Cisco IOS software. For example, when configuring RIPv2, you can configure the router to advertise a default route in router
configuration mode using the default-information originate router configuration command.
This allows the RIPv2 router to advertise the default route to all other neighbors. With RIPng, however, this functionality is now performed under the interface. Therefore, the default route is advertised only out of that specified interface, which means that multiple instances of this command are
required on a router with multiple RIPng-enabled interfaces and multiple neighbors that should all
be receiving the default route.
Recapping what is described in the ROUTE study guide, Table 10-4 below lists some common configuration commands and how they are applied in RIPv2 and RIPng, respectively:
Table 10-4. RIPv2 and RIPng Cisco IOS Software Configuration Differences
| Command Function | RIPv2 Command | RIPng Command |
|---|---|---|
| Enable RIP routing | Use the router rip global configuration command | Use the ipv6 router rip [tag] global configuration command |
| Advertise networks or prefixes using RIP | Use the network router configuration command | Use the ipv6 rip [tag] enable interface configuration command |
| Generate a RIP default route | Use the default-information originate router configuration command | Use the ipv6 rip [tag] default-information [originate \| only] interface configuration command |
| Enable or disable Split Horizon | Use the [no] ip split-horizon interface configuration command | Use the [no] split-horizon router configuration command |
| Verify received RIP routing information | Use the show ip route [rip] command | Use the show ipv6 route [rip] command |
| Verify the RIP database | Use the show ip rip database command | Use the show ipv6 rip [tag] database command |
As previously stated, commonly experienced problems with RIPng are due to misconfiguration on
the devices running RIPng. Commonly encountered issues include the following:
• Incorrect default route advertisement configuration causes connectivity issues
• The router is not receiving routes
• The router is advertising routes it should not be advertising
• No routes are installed into the RIB
RIPng uses the ipv6 rip [tag] default-information [originate | only] interface configuration command to advertise a default route. By default, when this command is specified, the
router will advertise the default route even if one is not present in the routing table. In most cases,
when advertising a default route to downstream neighbors, there is typically no need to advertise
any other specific routes. However, there are instances when you might want to advertise both the
default and some other more specific routes. For example, consider the topology that is illustrated
in Figure 10-1 below:
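[Figure: R1 connects to R2 over FastEthernet0/0.22 and to R3 over FastEthernet0/0.33, with RIPng running between R1 and the Fa0/0 interfaces of R2 and R3. R2 (FC00::2/128) and R3 (FC00::3/128) each peer with an ISP using MP-BGP.]
Fig. 10-1. RIPng Default Routing Issues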
Referencing Figure 10-1, routers R1, R2, and R3 are running RIPng. In addition, routers R2 and R3
are peered to two ISPs and are running MP-BGP. These routers are receiving multiple routes from
the ISP. Therefore, rather than redistribute these into RIPng, administrators have decided instead
to configure R2 and R3 to send R1 a default route. Based on this solution, the current routing table
on R1 displays the following entries:
R1#show ipv6 route
IPv6 Routing Table - 3 entries
Codes: C - Connected, L - Local, S - Static, R - RIP, B - BGP
       U - Per-user Static route
       I1 - ISIS L1, I2 - ISIS L2, IA - ISIS interarea, IS - ISIS summary
       O - OSPF intra, OI - OSPF inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
       ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
R   ::/0 [120/2]
     via FE80::213:19FF:FE86:CA20, FastEthernet0/0.22
     via FE80::201:96FF:FE1B:DB80, FastEthernet0/0.33
L   FE80::/10 [0/0]
     via ::, Null0
L   FF00::/8 [0/0]
     via ::, Null0
While this configuration works well for Internet-based traffic, assuming both R2 and R3 have the
same Internet routes, it does introduce intermittent connectivity issues from R1 to the Loopback0
subnets configured on routers R2 and R3 as illustrated in the following ping outputs:
R1#ping fc00::2 repeat 10 source FastEthernet0/0 verbose
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to FC00::2, timeout is 2 seconds:
Packet sent with a source address of FE80::20C:CEFF:FEA7:F3A0
Reply to request 0 (0 ms)
Request 1 received unknown echo response type U
Reply to request 2 (0 ms)
Request 3 received unknown echo response type U
Reply to request 4 (4 ms)
Request 5 received unknown echo response type U
Reply to request 6 (0 ms)
Request 7 received unknown echo response type U
Reply to request 8 (0 ms)
Request 9 received unknown echo response type U
Success rate is 50 percent (5/10), round-trip min/avg/max = 0/0/4 ms
R1#ping fc00::3 repeat 10 source FastEthernet0/0 verbose
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to FC00::3, timeout is 2 seconds:
Packet sent with a source address of FE80::20C:CEFF:FEA7:F3A0
Reply to request 0 (4 ms)
Request 1 received unknown echo response type U
Reply to request 2 (0 ms)
Request 3 received unknown echo response type U
Reply to request 4 (0 ms)
Request 5 received unknown echo response type U
Reply to request 6 (4 ms)
Request 7 received unknown echo response type U
Reply to request 8 (0 ms)
Request 9 received unknown echo response type U
Success rate is 50 percent (5/10), round-trip min/avg/max = 0/1/4 ms
The reason for this issue is that R1 alternates packets between R2 and R3. A packet sent
to the router on which the /128 address is configured is answered; however, a packet sent
to the router on which the /128 address is not configured fails, which is reported in the output
above as response type U. This is because the local router is load-balancing across the equal-cost
paths (i.e., the ::/0 default route that is received from both routers R2 and R3).
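The 50 percent success rate follows directly from alternating packets across two next hops when only one of them owns the destination. The Python sketch below models this with a simple round-robin choice, which is an assumption made for illustration; it is not a model of how Cisco IOS actually selects a path.

```python
from itertools import cycle

def ping_results(next_hops: list[str], owner: str, count: int) -> list[bool]:
    """Send each packet to the next hop in turn; only the owner replies."""
    chooser = cycle(next_hops)
    return [next(chooser) == owner for _ in range(count)]

# Default route learned from both R2 and R3; FC00::2/128 lives on R2 only.
results = ping_results(["R2", "R3"], owner="R2", count=10)
print(f"{sum(results)}/{len(results)} replies")   # 5/10 replies

# With a specific route to FC00::2/128, R2 is the only next hop used.
fixed = ping_results(["R2"], owner="R2", count=10)
print(f"{sum(fixed)}/{len(fixed)} replies")       # 10/10 replies
```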
As you troubleshoot, you verify the configurations of R2 and R3 and notice that their respective
FastEthernet0/0 interfaces have been configured with the ipv6 rip TSHOOT default-information only interface configuration command as is illustrated below on router R2:
R2#show running-config interface FastEthernet0/0
Building configuration...
Current configuration : 168 bytes
!
interface FastEthernet0/0
duplex auto
speed auto
ipv6 enable
ipv6 rip TSHOOT enable
ipv6 rip TSHOOT default-information only
end
While this configuration allows RIPng to advertise the default route, it also suppresses all other,
more specific routes, leading to the intermittent connectivity issue from R1 to the R2 and R3 Loopback0
subnets illustrated above. The recommended solution in this case is to allow the routers to advertise
the other routes in addition to the default route. This is a common pitfall because RIPv2 does not
behave in the same way. In other words, with RIPv2, generating the
default route does not suppress all other routes. This is yet another difference in protocol operation
in Cisco IOS software that can cause problems. The solution is implemented on routers R2 and R3
as follows:
R2(config)#interface FastEthernet0/0
R2(config-if)#ipv6 rip TSHOOT default-information originate
R2(config-if)#exit
R3(config)#interface FastEthernet0/0
R3(config-if)#ipv6 rip TSHOOT default-information originate
R3(config-if)#exit
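The difference between the two keywords can be summarized in a few lines of Python. This is only a model of the behavior described above, with a hypothetical function name; it is not derived from Cisco IOS internals.

```python
DEFAULT_ROUTE = "::/0"

def advertised_prefixes(rip_prefixes: list[str], mode: str) -> list[str]:
    """Model what a RIPng interface advertises for each default-information keyword."""
    if mode == "only":
        # The default route is sent and all other routes are suppressed.
        return [DEFAULT_ROUTE]
    if mode == "originate":
        # The default route is sent in addition to the other routes.
        return [DEFAULT_ROUTE] + list(rip_prefixes)
    raise ValueError(f"unknown keyword: {mode}")

print(advertised_prefixes(["FC00::2/128"], "only"))
print(advertised_prefixes(["FC00::2/128"], "originate"))
```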
Following this configuration, the routing table on R1 displays the following entries:
R1#show ipv6 route
IPv6 Routing Table - 6 entries
Codes: C - Connected, L - Local, S - Static, R - RIP, B - BGP
       U - Per-user Static route
       I1 - ISIS L1, I2 - ISIS L2, IA - ISIS interarea, IS - ISIS summary
       O - OSPF intra, OI - OSPF inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
       ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
R   ::/0 [120/2]
     via FE80::213:19FF:FE86:CA20, FastEthernet0/0.22
     via FE80::201:96FF:FE1B:DB80, FastEthernet0/0.33
R   FC00::2/128 [120/2]
     via FE80::213:19FF:FE86:CA20, FastEthernet0/0.22
R   FC00::3/128 [120/2]
     via FE80::201:96FF:FE1B:DB80, FastEthernet0/0.33
L   FE80::/10 [0/0]
     via ::, Null0
L   FF00::/8 [0/0]
     via ::, Null0
Following this reconfiguration on R2 and R3, the same ping tests performed on R1 earlier, which
had a 50% success rate, are now completely successful as illustrated below:
R1#ping fc00::2 repeat 10 source FastEthernet0/0 verbose
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to FC00::2, timeout is 2 seconds:
Packet sent with a source address of FE80::20C:CEFF:FEA7:F3A0
Reply to request 0 (0 ms)
Reply to request 1 (4 ms)
Reply to request 2 (0 ms)
Reply to request 3 (0 ms)
Reply to request 4 (4 ms)
Reply to request 5 (0 ms)
Reply to request 6 (4 ms)
Reply to request 7 (0 ms)
Reply to request 8 (0 ms)
Reply to request 9 (4 ms)
Success rate is 100 percent (10/10), round-trip min/avg/max = 0/1/4 ms
R1#ping fc00::3 repeat 10 source FastEthernet0/0 verbose
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to FC00::3, timeout is 2 seconds:
Packet sent with a source address of FE80::20C:CEFF:FEA7:F3A0
Reply to request 0 (4 ms)
Reply to request 1 (0 ms)
Reply to request 2 (0 ms)
Reply to request 3 (0 ms)
Reply to request 4 (4 ms)
Reply to request 5 (0 ms)
Reply to request 6 (0 ms)
Reply to request 7 (4 ms)
Reply to request 8 (4 ms)
Reply to request 9 (0 ms)
Success rate is 100 percent (10/10), round-trip min/avg/max = 0/1/4 ms
The most common reason for a router not receiving any routes is misconfiguration. Examples
include route filtering misconfiguration (i.e., incorrectly configured distribute list filters),
failing to enable RIPng under the correct interfaces, or the incorrect use of some commands, such
as the default-information command, as was illustrated in the previous example.
When troubleshooting such issues, verify the device configurations. If any route filters
have been applied, ensure that they are permitting the correct networks.
In the event that the router is not advertising prefixes that it should be, verify that the interfaces are
in an up/up state on the local router. Additionally, because RIPng does not use network statements,
verify that the RIPng process has been enabled under the interface(s). You can use the show ipv6
rip command to determine which RIPng process is enabled under which interfaces, among other
things. Following is a sample output of the information that can be garnered from this command:
R1#show ipv6 rip
RIP process “TSHOOT”, port 521, multicast-group FF02::9, pid 231
Administrative distance is 120. Maximum paths is 16
Updates every 30 seconds, expire after 180
Holddown lasts 0 seconds, garbage collect after 120
Split horizon is on; poison reverse is off
Default routes are not generated
Periodic updates 352, trigger updates 46
Interfaces:
FastEthernet0/0.22
Serial0/0
Redistribution:
None
RIP process “CCNP”, port 521, multicast-group FF02::9, pid 232
Administrative distance is 120. Maximum paths is 16
Updates every 30 seconds, expire after 180
Holddown lasts 0 seconds, garbage collect after 120
Split horizon is on; poison reverse is off
Default routes are not generated
Periodic updates 23, trigger updates 12
Interfaces:
FastEthernet0/0.22
FastEthernet0/0.33
Redistribution:
None
As illustrated in the output above, it is possible for multiple RIPng instances to be configured under
the same interface. When implementing RIPng, if you specify one process under the interface and
then specify another, the interface belongs to both processes, or instances. Cisco IOS software does
not overwrite the previous process with the new one.
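Because an interface can belong to more than one RIPng process, it can help to list the interfaces per process and spot any overlap. The Python sketch below parses a captured copy of the show ipv6 rip output; it assumes the layout shown above, and the function name is hypothetical.

```python
import re

def interfaces_by_process(output: str) -> dict[str, list[str]]:
    """Map each RIPng process name to the interfaces listed under it."""
    processes: dict[str, list[str]] = {}
    current = None
    in_interfaces = False
    for raw in output.splitlines():
        line = raw.strip()
        # Accept straight or typographic quotes around the process name.
        header = re.match(r'RIP process\s+["“](.+?)["”]', line)
        if header:
            current = header.group(1)
            processes[current] = []
            in_interfaces = False
        elif line == "Interfaces:":
            in_interfaces = True
        elif line.endswith(":"):
            in_interfaces = False          # e.g. the "Redistribution:" heading
        elif in_interfaces and current is not None and line:
            processes[current].append(line)
    return processes

sample = '''\
RIP process "TSHOOT", port 521, multicast-group FF02::9, pid 231
Split horizon is on; poison reverse is off
Interfaces:
FastEthernet0/0.22
Serial0/0
Redistribution:
None
RIP process "CCNP", port 521, multicast-group FF02::9, pid 232
Interfaces:
FastEthernet0/0.22
FastEthernet0/0.33
Redistribution:
None
'''

by_process = interfaces_by_process(sample)
shared = sorted(set(by_process["TSHOOT"]) & set(by_process["CCNP"]))
print(by_process)
print(shared)
```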
There are several reasons why the router may not install routes into the RIB. Common causes include
route filtering and routes received with a metric of ‘unreachable’ (i.e., routes received with a
hop count of 16 or greater). In addition to basic show commands, you can also troubleshoot RIPng
problems using the debug ipv6 rip command. This command provides detailed information on the
RIPng updates that are sent and received, as illustrated in the output below:
R1#debug ipv6 rip
RIP Routing Protocol debugging is on
R1#
*Oct 20 11:12:55.420: RIPng: Suppressed null multicast update on FastEthernet0/0.22 for TSHOOT
*Oct 20 11:12:56.590: RIPng: response received from FE80::213:19FF:FE86:A20 on FastEthernet0/0.22 for TSHOOT
*Oct 20 11:12:56.590:   src=FE80::213:19FF:FE86:A20 (FastEthernet0/0.22)
*Oct 20 11:12:56.590:   dst=FF02::9
*Oct 20 11:12:56.590:   sport=521, dport=521, length=92
*Oct 20 11:12:56.590:   command=2, version=1, mbz=0, #rte=4
*Oct 20 11:12:56.590:   tag=0, metric=1, prefix=2004::20/128
*Oct 20 11:12:56.590:   tag=0, metric=1, prefix=FC00::10/128
*Oct 20 11:12:56.590:   tag=0, metric=1, prefix=FC00::20/128
*Oct 20 11:12:56.590:   tag=0, metric=1, prefix=FC00::30/128
*Oct 20 11:12:56.590: RIPng: Added neighbor FE80::213:19FF:FE86:A20/FastEthernet0/0.22
*Oct 20 11:12:56.590: RIPng: Inserted 2004::20/128, nexthop FE80::213:19FF:FE86:A20, metric 12, tag 0
*Oct 20 11:12:56.594: RIPng: Inserted FC00::10/128, nexthop FE80::213:19FF:FE86:A20, metric 12, tag 0
*Oct 20 11:12:56.594: RIPng: Inserted FC00::20/128, nexthop FE80::213:19FF:FE86:A20, metric 12, tag 0
*Oct 20 11:12:56.594: RIPng: Inserted FC00::30/128, nexthop FE80::213:19FF:FE86:A20, metric 12, tag 0
*Oct 20 11:12:56.594: RIPng: Triggered update requested, in hold-down
*Oct 20 11:13:00.421: RIPng: generating triggered update for TSHOOT
*Oct 20 11:13:00.421: RIPng: Suppressed null multicast update on FastEthernet0/0.22 for TSHOOT
*Oct 20 11:13:07.360: RIPng: Next RIB walk in 169230
*Oct 20 11:13:11.992: RIPng: Process TSHOOT received response for CCNP on FastEthernet0/0.33
*Oct 20 11:13:11.992: RIPng: response received from FE80::201:96FF:FE1B:DB80 on FastEthernet0/0.33 for CCNP
*Oct 20 11:13:11.992:   src=FE80::201:96FF:FE1B:DB80 (FastEthernet0/0.33)
*Oct 20 11:13:11.992:   dst=FF02::9
*Oct 20 11:13:11.992:   sport=521, dport=521, length=92
*Oct 20 11:13:11.992:   command=2, version=1, mbz=0, #rte=4
*Oct 20 11:13:11.992:   tag=0, metric=4, prefix=FC00::1/128
*Oct 20 11:13:11.992:   tag=0, metric=4, prefix=FC00::2/128
*Oct 20 11:13:11.992:   tag=0, metric=4, prefix=FC00::3/128
*Oct 20 11:13:11.992:   tag=0, metric=4, prefix=FC00::4/128
*Oct 20 11:13:11.996: RIPng: Added neighbor FE80::201:96FF:FE1B:DB80/FastEthernet0/0.33
*Oct 20 11:13:11.996: RIPng: Inserted FC00::1/128, nexthop FE80::201:96FF:FE1B:DB80, metric 5, tag 0
*Oct 20 11:13:11.996: RIPng: RIPv6 ager started, 180000
*Oct 20 11:13:11.996: RIPng: Inserted FC00::2/128, nexthop FE80::201:96FF:FE1B:DB80, metric 5, tag 0
*Oct 20 11:13:11.996: RIPng: Inserted FC00::3/128, nexthop FE80::201:96FF:FE1B:DB80, metric 5, tag 0
*Oct 20 11:13:11.996: RIPng: Inserted FC00::4/128, nexthop FE80::201:96FF:FE1B:DB80, metric 5, tag 0
*Oct 20 11:13:12.000: RIPng: Triggered update requested
*Oct 20 11:13:13.002: RIPng: generating triggered update for CCNP
Referencing the debugging output above, we can determine that the local router is running two
RIPng instances, or processes: one named TSHOOT and the other named CCNP. Instance
TSHOOT receives four prefixes from neighbor FE80::213:19FF:FE86:A20 via FastEthernet0/0.22.
These prefixes have a route metric of 1, indicating that they originated on that router. However, the
prefixes are installed into the RIB with a route metric of 12. This means that the received metric
is being offset by 11, rather than by the default of 1, using the ipv6 rip TSHOOT metric-offset 11
command under the FastEthernet0/0.22 interface.
The local router is also receiving four prefixes from neighbor FE80::201:96FF:FE1B:DB80 via
FastEthernet0/0.33. These prefixes are received with a route metric of 4. This is an indication that
the prefixes are not local to the advertising router (i.e., are not directly connected to it), or that
they were redistributed into RIPng on that router and the metric was specified during redistribution.
Because the metric is incremented by only 1 (from 4 to 5) when the routes are installed into the RIB,
we can conclude that no non-default metric offsetting is configured under the FastEthernet0/0.33
interface on the local router.
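The metric arithmetic used in this analysis can be written out as a short Python sketch. The function names are hypothetical; the cap at 16 reflects the RIPng 'unreachable' metric discussed earlier in this section.

```python
RIPNG_INFINITY = 16   # a metric of 16 means the route is unreachable

def installed_metric(received: int, offset: int = 1) -> int:
    """Metric stored for a route: received metric plus the inbound offset, capped at 16."""
    return min(received + offset, RIPNG_INFINITY)

def inferred_offset(received: int, installed: int) -> int:
    """Work backwards from debug output to the offset applied on the interface."""
    return installed - received

print(installed_metric(1, offset=11))   # TSHOOT prefixes: received 1, installed 12
print(installed_metric(4))              # CCNP prefixes: received 4, installed 5
print(inferred_offset(1, 12))           # offset in use on FastEthernet0/0.22
```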
UNDERSTANDING AND TROUBLESHOOTING OSPFV3
OSPFv3 is defined in RFC 2740 and is the counterpart of OSPFv2, but is designed explicitly for the
IPv6 routed protocol. The similarities shared by OSPFv2 and OSPFv3 are as follows:
• OSPFv3 continues to use the same packets that are also used by OSPFv2
• Neighbor discovery and the adjacency formation process are the same
• OSPFv3 remains RFC-compliant on different technologies
• Both OSPFv2 and OSPFv3 use the same LSA flooding and aging mechanisms
• Like OSPFv2, the OSPFv3 router ID requires the use of a 32-bit IPv4 address
• Like OSPFv2, the OSPFv3 link ID is based on a 32-bit IPv4 address
While there are similarities between OSPFv2 and OSPFv3, it is important to understand that some
significant differences exist with which you must be familiar. These include the following:
• OSPFv3 uses IPv6 Link-Local addresses (not global addresses) to identify the OSPFv3 adjacencies
• OSPFv3 introduces two new OSPF LSA types, which are Type 8 and Type 9 LSAs
• OSPFv3 messages are sent over (encapsulated in) IPv6 packets, not IPv4 packets
• OSPFv3 uses standard IPv6 Multicast addresses FF02::5 and FF02::6
• OSPFv3 leverages the inbuilt capabilities of IPSec for security and authentication
• The Options field in Hello and DBD packets includes the R-bit and the V6-bit
• The OSPFv3 Hello packet contains no address information, but includes an interface ID
NOTE: Additional detailed information on these differences and similarities can be found in
the current ROUTE study guide, which is available online.
For all intents and purposes, and barring configuration differences, the same methods used to
troubleshoot OSPFv2 are applicable when troubleshooting OSPFv3 because the protocols operate
in the same manner. When troubleshooting OSPFv2, Cisco IOS software OSPF commands begin
with show ip ospf. In a similar manner, the OSPFv3 show commands begin with show ipv6 ospf.
This command supports the following keywords:
R1#show ipv6 ospf ?
  <1-65535>            Process ID number
  border-routers       Border and Boundary Router Information
  database             Database summary
  flood-list           Link State flood list
  interface            Interface information
  neighbor             Neighbor list
  request-list         Link State request list
  retransmission-list  Link State retransmission list
  summary-prefix       Summary-prefix redistribution Information
  traffic              OSPF traffic information
  virtual-links        Virtual link information
  |                    Output modifiers
  <cr>
NOTE: The majority of the supported keywords are described in greater detail in the ROUTE
study guide, which is available online. Please refer to that guide for additional information on
the keywords that are not described in this section. It should also be noted that keywords that
are beyond the scope of ROUTE and TSHOOT exam requirements are not discussed in either
study guide.
A commonly used troubleshooting command is the show ipv6 ospf <process> command. This
command can be used to verify the router role (e.g., ABR or ASBR), the areas configured, and the
area types (e.g., stub, NSSA, etc.). Additionally, you can use this command to
determine the number of times the SPF algorithm has been run in each area, which is useful when
troubleshooting issues such as link or route flapping. The following output displays the
information that is provided by this command:
R1#show ipv6 ospf
Routing Process “ospfv3 1” with ID 1.1.1.1
It is an area border and autonomous system boundary router
Redistributing External Routes from,
rip
rip
SPF schedule delay 5 secs, Hold time between two SPFs 10 secs
Minimum LSA interval 5 secs. Minimum LSA arrival 1 secs
LSA group pacing timer 240 secs
Interface flood pacing timer 33 msecs
Retransmission pacing timer 66 msecs
Number of external LSA 8. Checksum Sum 0x03D2E5
Number of areas in this router is 2. 2 normal 0 stub 0 nssa
Reference bandwidth unit is 100 mbps
Area BACKBONE(0)
Number of interfaces in this area is 1
SPF algorithm executed 4 times
Number of LSA 5. Checksum Sum 0x0309A8
Number of DCbitless LSA 0
Number of indication LSA 0
Number of DoNotAge LSA 0
Flood list length 0
Area 1
Number of interfaces in this area is 1
SPF algorithm executed 3 times
Number of LSA 2. Checksum Sum 0x01412A
Number of DCbitless LSA 0
Number of indication LSA 0
Number of DoNotAge LSA 0
Flood list length 0
From the output printed above, we can determine the RID of the local router as well as the local
OSPF process ID. Additionally, we can determine that the router is an ABR and an ASBR, and that it
is redistributing two different RIPng instances. By default, Cisco IOS lists the protocol name
once for each instance that is redistributed into OSPF. For example, if the local router were
redistributing three different RIPng instances, the word ‘rip’ would be printed three times in the
redistribution section. We can also determine the number of external LSAs as well
as the default reference bandwidth. Finally, the different areas, the number of interfaces in those
areas, and the number of times the SPF algorithm has been executed for each individual area are also
included in the output that is printed by this command.
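Since a rising SPF count is the sign of flapping mentioned earlier, one approach is to capture this output twice and compare the per-area counters. The Python sketch below assumes the layout shown above; the function names are hypothetical.

```python
import re

def spf_runs_by_area(output: str) -> dict[str, int]:
    """Map each area heading to its 'SPF algorithm executed N times' counter."""
    runs: dict[str, int] = {}
    area = None
    for raw in output.splitlines():
        line = raw.strip()
        header = re.match(r"Area (\S+)$", line)
        if header:
            area = header.group(1)
            continue
        executed = re.match(r"SPF algorithm executed (\d+) times", line)
        if executed and area is not None:
            runs[area] = int(executed.group(1))
    return runs

def spf_increase(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """SPF runs added per area between two captures."""
    return {area: after[area] - before.get(area, 0) for area in after}

sample = """\
Area BACKBONE(0)
Number of interfaces in this area is 1
SPF algorithm executed 4 times
Area 1
Number of interfaces in this area is 1
SPF algorithm executed 3 times
"""

first = spf_runs_by_area(sample)
later = spf_runs_by_area(sample.replace("executed 3 times", "executed 9 times"))
print(first)
print(spf_increase(first, later))
```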
The border-routers keyword allows you to display the internal OSPF routing table entries to ABRs
and ASBRs. As is the case with OSPFv2, this command is useful when troubleshooting connectivity
issues to other areas in multi-area OSPF implementations because it can be used to confirm
that the local router has a path to the ABR and the ASBR (if applicable) as illustrated below:
R1#show ipv6 ospf border-routers
OSPFv3 Process 1 internal Routing Table
Codes: i - Intra-area route, I - Inter-area route
i 2.2.2.2 [1] via FE80::2222, FastEthernet0/0.22, ABR, Area 0, SPF 7
i 3.3.3.3 [1] via FE80::3333, FastEthernet0/0.33, ABR/ASBR, Area 0, SPF 8
In addition to verifying routes to ABRs and ASBRs, the show ipv6 ospf border-routers
command can also be used to determine whether the SPF calculation is functional because it also
includes the internal number of the SPF calculation that installed the route.
As is the case with OSPFv2, you can append the database keyword to view the contents of the
LSDB. When viewing the OSPFv3 LSDB, keep in mind that this updated version of the OSPF routing
protocol includes two new LSAs, which are Type 8 (Link LSA) and Type 9 (Intra-Area-Prefix
LSA). The Link LSA advertises the router’s Link-Local address and all the IPv6 prefixes attached
to the link. There is one Link LSA per link; however, there can be multiple Intra-Area-Prefix
LSAs with different Link-State IDs. An Intra-Area-Prefix LSA has area flooding scope and associates
its prefixes either with a transit network, by referencing a Network LSA, or with a router or stub
network, by referencing a Router LSA. Following is a sample of the OSPFv3 LSDB:
R1#show ipv6 ospf database
OSPFv3 Router with ID (1.1.1.1) (Process ID 1)
Router Link States (Area 0)
ADV Router   Age    Seq#         Fragment ID   Link Count   Bits
1.1.1.1      972    0x8000000D   0             1            EB
2.2.2.2      1940   0x8000000B   0             1            B
Net Link States (Area 0)
ADV Router   Age    Seq#         Link ID   Rtr Count
2.2.2.2      940    0x80000009   4         2
Inter Area Prefix Link States (Area 0)
ADV Router   Age    Seq#         Prefix
2.2.2.2      1940   0x80000002   FC00::10/128
2.2.2.2      1940   0x80000002   FC00::20/128
2.2.2.2      1940   0x80000002   FC00::30/128
Link (Type-8) Link States (Area 0)
ADV Router   Age    Seq#         Link ID   Interface
1.1.1.1      973    0x80000009   9         Fa0/0.22
2.2.2.2      941    0x80000009   4         Fa0/0.22
Router Link States (Area 1)
ADV Router   Age    Seq#         Fragment ID   Link Count   Bits
1.1.1.1      973    0x8000000C   0             0            EB
Inter Area Prefix Link States (Area 1)
ADV Router   Age    Seq#         Prefix
1.1.1.1      1981   0x80000002   FC00::10/128
1.1.1.1      1981   0x80000002   FC00::20/128
1.1.1.1      1981   0x80000002   FC00::30/128
Link (Type-8) Link States (Area 1)
ADV Router   Age    Seq#         Link ID   Interface
1.1.1.1      973    0x80000009   10        Fa0/0.33
Type-5 AS External Link States
ADV Router   Age    Seq#         Prefix
1.1.1.1      973    0x80000009   2004::2/128
1.1.1.1      975    0x80000009   FC00::1/128
1.1.1.1      975    0x80000009   FC00::2/128
1.1.1.1      975    0x80000009   FC00::3/128
1.1.1.1      975    0x80000009   FC00::4/128
Finally, the show ipv6 ospf neighbor [detail] command is still a useful command when troubleshooting and verifying neighbor adjacencies. When using this command, keep in mind that even
though it is applicable to OSPFv3, the command output will include IPv4 router IDs, since these
are also used by OSPFv3. Following is a sample output of the information that is printed by this
command:
R2#show ipv6 ospf neighbor detail
Neighbor 1.1.1.1
In the area 0 via interface FastEthernet0/0
Neighbor: interface-id 9, link-local address FE80::20C:CEFF:FEA7:F3A0
Neighbor priority is 1, State is FULL, 6 state changes
DR is 2.2.2.2 BDR is 1.1.1.1
Options is 0x000013 in Hello (V6-Bit E-Bit R-bit )
Options is 0x000013 in DBD (V6-Bit E-Bit R-bit )
Dead timer due in 00:00:34
Neighbor is up for 00:01:30
Index 1/1/1, retransmission queue length 0, number of retransmission 0
First 0x0(0)/0x0(0)/0x0(0) Next 0x0(0)/0x0(0)/0x0(0)
Last retransmission scan length is 0, maximum is 0
Last retransmission scan time is 0 msec, maximum is 0 msec
In addition to show commands, you can also use the clear ipv6 ospf suite of commands to
troubleshoot OSPFv3. The options that are available with this command are listed below:
R1#clear ipv6 ospf ?
<1-65535>       Process ID number
counters        OSPF counters
force-spf       Run SPF for OSPF process
process         Reset OSPF process
redistribution  Clear OSPF route redistribution
When using the clear ipv6 ospf suite of commands, you can specify the process number to
which you want the relevant keyword applied. This is useful when you are performing OSPFv3
troubleshooting on a router running multiple processes and you do not want to impact any of the
other processes running on the router.
The counters keyword clears the state change counters for OSPFv3 neighbors. You can restrict the
command to the neighbors on a specified interface or to a single neighbor (by including the neighbor
ID); otherwise, the counters for all OSPFv3 neighbors are cleared. The example that follows illustrates
how to verify the state change counters for the neighbors on a specified interface, clear them, and
then verify them again:
R1#show ipv6 ospf neighbor FastEthernet0/0.22 detail
Neighbor 2.2.2.2
In the area 0 via interface FastEthernet0/0.22
Neighbor: interface-id 4, link-local address FE80::2222
Neighbor priority is 1, State is FULL, 6 state changes
DR is 2.2.2.2 BDR is 1.1.1.1
Options is 0x000013 in Hello (V6-Bit E-Bit R-bit)
Options is 0x000013 in DBD (V6-Bit E-Bit R-bit)
Dead timer due in 00:00:31
Neighbor is up for 00:01:38
Index 1/1/1, retransmission queue length 0, number of retransmission 3
First 0x0(0)/0x0(0)/0x0(0) Next 0x0(0)/0x0(0)/0x0(0)
Last retransmission scan length is 5, maximum is 5
Last retransmission scan time is 0 msec, maximum is 0 msec
R1#
R1#clear ipv6 ospf counters neighbor FastEthernet0/0.22
R1#
R1#show ipv6 ospf neighbor FastEthernet0/0.22 detail
Neighbor 2.2.2.2
In the Area 0 via interface FastEthernet0/0.22
Neighbor: interface-id 4, link-local address FE80::2222
Neighbor priority is 1, State is FULL, 0 state changes
DR is 2.2.2.2 BDR is 1.1.1.1
Options is 0x000013 in Hello (V6-Bit E-Bit R-bit)
Options is 0x000013 in DBD (V6-Bit E-Bit R-bit)
Dead timer due in 00:00:38
Neighbor is up for 00:02:12
Index 1/1/1, retransmission queue length 0, number of retransmission 3
First 0x0(0)/0x0(0)/0x0(0) Next 0x0(0)/0x0(0)/0x0(0)
Last retransmission scan length is 5, maximum is 5
Last retransmission scan time is 0 msec, maximum is 0 msec
Clearing counters is useful when troubleshooting adjacency flaps, as it provides some indication
as to how frequently they are occurring. You can then correlate this information with other events,
such as syslog messages or periods of high CPU utilization, for example, when you are troubleshooting OSPFv3 problems.
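If you capture the neighbor output at intervals, the state change counter can be extracted from the saved text and compared over time. The following Python sketch is illustrative only and is not part of Cisco IOS. The function name state_changes and the sample text are assumptions made for this example, and the regular expressions assume the output layout shown in the captures above.

```python
import re

def state_changes(detail_output: str) -> dict:
    """Map each neighbor router ID to its 'state changes' counter.

    Assumes text in the layout of 'show ipv6 ospf neighbor detail'.
    """
    counters = {}
    neighbor = None
    for raw in detail_output.splitlines():
        line = raw.strip()
        # A line such as 'Neighbor 2.2.2.2' starts a new neighbor block.
        start = re.match(r"Neighbor (\d+\.\d+\.\d+\.\d+)$", line)
        if start:
            neighbor = start.group(1)
            continue
        # 'Neighbor priority is 1, State is FULL, 6 state changes'
        found = re.search(r"(\d+) state changes", line)
        if found and neighbor is not None:
            counters[neighbor] = int(found.group(1))
    return counters

# Hypothetical capture, trimmed from the output shown above.
sample = """\
Neighbor 2.2.2.2
    In the area 0 via interface FastEthernet0/0.22
    Neighbor: interface-id 4, link-local address FE80::2222
    Neighbor priority is 1, State is FULL, 6 state changes
"""

print(state_changes(sample))
```

A counter that climbs quickly after being cleared points to a flapping adjacency, which you can then correlate with syslog timestamps.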
The force-spf keyword simply runs the SPF algorithm again. The main difference between the
clear ipv6 ospf force-spf command and the clear ipv6 ospf process command is that
the clear ipv6 ospf process command will restart the OSPF process, clear the OSPFv3 database, repopulate the database, and then run the SPF algorithm. This difference in operation can be
validated by enabling OSPFv3 debugging and comparing the difference in output when either command is issued. The following output illustrates the events that occur when the clear ipv6 ospf
force-spf command is issued on the router:
R1#debug ipv6 ospf events
OSPFv3 events debugging is on
R1#debug ipv6 ospf spf
OSPFv3 spf intra events debugging is on
OSPFv3 spf inter events debugging is on
OSPFv3 spf external events debugging is on
R1#clear ipv6 ospf force-spf
R1#
*Oct 20 17:44:30.671: OSPFv3: running SPF for Area 1, cause R N SN SA L
*Oct 20 17:44:30.671: OSPFv3: Intra-Area SPF (Full), Area 1
*Oct 20 17:44:30.671:   Router LSA 1.1.1.1/0, 0 links
*Oct 20 17:44:30.671: OSPFv3: Process Prefix LSAs
*Oct 20 17:44:30.671: OSPFv3: Check VLs
*Oct 20 17:44:30.671: OSPFv3: running SPF for Area 0, cause R N SN SA L
*Oct 20 17:44:30.675: OSPFv3: Intra-Area SPF (Full), Area 0
*Oct 20 17:44:30.675:   Router LSA 1.1.1.1/0, 1 links
*Oct 20 17:44:30.675:   Link 0, int 9, nbr 2.2.2.2, nbr int 4, type 2, cost 1
*Oct 20 17:44:30.675:   Add better path, link 9/4, dist 1
*Oct 20 17:44:30.675: OSPFv3: putting LSA on the clist LSID 0.0.0.4, Type 0x2002, Adv Rtr. 2.2.2.2
*Oct 20 17:44:30.675:   Add path FastEthernet0/0.22/::, distance 1
[Truncated Output]
*Oct 20 17:44:30.679: OSPFv3: Inter-Area SPF, Area 0
*Oct 20 17:44:30.683:   IAP LSA 2.2.2.2/0, age 330, seq 0x80000005 (Area 0)
*Oct 20 17:44:30.683:   prefix FC00::10/128
*Oct 20 17:44:30.683:   adding path FastEthernet0/0.22/FE80::2222
*Oct 20 17:44:30.683:   IAP LSA 2.2.2.2/1, age 330, seq 0x80000005 (Area 0)
*Oct 20 17:44:30.683:   prefix FC00::20/128
*Oct 20 17:44:30.683:   adding path FastEthernet0/0.22/FE80::2222
*Oct 20 17:44:30.683:   IAP LSA 2.2.2.2/2, age 330, seq 0x80000005 (Area 0)
*Oct 20 17:44:30.683:   prefix FC00::30/128
*Oct 20 17:44:30.683:   adding path FastEthernet0/0.22/FE80::2222
*Oct 20 17:44:30.683:   Adding deferred prefixes, Area 0
*Oct 20 17:44:30.683:   prefix FC00::30/128
*Oct 20 17:44:30.687:   send IAP FC00::30/128, metric 1 to Area 1
*Oct 20 17:44:30.687:   prefix FC00::20/128
*Oct 20 17:44:30.687:   send IAP FC00::20/128, metric 1 to Area 1
*Oct 20 17:44:30.687:   prefix FC00::10/128
*Oct 20 17:44:30.687:   send IAP FC00::10/128, metric 1 to Area 1
*Oct 20 17:44:30.687: OSPFv3: External SPF Type 4005
*Oct 20 17:44:30.687:   ASE LSA 2.2.2.2/0, age 330, seq 0x80000003, metric 20, type 2
*Oct 20 17:44:30.687:   adding path FastEthernet0/0.22/FE80::2222
*Oct 20 17:44:30.687:   ASE LSA 2.2.2.2/1, age 330, seq 0x80000003, metric 20, type 2
*Oct 20 17:44:30.687:   adding path FastEthernet0/0.22/FE80::2222
*Oct 20 17:44:30.687:   ASE LSA 2.2.2.2/2, age 330, seq 0x80000003, metric 20, type 2
...
[Truncated Output]
Using the same debugging commands, the following illustrates the impact of issuing the clear
ipv6 ospf process command on the same router:
R1#debug ipv6 ospf events
OSPFv3 events debugging is on
R1#debug ipv6 ospf spf
OSPFv3 spf intra events debugging is on
OSPFv3 spf inter events debugging is on
OSPFv3 spf external events debugging is on
R1#clear ipv6 ospf process
Reset ALL OSPF processes? [no]: yes
R1#
*Oct 20 17:51:00.711: OSPFv3: Flushing External Links
*Oct 20 17:51:00.711:   Insert LSA 3 adv_rtr 1.1.1.1, type 0x4005 in maxage
*Oct 20 17:51:00.711:   Insert LSA 4 adv_rtr 1.1.1.1, type 0x4005 in maxage
*Oct 20 17:51:00.715:   Insert LSA 5 adv_rtr 1.1.1.1, type 0x4005 in maxage
*Oct 20 17:51:00.715:   Insert LSA 6 adv_rtr 1.1.1.1, type 0x4005 in maxage
*Oct 20 17:51:00.715:   Insert LSA 7 adv_rtr 1.1.1.1, type 0x4005 in maxage
*Oct 20 17:51:00.787: OSPFv3: Flushing Link States in Area 0
*Oct 20 17:51:00.787:   Insert LSA 0 adv_rtr 1.1.1.1, type 0x2001 in maxage
*Oct 20 17:51:00.787:   Insert LSA 9 adv_rtr 1.1.1.1, type 0x8 in maxage
[Truncated Output]
*Oct 20 11:51:00.859 CST: %OSPFv3-5-ADJCHG: Process 1, Nbr 2.2.2.2 on
FastEthernet0/0.22 from FULL to DOWN, Neighbor Down: Interface down or detached
[Truncated Output]
*Oct 20 17:51:00.951: OSPFv3: DR/BDR election on FastEthernet0/0.22
*Oct 20 17:51:00.951: OSPFv3: Elect BDR 1.1.1.1
*Oct 20 17:51:00.951: OSPFv3: Elect DR 2.2.2.2
[Truncated Output]
*Oct 20 17:51:00.955: OSPFv3: running SPF for Area 1, cause R N SN SA L
*Oct 20 17:51:00.959: OSPFv3: malloc clist, size 0, address 859C4988
*Oct 20 17:51:00.959: OSPFv3: Intra-Area SPF (Full), Area 1
*Oct 20 17:51:00.959: OSPFv3: No own Router LSA
*Oct 20 17:51:00.959: OSPFv3: Check VLs
*Oct 20 17:51:00.959: OSPFv3: running SPF for Area 0, cause R N SN SA L
*Oct 20 17:51:00.959: OSPFv3: malloc clist, size 0, address 859C4988
*Oct 20 17:51:00.959: OSPFv3: Intra-Area SPF (Full), Area 0
*Oct 20 17:51:00.959: OSPFv3: No own Router LSA
*Oct 20 17:51:00.959: OSPFv3: Check VLs
*Oct 20 17:51:00.959: OSPF: ospf_gen_asbr_sum_all_areas
*Oct 20 17:51:00.959: OSPFv3: Inter-Area SPF, Area 0
*Oct 20 17:51:00.959: OSPFv3: Inter-Area SPF, Area 1
*Oct 20 17:51:00.959: OSPFv3: External SPF Type 4005
*Oct 20 17:51:00.959: OSPFv3: External SPF Type 2007
*Oct 20 17:51:00.963: OSPFv3: External SPF Type 2007
[Truncated Output]
*Oct 20 17:51:00.979: OSPFv3: Schedule partial SPF - 2.2.2.2/0 type 2003
*Oct 20 17:51:00.979: OSPFv3: Schedule partial SPF - 2.2.2.2/1 type 2003
*Oct 20 17:51:00.979: OSPFv3: Schedule partial SPF - 2.2.2.2/2 type 2003
*Oct 20 17:51:00.979: OSPFv3: Schedule partial SPF - 2.2.2.2/0 type 4005
*Oct 20 17:51:00.979: OSPFv3: Schedule partial SPF - 2.2.2.2/1 type 4005
*Oct 20 17:51:00.983: OSPFv3: Schedule partial SPF - 2.2.2.2/2 type 4005
*Oct 20 17:51:00.983: OSPFv3: Schedule partial SPF - 2.2.2.2/3 type 4005
*Oct 20 17:51:00.983: OSPFv3: Synchronized with 2.2.2.2 on FastEthernet0/0.22,
state FULL
*Oct 20 11:51:00.983 CST: %OSPFv3-5-ADJCHG: Process 1, Nbr 2.2.2.2 on
FastEthernet0/0.22 from LOADING to FULL, Loading Done
*Oct 20 17:51:00.983: OSPFv3: Service partial SPF Type3/4:3 Type5:4 Type7:0
*Oct 20 17:51:00.983: OSPFv3: Partial IAP SPF, area 0, Prefix FC00::10/128
...
[Truncated Output]
In essence, the clear ipv6 ospf force-spf command has less impact on OSPFv3 than the clear ipv6
ospf process command because it runs the SPF algorithm without clearing the OSPF database.
Clearing the database on a core router can have a drastic impact on the network, because adjacencies
are torn down and must be re-established before the database is repopulated. For this reason, the
clear ipv6 ospf process command should be used with caution. The same applies to the clear ip ospf
process command in OSPFv2; consider using the clear ip ospf force-spf command instead.
Finally, the redistribution keyword clears OSPFv3 redistribution. This is used when you are troubleshooting route redistribution into OSPF from other routing protocols. This command flushes and reinstalls all external LSAs; however, when this command is issued, the SPF algorithm is not run again.
In conclusion, just as you use the debug ip ospf command to view real-time OSPFv2 events, you
can use the debug ipv6 ospf command to view real-time OSPFv3 events. Although some OSPFv3-specific
keywords, such as ipsec (for IPSec event debugging), are included, most of the options match
those of the debug ip ospf command and perform the same function. The following shows the available
options that can be included when you are debugging OSPFv3:
R1#debug ipv6 ospf ?
adj             OSPF adjacency events
database-timer  OSPF database timer
events          OSPF events
flood           OSPF flooding
hello           OSPF hello events
ipsec           OSPF ipsec events
lsa-generation  OSPF lsa generation
lsdb            OSPF database modifications
packet          OSPF packets
retransmission  OSPF retransmission events
spf             OSPF spf
TROUBLESHOOTING IPV6 ROUTE REDISTRIBUTION
When troubleshooting IPv6 route redistribution, you should follow the same processes used for
IPv4 route redistribution. These include verifying device configurations and checking any
filters applied during redistribution. In addition to these steps, it is also important to
remember default routing protocol operation. For example, neither RIPng nor EIGRPv6 will redistribute
routes from OSPFv3 if the seed metric is not specified or if the metric is not included in
the redistribution configuration. Therefore, instead of focusing on specific scenarios pertaining to
IPv6 redistribution, remember the following points when both implementing and troubleshooting
IPv6 route redistribution:
• With RIPng, if multiple instances are configured on the same router, these instances will, by default, share routing information with each other (i.e., with no explicit redistribution configuration) if they use the same default Multicast group and UDP port. This default behavior can be changed by specifying different UDP port numbers for the RIPng instances using the port [number] multicast-group [address] router configuration command.
• By default, when redistributing between dynamic routing protocols, connected routes are not automatically included in redistribution for IPv6. Instead, the include-connected keyword must be appended to the end of the redistribute [protocol] router configuration command.
• When redistributing between IPv6 routing protocols, any route filtering implemented during redistribution (i.e., distribute lists and route maps) must reference either IPv6 ACLs or IPv6 prefix lists.
• When redistributing IPv6 prefixes into OSPFv3, the subnets keyword is not required because IPv6 does not use Classful networks as IPv4 does.
• By default, local routes (i.e., routes marked with an ‘L’ in the RIB) are not included in route redistribution for any IPv6 routing protocol.
• Only IPv6 global addresses are redistributed. Link-Local addresses, which are used as the next-hop address for IPv6 IGPs, cannot be redistributed.
IPV4 AND IPV6 INTEROPERABILITY
Dual-stack implementations are those where internetwork devices and hosts use both protocol
stacks (i.e., IPv4 and IPv6) at the same time. Dual-stack implementation allows the hosts to use
either IPv4 or IPv6 to establish end-to-end IP sessions with other hosts.
Dual-stack implementation does not automatically mean that the IPv4-only and IPv6-only hosts
have the ability to communicate with each other. To do so, additional protocols and mechanisms
are needed. Dual-stack simply means that the hosts (and infrastructure) are able to support both
the IPv4 protocol stack and the IPv6 protocol stack.
In situations where dual-stack implementations cannot be used, it is possible to tunnel the IPv6
packets over IPv4 networks. In these implementations, tunnels are used to encapsulate IPv6 packets
in IPv4 packets, allowing them to be sent across portions of the network that do not, or do not
yet, natively support IPv6. This allows the IPv6 ‘islands’ to communicate over the underlying IPv4
infrastructure.
IPv4 and IPv6 integration and co-existence strategies are divided into three broad classes as follows:
1. Dual-Stack implementations
2. Tunneling
3. Protocol translation
Cisco IOS software provides dual-stack support for several applications, tools, and protocols, which
include, but are not limited to, the following:
• Telnet
• SSH
• TFTP
• Traceroute utility
• HTTP
• Frame Relay
• FHRP
• DNS
• Ping utility
Cisco IOS software also supports the following tunneling mechanisms, which can be used to transport IPv6 packets over native IPv4 networks:
• Static (manually configured) IPv6 tunneling
• 6to4 tunneling
• Automatic IPv4-compatible tunneling
• ISATAP tunneling
• Generic Routing Encapsulation tunneling
Static IPv6-in-IPv4 tunneling requires the static configuration of tunnels on dual-stack devices in
order to allow IPv6 packets to be tunneled across the IPv4 network. When implementing static or
manual tunnels, while the Tunnel interface itself is assigned an IPv6 address, both the tunnel source
and destination addresses must be IPv4 addresses. Finally, when implementing manual tunnels,
IPv6 packets are encapsulated in IPv4 packets by specifying a tunnel mode of ipv6ip under the
Tunnel interface.
6to4 tunnels are defined in RFC 3056 and are designed to allow IPv6 end sites to access the IPv6
backbone, commonly referred to as the 6bone, by tunneling across the IPv4 Internet. 6to4 automatic tunneling provides a dynamic method to deploy tunnels between IPv6 sites over IPv4 networks.
Unlike with manually configured tunnels, there is no need to configure tunnel source and destination addresses manually to establish the tunnels. Instead, the tunneling of IPv6 packets between
6to4 sites is performed dynamically based on the destination IPv6 address of the packets originated
by IPv6 hosts.
Automatic prefix assignment provides a globally aggregatable IPv6 prefix to each 6to4 site. This prefix
is based on the 2002::/16 (0x2002) prefix assigned by IANA for 6to4 sites. The tunnel endpoint or
destination is determined by the globally unique IPv4 address embedded in a 6to4 address. This address
must be globally routable. In other words, RFC 1918 (private) addresses cannot be used for 6to4
tunnels because they are not globally unique. The 32-bit IPv4 address is converted to Hexadecimal
notation and appended to the 16-bit 2002 prefix, so the final representation is a 48-bit prefix. For
example, if the IP address 1.1.1.1 was embedded into the IPv6 6to4 prefix, the final representation
would be the 6to4 prefix 2002:0101:0101::/48.
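The 6to4 prefix derivation can be checked with a few lines of Python. This is a minimal sketch using the standard ipaddress module, not a Cisco tool. The function name sixto4_prefix is an assumption made for this example, and the private-address check relies on the module's is_private property, which is broader than RFC 1918 alone.

```python
import ipaddress

def sixto4_prefix(ipv4: str) -> ipaddress.IPv6Network:
    """Return the 48-bit 6to4 prefix built from a public IPv4 address."""
    addr = ipaddress.IPv4Address(ipv4)
    # is_private covers RFC 1918 space (and some other non-global ranges),
    # none of which can serve as a 6to4 tunnel endpoint.
    if addr.is_private:
        raise ValueError(f"{addr} is not globally routable")
    # The top 16 bits hold 0x2002; the next 32 bits hold the IPv4 address.
    value = (0x2002 << 112) | (int(addr) << 80)
    return ipaddress.IPv6Network((value, 48))

prefix = sixto4_prefix("1.1.1.1")
print(prefix)           # compressed notation
print(prefix.exploded)  # zero-padded notation, as written in the text
```

Python prints the compressed form without leading zeros in each group, so the same prefix appears as 2002:101:101::/48; the exploded form shows the zero-padded groups.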
Automatic IPv4-compatible tunnels enable IPv6 hosts to establish tunnels automatically to other IPv6
hosts across an IPv4 network infrastructure. Unlike 6to4 tunneling, automatic IPv4-compatible
tunneling uses IPv4-compatible IPv6 addresses, which are built from the IPv6 prefix ::/96. To
complete the 128-bit IPv6 address, the low-order 32 bits are derived from the IPv4 address. These
low-order 32 bits of the source and destination IPv6 addresses represent the source and destination
IPv4 addresses of the tunnel endpoints, much as the embedded IPv4 address identifies the tunnel
endpoint in 6to4 tunnels. Therefore, with automatic IPv4-compatible tunneling, the host or router at
each end of an IPv4-compatible tunnel must support both the IPv4 and IPv6 protocol stacks.
Automatic IPv4-compatible tunneling requires the use of either MP-BGP or static routes, with the
former being preferred for scalability reasons. This tunneling mechanism requires the use of
IPv4-compatible IPv6 addresses. As was stated earlier in this chapter, these addresses have the
first 96 bits set to 0 and are then followed by the 32-bit IPv4 address. An example of an
IPv4-compatible IPv6 address would be 0000:0000:0000:0000:0000:0000:172.16.255.1. This same address
can then be compressed as 0:0:0:0:0:0:172.16.255.1/128 or simply as ::172.16.255.1/128. Additionally,
it is also important to remember that the decimal IPv4 address could be converted to Hexadecimal
notation and used to create the IPv4-compatible IPv6 address 0:0:0:0:0:0:AC10:FF01/128 or simply
::AC10:FF01/128.
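To confirm that the dotted-decimal and Hexadecimal notations describe the same IPv4-compatible address, the conversion can be sketched in Python with the standard ipaddress module. The function names ipv4_compatible and embedded_ipv4 are assumptions made for this example only.

```python
import ipaddress

def ipv4_compatible(ipv4: str) -> ipaddress.IPv6Address:
    """Build the IPv4-compatible IPv6 address: 96 zero bits + 32-bit IPv4."""
    return ipaddress.IPv6Address(int(ipaddress.IPv4Address(ipv4)))

def embedded_ipv4(addr: ipaddress.IPv6Address) -> ipaddress.IPv4Address:
    """Recover the tunnel endpoint from the low-order 32 bits."""
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)

addr = ipv4_compatible("172.16.255.1")
# Both notations from the text parse to the same 128-bit value.
print(addr == ipaddress.IPv6Address("::172.16.255.1"))
print(addr == ipaddress.IPv6Address("::AC10:FF01"))
print(embedded_ipv4(addr))
```

Both comparisons print True, and the last line recovers 172.16.255.1, which is the IPv4 tunnel endpoint.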
Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) is an automatic overlay tunneling
mechanism that uses the underlying IPv4 network as an NBMA Link Layer for IPv6. As the name
suggests, ISATAP is designed for transporting IPv6 packets within a site where a native IPv6
infrastructure is not yet available. ISATAP tunnels allow individual IPv4/IPv6 dual-stack hosts
within a site to communicate with other such hosts on the same virtual link, creating a virtual IPv6
network using the IPv4 infrastructure. The main functionalities and components of ISATAP are
automatic tunneling, the ISATAP address format, prefixes, the interface ID, and ISATAP prefix
advertisement.
ISATAP addresses assigned to ISATAP routers and hosts are created by concatenating an
IPv6 global Unicast prefix dedicated to the ISATAP operation with a specially formatted interface
ID. The ISATAP prefix represents the high-order 64 bits of the IPv6 address. A single ISATAP
address is enabled on the ISATAP host using the Link-Local prefix FE80::/10, and another global or
Site-Local 64-bit prefix is assigned for ISATAP operation within the site. This prefix is received
by ISATAP hosts in router advertisement messages sent by ISATAP routers through the ISATAP
tunnels established over the IPv4 infrastructure. The interface ID used in ISATAP represents the
low-order 64 bits of the IPv6 address assigned to the ISATAP host. Like 6to4 tunneling, ISATAP embeds
IPv4 addresses in IPv6 addresses; however, ISATAP places the IPv4 address in the interface ID rather
than in the prefix. The interface ID is created by appending the 32-bit IPv4 address to the
high-order 32-bit value 0000:5EFE, which has been reserved by IANA for ISATAP use.
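The ISATAP address format described above can be sketched in Python as follows. This is an illustration using the standard ipaddress module. The function name isatap_address, the /64 form of the Link-Local prefix, and the sample IPv4 address 172.16.255.1 are assumptions made for this example.

```python
import ipaddress

ISATAP_MARKER = 0x00005EFE  # high-order 32 bits of the interface ID

def isatap_address(prefix: str, ipv4: str) -> ipaddress.IPv6Address:
    """Combine a 64-bit prefix with the ISATAP interface ID 0000:5EFE:<IPv4>."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("ISATAP expects a 64-bit prefix")
    # Interface ID = 0000:5EFE followed by the 32-bit IPv4 address.
    interface_id = (ISATAP_MARKER << 32) | int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address(int(net.network_address) | interface_id)

# Link-Local ISATAP address for a host whose IPv4 address is 172.16.255.1.
print(isatap_address("FE80::/64", "172.16.255.1"))
```

The low-order 64 bits of the result are 0000:5EFE:AC10:FF01, where AC10:FF01 is 172.16.255.1 in Hexadecimal notation.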
Finally, Generic Routing Encapsulation (GRE) is a tunnel encapsulation protocol that is used to tunnel
protocols over an internetwork. GRE is the default encapsulation used on Tunnel interfaces in Cisco
IOS software if no tunnel mode is explicitly configured. GRE supports multiple protocols
and can be used to encapsulate and transport protocols such as IPX, AppleTalk, and IPv6.
This capability allows GRE to provide greater flexibility than other tunneling mechanisms.
In a manner similar to manually configured IPv6-in-IPv4 tunnels, GRE tunnels are configured statically
between two routers to allow for the transport of IPv6 packets over an IPv4 infrastructure.
The only notable difference is that while IPv6-in-IPv4 tunnels use a tunnel mode of ipv6ip, GRE
tunnels use a tunnel mode of gre ip to tunnel IPv6 packets over the IPv4 infrastructure using GRE
encapsulation.
NOTE: Additional detailed information and configuration examples of the tunneling methods
described in the previous section can be found in the current ROUTE study guide available
online.
TROUBLESHOOTING IPV4 AND IPV6 INTEROPERABILITY
Most IPv4 and IPv6 interoperability issues are due to misconfigurations; however, in some rare cases,
software and hardware defects or bugs may cause problems with some of the mechanisms described in
the previous section. Before jumping to such conclusions, however, check and then double-check your
configurations. If you are still convinced that the desired mechanism has been configured correctly,
contact the TAC for further assistance in your troubleshooting effort. Some common problems
encountered with the tunneling mechanisms described in the previous section include the following:
• The configured Tunnel interface will not come up
• The tunnel is up but you cannot ping across it
• Routing adjacencies won’t establish across the tunnel
There are several reasons why a Tunnel interface may not come up, which include the following:
• Layer 1/Layer 2 issues
• Layer 3 issues
• Misconfigurations
Layer 1/Layer 2 issues will prevent the Tunnel interface from transitioning to the up/up state. Following
your tunnel configuration, if you notice that the Tunnel interface is in an up/down state,
check the status of the interface specified as the tunnel source using the show interfaces command.
In most cases, this state indicates that the source interface is either shut down (administratively
disabled) or down itself, meaning that there are Layer 1 or Layer 2 issues preventing it from
coming up. Troubleshoot the issues using the relevant commands.
In order for the Tunnel interface to come up, the specified tunnel destination must be present in the
RIB of the local router. Consider the following output for example:
R2#show interfaces Tunnel0
Tunnel0 is up, line protocol is down
Hardware is Tunnel
MTU 1514 bytes, BW 9 Kbit/sec, DLY 500000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation TUNNEL, Loopback not set
Keepalive not set
Tunnel source 150.1.1.2, destination 160.1.1.1
Tunnel protocol/transport IPv6/IP
Key disabled, sequencing disabled
Checksumming of packets disabled
...
[Truncated Output]
After having verified that the tunnel source interface is up on the local router, next verify that the
tunnel destination is known to the local router using the show ip route command as follows:
R2#show ip route 160.1.1.1
% Network not in table
In the output above, the specified tunnel destination address is not known to the local router, which
is why the tunnel is in an up/down state. Depending on your routing configuration, either add
a static route to the tunnel destination or verify the dynamic routing protocol configuration to
ensure, for example, that this address is not being incorrectly filtered. In our example, we will
simply assume that the tunnel destination is reachable via a static route pointing out of the
Serial0/0 interface of the local router and add the following configuration:
R2(config)#ip route 160.1.1.1 255.255.255.255 Serial0/0
Following this configuration, and having verified connectivity to the tunnel destination address, the
Tunnel interface transitions to the up/up state as follows:
R2#show interfaces Tunnel0
Tunnel0 is up, line protocol is up
Hardware is Tunnel
MTU 1514 bytes, BW 9 Kbit/sec, DLY 500000 usec,
reliability 255/255, txload 28/255, rxload 1/255
Encapsulation TUNNEL, Loopback not set
Keepalive not set
Tunnel source 150.1.1.2, destination 160.1.1.1
Tunnel protocol/transport IPv6/IP
Key disabled, sequencing disabled
Checksumming of packets disabled
Tunnel TTL 255
...
[Truncated Output]
Common configuration mistakes when implementing tunneling range from using or specifying the
incorrect tunnel source and/or destination addresses to specifying the incorrect encapsulation type
for the tunnel. For example, when configuring a manual tunnel, if you specify the tunnel mode as
IPv6, meaning that the tunnel will encapsulate using IPv6 packets, the tunnel will not come up if
the tunnel source and tunnel destination addresses are IPv4 addresses as illustrated in the following
output:
R1#show interfaces Tunnel0
Tunnel0 is up, line protocol is down
Hardware is Tunnel
MTU 1514 bytes, BW 9 Kbit/sec, DLY 500000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation TUNNEL, Loopback not set
Keepalive not set
Tunnel source 160.1.1.2, destination 150.1.1.2
Tunnel protocol/transport IPv6
Tunnel TTL 255
...
[Truncated Output]
When specifying IPv4 addresses as the tunnel source and tunnel destination addresses, you are
configuring an IPv4 tunnel. You must therefore specify that the IPv6 packets are encapsulated in
IPv4 packets by issuing the tunnel mode ipv6ip interface configuration command. The tunnel
mode ipv6 command would be used to tunnel IPv4 packets in IPv6 packets, which you would do if
you were configuring an IPv6 tunnel instead (i.e., the specified tunnel source and tunnel destination
addresses were IPv6 addresses). The same applies to the other tunneling mechanisms that are also
described in this guide.
The most common reasons for being unable to ping across a Tunnel interface that is up are basic
misconfigurations, such as mistyping the tunnel address, and filtering.
When routers are connected to public networks, such as the Internet, it is common practice to
implement ACL filtering to protect both the device and network from unauthorized access. When
implementing tunnels between two such routers, it is important to ensure that the specified
encapsulation protocol (IP protocol 41 for ipv6ip tunnels or IP protocol 47 for GRE) is permitted
between the two host addresses (i.e., between the tunnel source and tunnel destination addresses).
Another common reason for being unable to ping across Tunnel interfaces is mismatched
encapsulation on the Tunnel interfaces. Tunnel interfaces default to GRE encapsulation in Cisco IOS
software. If you specify a non-default encapsulation type, it must be the same
on both endpoints. For example, assume that a simple tunnel is configured between two routers
named R1 and R2. The tunnel configuration on R1 is as follows:
R1#show running-config interface Tunnel0
Building configuration...
Current configuration : 145 bytes
!
interface Tunnel0
no ip address
ipv6 address 3FF3:ABCD::1/64
tunnel source 160.1.1.2
tunnel destination 150.1.1.2
tunnel mode ipv6ip
end
On the remote endpoint, R2, the tunnel configuration has been implemented as follows:
R2#show running-config interface Tunnel0
Building configuration...
Current configuration : 125 bytes
!
interface Tunnel0
 no ip address
 ipv6 address 3FF3:ABCD::2/64
 tunnel source 150.1.1.2
 tunnel destination 160.1.1.2
end
For verification purposes, a simple ping between the tunnel source and destination addresses will
be used to validate end-to-end connectivity between the two as follows:
R2#ping 160.1.1.2 source 150.1.1.2 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 160.1.1.2, timeout is 2 seconds:
Packet sent with a source address of 150.1.1.2
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 1/3/4 ms
Because of the reachability between the tunnel endpoints, the Tunnel interfaces are in an up/up
state as illustrated in the output of the show interfaces command on R2 as follows:
R2#show interfaces Tunnel0
Tunnel0 is up, line protocol is up
  Hardware is Tunnel
  MTU 1514 bytes, BW 9 Kbit/sec, DLY 500000 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation TUNNEL, Loopback not set
  Keepalive not set
  Tunnel source 150.1.1.2, destination 160.1.1.2
  Tunnel protocol/transport GRE/IP
    Key disabled, sequencing disabled
    Checksumming of packets disabled
  Tunnel TTL 255
However, despite the Tunnel interface state, you notice that the routers cannot ping one another as
illustrated in the following output:
R2#ping 3FF3:ABCD::1 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 3FF3:ABCD::1, timeout is 2 seconds:
..........
Success rate is 0 percent (0/10)
Parsing through the configurations again, you notice that one endpoint (R1) is using a tunnel mode
of ipv6ip, while the other endpoint (R2) is using the default, which is GRE. This encapsulation mismatch is the root cause of the problem. The recommended solution therefore would be to correct
the configuration on R2 as follows:
R2(config)#interface Tunnel0
R2(config-if)#tunnel mode ipv6ip
R2(config-if)#exit
Following this reconfiguration, you will be able to ping between the routers successfully as illustrated in the following output:
R2#ping 3FF3:ABCD::1 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 3FF3:ABCD::1, timeout is 2 seconds:
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 4/4/8 ms
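The comparison performed by eye in the example above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the configuration text is copied from the R1 and R2 examples in this section, and an interface with no tunnel mode line is treated as using the Cisco IOS default of GRE over IP.

```python
DEFAULT_TUNNEL_MODE = "gre ip"  # used when no 'tunnel mode' line is configured


def tunnel_mode(config_text):
    """Return the configured tunnel mode, or the default if none is present."""
    for line in config_text.splitlines():
        words = line.split()
        if words[:2] == ["tunnel", "mode"]:
            return " ".join(words[2:])
    return DEFAULT_TUNNEL_MODE


def modes_match(config_a, config_b):
    """True when both tunnel endpoints use the same encapsulation."""
    return tunnel_mode(config_a) == tunnel_mode(config_b)


R1_CONFIG = """\
interface Tunnel0
 no ip address
 ipv6 address 3FF3:ABCD::1/64
 tunnel source 160.1.1.2
 tunnel destination 150.1.1.2
 tunnel mode ipv6ip
"""

R2_CONFIG = """\
interface Tunnel0
 no ip address
 ipv6 address 3FF3:ABCD::2/64
 tunnel source 150.1.1.2
 tunnel destination 160.1.1.2
"""

print(tunnel_mode(R1_CONFIG), "/", tunnel_mode(R2_CONFIG))
print("match" if modes_match(R1_CONFIG, R2_CONFIG) else "MISMATCH")
```

The check reports a mismatch for the original R2 configuration and a match once the tunnel mode ipv6ip line has been added to R2.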
When you are troubleshooting routing protocol adjacency problems across Tunnel interfaces, you
should first perform the Layer 1, 2, and 3 checks that are described in this section. Following this,
validate the routing protocol configuration on both endpoints, checking diligently for issues such as
incorrectly specified passive interfaces and mismatched parameters. Because these
steps are described in this and previous chapters, they are not repeated in this section.
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter.
IP version 6 Protocol Overview and Fundamentals
• Version 6 of the Internet Protocol provides additional capabilities over the current version 4
• The additional capabilities provided by IPv6 include the following:
   1. The Simplified IPv6 Packet Header
   2. Larger Address Space
   3. IPv6 Addressing Hierarchy
   4. IPv6 Extendibility
   5. IPv6 Broadcast Elimination
   6. Stateless Autoconfiguration
   7. Integrated Mobility
   8. Integrated Enhanced Security
• The three ways in which IPv6 addresses can be represented are as follows:
   1. The Preferred or Complete Address Representation or Form
   2. The Compressed Representation
   3. The IPv6 Addresses with an Embedded IPv4 Address
• The two kinds of IPv6 addresses that contain an embedded IPv4 address are as follows:
   1. IPv4-compatible IPv6 addresses
   2. IPv4-mapped IPv6 addresses
• IPv4-compatible IPv6 addresses have the first 96 bits of the address set to a value of 0
• IPv4-compatible IPv6 addresses complete the address using the 32-bit IPv4 address
• IPv4-mapped IPv6 addresses have the first 80 bits set to 0 and the next 16 set to all 1s
• IPv4-mapped IPv6 addresses complete the address using the 32-bit IPv4 address
• IPv6 addresses can be classified as any one of the following:
   1. Link-Local Addresses
   2. Site-Local Addresses
   3. Aggregate Global Unicast Addresses
   4. Multicast Addresses
   5. Anycast Addresses
   6. Loopback Addresses
   7. Unspecified Addresses
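The bit layouts of the two embedded-IPv4 address types summarized above can be checked with a short Python sketch using the standard ipaddress module. The address 192.0.2.1 is an arbitrary documentation address chosen for illustration.

```python
import ipaddress

ipv4 = ipaddress.IPv4Address("192.0.2.1")

# IPv4-compatible: 96 zero bits followed by the 32-bit IPv4 address.
compatible = ipaddress.IPv6Address(int(ipv4))

# IPv4-mapped: 80 zero bits, 16 one bits (0xFFFF), then the 32-bit IPv4 address.
mapped = ipaddress.IPv6Address((0xFFFF << 32) | int(ipv4))

print(compatible.exploded)
print(mapped.exploded)
print(mapped.ipv4_mapped)  # the embedded IPv4 address recovered by the library
```

In both cases the low-order 32 bits carry the IPv4 address unchanged; only the 16 bits immediately above it distinguish the two forms.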
Understanding and Troubleshooting EIGRPv6
• EIGRP for IPv6, also called EIGRPv6, is very similar to EIGRP for IPv4 or EIGRPv4
• The same core routing protocol operation is applicable to both versions, e.g. DUAL
• The differences between EIGRPv4 and EIGRPv6 are listed in the following table:

Protocol Characteristic       EIGRP for IPv4    EIGRP for IPv6
Automatic Summarization       Yes               Not Applicable
Authentication or Security    MD5               Built into IPv6
Common Subnet for Peers       Yes               No
Advertisement Contents        Subnet/Mask       Prefix/Length
Packet Encapsulation          IPv4              IPv6

• By default, when EIGRPv6 is implemented on a device, the protocol is in a shutdown state
• EIGRPv6 requires an IPv4 address as the router ID
• EIGRPv6 uses the link local address of the neighbor router(s) as the next-hop address
Understanding and Troubleshooting RIPng
• RIP next generation (RIPng) is the successor of RIPv2 but is exclusively for the IPv6 protocol
• The similarities and differences between RIPv2 and RIPng are listed in the following table:

Protocol Characteristic      RIPv2                       RIPng
Protocol Classification      Distance Vector             Distance Vector
Hop Limitation               15                          15
Split Horizon                Yes                         Yes
Poison Reverse               Yes                         Yes
Transport Layer Protocol     UDP                         UDP
Multicast Updates            Yes (224.0.0.9)             Yes (FF02::9)
Administrative Distance      120                         120
Hold-down Timers             Yes                         Yes
Destination Prefix Length    32-bit                      128-bit
Next Hop Length              32-bit                      128-bit
Next Hop Address             Primary Interface Address   Link-Local Address
Transport                    IPv4                        IPv6
UDP Port Number              520                         521
Authentication               Text and MD5                Inbuilt into IPv6 (IPsec)
Automatic Summarization      Yes (enabled by default)    Not Applicable
Can Broadcast Updates        Yes                         Not Applicable
Understanding and Troubleshooting OSPFv3
• While similar in many ways, OSPFv2 and OSPFv3 have many differences, which include the
  following:
   1. Unlike OSPFv2, OSPFv3 runs over a link, negating the need to use network commands
   2. OSPFv3 uses Link-Local addresses to identify the OSPFv3 adjacencies
   3. OSPFv3 introduces two new OSPF LSA types, which are the Type 8 and Type 9 LSAs
   4. OSPFv3 encapsulates messages (transport) using IPv6 datagrams
   5. OSPFv3 uses IPv6 Multicast groups FF02::5 and FF02::6 and not IPv4 Multicast groups
   6. OSPFv3 leverages the inbuilt capabilities of IPSec for security
   7. The Options field in Hello Packets and DBD packets has been expanded to 24 bits
   8. The OSPFv3 Hello packet contains no address information, but includes an Interface ID
Troubleshooting IPv6 Route Redistribution
• For the most part, IPv6 route redistribution follows the same logic as IPv4 redistribution
• However, there are some notable differences between the two, which include the following:
   1. With RIPng, instances using the same port and Multicast group will share information
   2. By default, connected routes are not automatically included in redistribution for IPv6
   3. Route filtering must reference either IPv6 ACLs or IPv6 prefix lists
   4. When redistributing IPv6 prefixes into OSPFv3, the subnets keyword is not required
   5. By default, local routes are not included in the route redistribution
   6. Only global IPv6 addresses are redistributed; Link-Local prefixes are not redistributed
IPv4 and IPv6 Interoperability
• With dual-stack implementations, hosts and network devices run both IPv6 and IPv4
• With tunneling mechanisms, IPv6 packets are tunneled in IPv4 packets
• With protocol translation, IPv6-to-IPv4, and vice-versa, translation is implemented
• Tunneling allows IPv6 packets to be encapsulated and sent over native IPv4 internetworks
• The following tunneling mechanisms are supported in Cisco IOS software:
   1. Static (Manually Configured) IPv6 Tunneling
   2. 6to4 Tunneling
   3. Automatic IPv4-compatible Tunneling
   4. ISATAP Tunneling
   5. Generic Routing Encapsulation Tunneling
• Static tunneling requires the static configuration of tunnels on dual-stack devices
• Static tunneling requires tunnel source and destination addresses to be specified
• Static tunneling is enabled using the tunnel mode ipv6ip interface configuration command
• 6to4 tunneling allows tunnels to be dynamically created
• 6to4 tunneling requires no explicit tunnel destination to be configured
• 6to4 tunneling has the following characteristics:
   1. Automatic or Dynamic Tunneling
   2. Automatic Prefix Assignment
   3. There is no IPv6 Route Propagation
• 6to4 tunneling uses the IPv6 2002::/16 prefix which was assigned by IANA for 6to4 sites
• The tunnel mode ipv6ip 6to4 command is used to enable 6to4 tunneling
• Automatic IPv4-compatible tunneling is also a dynamic tunneling mechanism
• Automatic IPv4-compatible tunnels use the IPv6 prefix ::/96
• The tunnel mode ipv6ip auto-tunnel command enables automatic IPv4-compatible tunneling
• ISATAP is an automatic overlay tunneling mechanism
• ISATAP uses the underlying IPv4 network as an NBMA Link Layer for IPv6
• ISATAP addressing includes the 0000:5EFE high-order 32-bit value
• The tunnel mode ipv6ip isatap command is used to enable ISATAP tunneling
• GRE is a tunnel encapsulation protocol that is used to tunnel protocols over an internetwork
• GRE is the default encapsulation protocol used on tunnel interfaces
• GRE provides much greater flexibility than the other tunneling mechanisms
• GRE tunnels use a tunnel mode of gre ip to tunnel IPv6 packets over IPv4
• While GRE and manual tunnels have similar configurations, there are some differences as follows:
   1. Generic Routing Encapsulation tunnels have a default MTU value of 1476 bytes
   2. Static IPv6-in-IPv4 tunnels have a default MTU value of 1480 bytes
   3. For GRE tunnels, the Link-Local address is derived via the EUI-64 method
   4. For static IPv6-in-IPv4 tunnels, the Link-Local address is derived from the FE80::/96 prefix
   5. GRE supports multiple protocols
   6. Static IPv6-in-IPv4 tunnels only support the encapsulation of IPv6 in IPv4 packets
• Before implementing tunneling, the following factors should be taken into consideration:
   1. Maximum Transmission Unit issues
   2. ICMPv4 Error Messages
   3. Protocol Filtering
   4. Network Address Translation
• The most common reasons for tunneling issues are device misconfigurations
• Additional reasons include Layer 1, 2, and 3 issues
• In rare cases, hardware and software faults or bugs can cause tunneling issues
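Several of the values summarized above follow directly from simple bit arithmetic. The Python sketch below, using the standard ipaddress module, derives a 6to4 site prefix, an ISATAP address, and the two default tunnel MTU values. The function names, the sample addresses, and the 2001:db8:0:1::/64 prefix are assumptions chosen for illustration, and the MTU calculation assumes a 1500-byte underlying MTU with minimum-size IPv4 and GRE headers.

```python
import ipaddress


def sixtofour_prefix(ipv4):
    """Return the /48 6to4 prefix: 2002::/16 followed by the 32-bit IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Network(((0x2002 << 112) | (v4 << 80), 48))


def isatap_address(prefix, ipv4):
    """Combine a /64 prefix with the ISATAP interface ID 0000:5EFE:<IPv4>."""
    network = ipaddress.IPv6Network(prefix)
    interface_id = (0x00005EFE << 32) | int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# Default tunnel MTU values, assuming a 1500-byte underlying MTU.
UNDERLYING_MTU = 1500
IPV4_HEADER = 20  # bytes, no IPv4 options
GRE_HEADER = 4    # bytes, no key, sequence number, or checksum
gre_tunnel_mtu = UNDERLYING_MTU - IPV4_HEADER - GRE_HEADER
ipv6ip_tunnel_mtu = UNDERLYING_MTU - IPV4_HEADER

print(sixtofour_prefix("192.0.2.1"))
print(isatap_address("2001:db8:0:1::/64", "192.0.2.1"))
print(gre_tunnel_mtu, ipv6ip_tunnel_mtu)
```

The 4-byte difference between the two MTU values is the GRE header, which static IPv6-in-IPv4 tunnels do not add.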
CHAPTER 11
Troubleshooting Cisco
Wireless LAN Solutions
Wireless LANs (WLANs) provide network connectivity almost anywhere and at much less
cost than traditional wired LANs. For this reason, WLAN solutions are commonplace in
most business environments. In addition to implementing and troubleshooting wired LAN solutions, you are also expected to understand how to implement and troubleshoot Cisco Unified
WLAN solutions. While WLANs are described in detail in the SWITCH guide, this chapter will
also describe some WLAN concepts with which you should be familiar. However, primary emphasis will be placed on WLAN troubleshooting and problem resolution. The TSHOOT certification
exam objective that is covered in this chapter is as follows:
• Troubleshoot switch support of advanced services (i.e., Wireless, VOIP, and Video)
This chapter will be divided into the following sections:
• Wireless Local Area Network Overview
• The Cisco WLAN Solution
• Troubleshooting Cisco WLAN Solutions
WIRELESS LOCAL AREA NETWORK OVERVIEW
Wireless networks use radio waves to transmit data and connect devices to the Internet, as well as
to other networks and applications, which minimizes the need for wired connections. Although a
wireless network allows users to access network resources ‘over-the-air,’ it is important to keep in
mind that wireless traffic also traverses the physical wired infrastructure. Therefore, it is imperative
to remember that Wireless Local Area Networks are meant to augment, rather than replace, the
wired LAN campus infrastructure. This augmentation allows for a flexible data communication
system within the enterprise network.
IEEE 802.11 Components
The 802.11 architecture is comprised of several logical and physical components. The 802.11 components described in this section are as follows:
• Client or Station (STA)
• Access Point (AP)
• Independent Basic Service Set (IBSS)
• Basic Service Set (BSS)
• Extended Service Set (ESS)
• Distribution System (DS)
The client or station (STA) refers to any appliance that interfaces with the wireless medium and
operates as an end-user device. The STA contains an adapter card, a PC card, or an embedded device to provide wireless connectivity. Some common examples of STAs include laptop computers,
desktop computers, and PDAs with wireless network interface cards.
The wireless Access Point (AP) functions as a bridge between the wireless STAs and the existing
network backbone for network access. APs serve as the central points in an all-wireless network,
or as the connection point between wired and wireless networks. When APs are used in a wireless
network, any STA attempting to use the wireless network must first establish membership, or an
association, with the AP.
An Independent Basic Service Set (IBSS) is a wireless network, consisting of at least two STAs,
used where no access to a Distribution System is available. The Distribution System (DS) will be
described later in this section. An IBSS is sometimes referred to as an independent configuration or
as an ad hoc wireless network. From a logical perspective, an IBSS is very similar to a peer-to-peer
network in which no one node performs any server functions.
The 802.11 WLAN infrastructure architecture is based on a cellular architecture that divides the
system into cells, each referred to as a Basic Service Set (BSS). The BSS is controlled by a Base Station, or,
more commonly, an AP. The cell is restricted to the AP’s coverage area. Clients, or stations, within
the cell can then associate themselves with the AP, allowing them to use the wireless LAN.
Access Points may be interconnected using the switched network, creating what is referred to as
an Extended Service Set (ESS). The ESS is comprised of overlapping BSSs (cells) that are usually
connected together by a wired medium (Distribution System). In most cases, the ESS allows stations to roam. Roaming is the process of moving from one cell (BSS) to another without losing the
wireless connection.
Finally, the Distribution System (DS) allows for the interconnection of the APs of multiple cells
(BSSs). This allows for mobility because STAs can move from one BSS to another BSS. Although
the Distribution System could be any type of network, it is almost always a wired Ethernet LAN.
However, it should be noted that it is also possible for APs to be interconnected without using
wires. The three types of DSs are integrated, wired, and wireless.
IEEE 802.11 Frames
WLANs are defined in the IEEE 802.11 standards. The IEEE 802 standards define two separate layers for the Data Link layer (Layer 2) of the OSI Reference Model. These two layers are the Logical
Link Control (LLC) and Media Access Control (MAC) sublayers. The IEEE 802.11 standards cover
the operation of the MAC sublayer and the physical layer of the OSI Reference Model. The 802.11
frame consists of a 32-byte MAC header, a variable length body between 0 and 2312 bytes, and a
4-byte FCS. The 802.11 standard uses the following three types of frames:
1. Control frames
2. Management frames
3. Data frames
802.11 uses control frames to control device access to the wireless medium. These control frames
include the Request To Send (RTS), Clear To Send (CTS), and Acknowledgement (ACK)
frames. The RTS/CTS function is optional and is employed to prevent frame collisions. After receiving a data frame, the receiving STA uses an error-checking process to detect the presence of errors. The receiving STA sends an ACK frame to the sending STA if no errors are
found. Receipt of the acknowledgment tells the sending STA that no collision
occurred. However, if the sending STA does not receive an ACK after a period of time, it assumes a
collision occurred and retransmits the frame.
802.11 management frames enable stations to establish and maintain communications. There are
several management frame subtypes, which include Beacon Frames (Beacons), Association Request Frames, and Authentication Response Frames, among several other frames. While not included in this chapter, additional material and detail on the other frame types may be found in the
current SWITCH study guide, which is available online at www.howtonetwork.net.
Figure 11-1 below illustrates the exchange of management frames between an Access Point (AP)
and a client (STA) using passive scanning to synchronize itself with the AP:
[Figure 11-1: the AP sends a Beacon Frame to the STA; the STA and AP then exchange Authentication Request, Authentication Response, Association Request, and Association Response frames]
Fig. 11-1. STA Association with an AP When Using Passive Scanning
Figure 11-2 below illustrates the exchange of management frames between an Access Point (AP)
and a client (STA) using active scanning to synchronize itself with the AP:
[Figure 11-2: the STA sends a Probe Request and the AP returns a Probe Response; the STA and AP then exchange Authentication Request, Authentication Response, Association Request, and Association Response frames]
Fig. 11-2. STA Association with an AP When Using Active Scanning
Finally, data frames are sent by any STA and contain higher layer protocol information or data. As
is the case with management frames, the 802.11 standard supports several data frame types. Additional information on the 802.11 standard data frames may be found in the current SWITCH study
guide, which is available online.
IEEE 802.11 Standards
At the physical (PHY) layer, IEEE 802.11 defines a series of encoding and transmission schemes for
wireless communications, the most common of which are the Frequency Hopping Spread Spectrum (FHSS), Direct Sequence Spread Spectrum (DSSS), and Orthogonal Frequency Division Multiplexing (OFDM) transmission schemes. Although Infrared (IR) also exists at this layer, very little
development of this standard has occurred due to line-of-sight limitations. The 802.11 standards
described in this section are as follows:
• IEEE 802.11 (original)
• IEEE 802.11b
• IEEE 802.11a
• IEEE 802.11g
• IEEE 802.11n
The original IEEE 802.11 standard defined WLANs that provided up to 2 Mbps of throughput. The
original standard specified the FHSS and DSSS transmission schemes and the S-Band Industrial,
Scientific, and Medical (ISM) frequency band, which operates in the frequency range of 2.4 to
2.5 GHz. The original 802.11 standard is also sometimes referred to as IEEE 802.11 legacy and is
now considered obsolete because the throughput is too slow for most applications.
The 802.11b standard is an extension of 802.11 that operates in the same unregulated 2.4 GHz band
as the original 802.11 standard. Devices operating on 802.11b use DSSS modulation for higher
speeds. The data rates on a channel can vary according to client capabilities and conditions. However, the only possible data rates are 1, 2, 5.5, and 11 Mbps. The 5.5 Mbps and 11 Mbps rates are the two
new speeds added to the original specification. The 2.4 GHz band consists of 14 channels, each 22
MHz wide. In North America, the Federal Communications Commission (FCC) allows channels
1 through 11. Most of Europe can use channels 1 through 13. In Japan, only channel 14 is used.
APs or clients use a spectral mask or a template to filter out a single channel based around a center
frequency.
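To make the channel layout concrete, the following Python sketch computes the center frequency of each 2.4 GHz channel and tests whether two channels overlap, using the 22 MHz channel width stated above. The center frequencies used here (channels 1 through 13 spaced 5 MHz apart starting at 2412 MHz, and channel 14 at 2484 MHz) are the standard 802.11 values; the function names are illustrative.

```python
CHANNEL_WIDTH_MHZ = 22


def center_mhz(channel):
    """Center frequency in MHz of a 2.4 GHz 802.11 channel (1 through 14)."""
    if not 1 <= channel <= 14:
        raise ValueError("2.4 GHz channels are numbered 1 through 14")
    # Channels 1-13 are spaced 5 MHz apart; channel 14 sits apart at 2484 MHz.
    return 2484 if channel == 14 else 2407 + 5 * channel


def overlaps(channel_a, channel_b):
    """True when the two 22 MHz wide channels share spectrum."""
    return abs(center_mhz(channel_a) - center_mhz(channel_b)) < CHANNEL_WIDTH_MHZ


print([center_mhz(ch) for ch in (1, 6, 11)])
print(overlaps(1, 5), overlaps(1, 6))
```

Because adjacent channel centers are only 5 MHz apart while each channel is 22 MHz wide, channels must be at least five numbers apart to avoid overlap, which is why channels 1, 6, and 11 are commonly used together in North America.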
The IEEE 802.11a standard is an extension of 802.11 that applies to wireless LANs and provides up
to 54 Mbps. This standard uses OFDM and does away with spread spectrum. As a result, it is not
compatible with 802.11b or 802.11g and therefore this standard is seldom used any more.
802.11a equipment operates at 5 GHz. This higher frequency range means that 802.11a signals are
absorbed more readily by walls and other solid objects in their path due to their smaller wavelength
and, as a result, they cannot penetrate as far as those of 802.11b. The FCC has allocated 300 MHz
of RF spectrum for unlicensed operation in the 5 GHz block referred to as the Unlicensed National
Information Infrastructure (U-NII) band.
The 802.11g standard works in the same 2.4 GHz range as 802.11b. IEEE 802.11g operates at a
bit rate as high as 54 Mbps and uses the S-Band ISM and OFDM. However, unlike 802.11a, 802.11g
is backward compatible with 802.11b, and can operate at the 802.11b bit rates and use DSSS. Like
802.11a, 802.11g uses 54 Mbps in ideal conditions and the slower speeds of 48 Mbps, 36 Mbps, 24
Mbps, 18 Mbps, 12 Mbps, and 6 Mbps in less-than-ideal conditions.
IEEE 802.11n improves on the maximum data rate of 802.11a and 802.11g, raising
it from 54 Mbps to approximately 600 Mbps. The 802.11n standard includes several enhancements to the previously described 802.11 standards. The enhancements include MIMO, 40-MHz
operation, frame aggregation at the MAC sublayer, and backward compatibility, which makes it
possible for multiple 802.11a, 802.11b, 802.11g, and 802.11n devices to coexist.
THE CISCO WLAN SOLUTION
The Cisco Wireless LAN solution is designed to provide IEEE 802.11 wireless networking solutions for both enterprises and service providers. It consists of Cisco Wireless LAN Controllers
(WLCs) and their associated Lightweight Access Points (LAPs).
WLCs work in conjunction with Cisco Access Points as well as the Cisco Wireless Control System
(WCS) to support business-critical wireless applications. WLCs are responsible for system-wide
wireless LAN functions, such as the following:
• Integrated Intrusion Prevention System (IPS)
• Zero-touch deployment of Lightweight Access Points (LAPs)
• Real-time Radio Frequency (RF) management
• Wireless LAN redundancy
• Dynamic channel assignment for each LAP
• Dynamic client load balancing across LAPs
• Dynamic LAP transmit power optimization
• Wireless LAN security management
WLCs communicate with controller-based APs over any Layer 2 (Ethernet) or Layer 3 (IP) infrastructure using the Lightweight Access Point Protocol (LWAPP). LWAPP is an IETF draft protocol.
An LAP discovers a controller with the use of LWAPP discovery mechanisms. The LAP sends
an LWAPP join request to the WLC and the controller sends the LAP an LWAPP join response,
which allows the AP to join the controller.
When using LWAPP, although the LAP is under the control of the centralized WLC, the actual
processing of data and management protocols and AP capabilities is divided between the LAP and
the centralized WLC (the split-MAC architecture).
NOTE: In controller software release 5.2 or later, Cisco LAPs use the IETF standard Control
and Provisioning of Wireless Access Points (CAPWAP) protocol in order to communicate between the controller and other LAPs on the network. Controller software releases prior to 5.2
use the LWAPP for these communications.
CAPWAP, which is based on LWAPP, is a standard, interoperable protocol that enables a controller to manage a collection of wireless APs. LAPs can discover and join a CAPWAP controller. The
one exception is for Layer 2 deployments, which are not supported by CAPWAP. Additionally,
CAPWAP and LWAPP controllers may be deployed in the same network. The CAPWAP-enabled
software allows APs to join a controller that runs either CAPWAP or LWAPP.
When the LAP joins the controller, it downloads the controller software if the revisions on the
LAP and the controller do not match. Following that, the LAP is completely under the control of
the controller and is unable to function independently of it.
LWAPP secures the control communication between the LAP and the controller by means of a
secure key distribution. The secure key distribution requires already provisioned X.509 digital
certificates on both the LAP and the controller. Factory-installed certificates are referenced with
the term ‘MIC,’ which is an acronym for Manufacturing Installed Certificate.
The LWAPP Discovery Process
Despite the split-MAC architecture, it is important to remember that LAPs cannot act independently of the WLC. The WLC manages the LAP configurations and firmware. The LAPs are zero-touch deployed, meaning that no individual configuration of LAPs is required when they are
deployed into the WLAN.
In order for the WLC to manage the LAP, the LAP should discover the controller and register with
the WLC. After the LAP has registered to the WLC, LWAPP messages are exchanged and the AP
initiates a firmware download from the WLC if there is a version mismatch between the AP and the
WLC. This allows the LAP to sync with the WLC.
Following the sync, the WLC provisions the LAP with the configurations that are specific to the
WLANs so that the LAP can accept client associations. These WLAN-specific configurations include the Service Set Identifier (SSID), any additional required security parameters, and 802.11 parameters, such as the data rate, radio channels to use, and the power levels. The following sequence
of events must occur in order for an LAP to register to a WLC:
1. The LAP issues a Dynamic Host Configuration Protocol (DHCP) Discovery Request to get
an IP address. This happens only if the LAP has not been configured with a static IP address.
2. The LAP sends LWAPP Discovery Request messages to the WLCs. If Layer 2 LWAPP mode is
supported on the LAP, the LAP broadcasts an LWAPP Discovery message in a Layer 2 LWAPP
frame. However, if the LAP or the WLC does not support Layer 2 LWAPP mode, the LAP attempts a Layer 3 LWAPP WLC discovery. The LAPs use the Layer 3 discovery algorithm only
if the Layer 2 discovery method is not supported or if the Layer 2 discovery method fails. The
LWAPP Layer 3 WLC discovery algorithm repeats until at least one WLC is found and joined.
3. Any available WLC that receives the LWAPP Discovery Request responds with an
LWAPP Discovery Response.
442
C H A P T E R 11: T RO U B L ES H O OT I N G C I S CO W I R E L ES S L A N S O LU T I O N S
4. If the LAP receives more than one LWAPP Discovery Response, it selects the WLC to join,
which is typically the first WLC to respond to the LAP.
5. The LAP then sends an LWAPP Join Request to the WLC and the WLC validates the LAP
and then sends an LWAPP Join Response to the LAP.
6. The LAP validates the WLC, which then completes the Discovery and Join process. The LWAPP
Join process includes mutual authentication and encryption key derivation, which is used to
secure both the Join process and LWAPP Control messages between the LAP and the WLC.
7. The LAP registers with the WLC and can begin accepting client associations.
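The decision points in steps 2 and 4 above can be summarized in a small Python sketch. This is only a simplified model of the logic described in the text, not the LWAPP implementation itself; the function names, the controller names, and the response timings are hypothetical.

```python
def discovery_order(lap_supports_l2, wlc_supports_l2):
    """Layer 2 discovery is tried first only when both ends support it;
    Layer 3 discovery is the fallback (or the only method otherwise)."""
    if lap_supports_l2 and wlc_supports_l2:
        return ["layer2", "layer3"]
    return ["layer3"]


def select_wlc(responses):
    """Pick the controller to join from (arrival_seconds, wlc_name) tuples.

    Models the typical case described in the text: the first WLC to respond.
    Returns None when no Discovery Response was received, in which case the
    Layer 3 discovery algorithm would repeat.
    """
    if not responses:
        return None
    return min(responses)[1]


print(discovery_order(lap_supports_l2=True, wlc_supports_l2=False))
print(select_wlc([(0.020, "WLC-B"), (0.012, "WLC-A")]))
```

When troubleshooting a LAP that will not register, it helps to establish which of these two points was reached: whether any Discovery Response arrived at all, and, if so, which WLC the LAP then attempted to join.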
Wireless LAN Roaming
One of the most significant advantages of WLANs over wired LANs is roaming, or mobility. Roaming is a wireless LAN client’s ability to maintain its association seamlessly and securely from one AP
to another, with as little latency as possible.
When a wireless client associates and authenticates to an AP, the AP’s controller places an entry
for that client in its client database. This entry includes the client’s MAC and IP addresses, security
context and associations, Quality of Service (QoS) contexts, the WLAN, and the associated AP. The
controller uses this information to forward frames and manage traffic to and from the wireless client. The Cisco WLAN supports three types of roaming, which are as follows:
1. Intra-controller roaming (same subnet)
2. Inter-controller roaming (same subnet)
3. Inter-subnet (Layer 3) roaming
Intra-controller roaming occurs when a wireless client roams between APs that are joined to the
same controller. In such cases, the controller simply updates the client database with the newly associated AP. If necessary, new security context and associations are established as well.
Inter-controller roaming occurs when the client roams from an AP joined to one controller to an
AP joined to a different controller. When the client associates to an AP joined to a new controller,
the new controller exchanges mobility messages with the original controller, and the client database entry is moved to the new controller. New security context and associations are established,
if necessary, and the client database entry is updated for the new AP. This process is transparent
to the user and is facilitated by the exchange of mobility packets between the WLCs.
These packets are exchanged as EtherIP packets (IP protocol 97).
Inter-subnet roaming is somewhat similar to inter-controller roaming, with some differences. With
inter-subnet roaming, the wireless LAN interfaces of the WLCs are on different subnets. In addition, inter-subnet roaming does not move the client database entry to the new controller. Instead,
the original controller marks the client with an anchor entry in its local database, and this is copied
to the new controller client database and marked as a foreign entry. The client keeps its IP address
and the entire process is transparent.
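The three roaming types differ only in whether the controller changes and whether the subnet changes, which the following Python sketch makes explicit. The class, the function names, and the sample controller and subnet values are hypothetical and serve only to restate the rules described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    controller: str  # WLC that the client's current AP is joined to
    subnet: str      # subnet of that controller's wireless LAN interface


def classify_roam(old, new):
    """Classify a roam using the three types described in the text."""
    if old.controller == new.controller:
        return "intra-controller"
    if old.subnet == new.subnet:
        return "inter-controller"
    return "inter-subnet"


# What happens to the client database entry for each roaming type.
CLIENT_DB_ACTION = {
    "intra-controller": "entry updated with the newly associated AP",
    "inter-controller": "entry moved to the new controller",
    "inter-subnet": "anchor entry kept on original controller, foreign entry copied to new one",
}

before = Attachment("WLC1", "10.1.1.0/24")
after = Attachment("WLC2", "10.2.2.0/24")
roam = classify_roam(before, after)
print(roam, "->", CLIENT_DB_ACTION[roam])
```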
TROUBLESHOOTING CISCO WLAN SOLUTIONS
Like wired LAN solutions, WLAN solutions are comprised of many different elements. Therefore,
in order to troubleshoot the WLAN solution, it is important to have a solid understanding of
WLAN components and how they interact with each other in the overall solution. This understanding simplifies the overall WLAN solution troubleshooting process. Generally speaking, the
majority of Wireless LAN issues fall into one of the following problem areas:
• Wireless Station (STA) or client issues
• WLC configuration issues
• AP configuration issues
• AP and WLC registration issues
• Infrastructure issues
• Antenna and radio frequency issues
The sections that follow describe some common problems that may occur in these areas. Additionally, suggested solutions to correct (or avoid) such issues are also described.
Wireless Station (STA) Issues
Troubleshooting wireless clients (STAs) is an integral component of the overall WLAN solution
troubleshooting process. This process can be used to narrow down the root of the wireless problem.
For example, if a single client is unable to connect to the WLAN but every other client is able to
connect, you can eliminate the WLAN solution devices (e.g., APs and WLCs) and troubleshoot the
client itself. However, if multiple or all clients are unable to access the WLAN, you can eliminate
the clients and focus your efforts on WLAN solution devices instead. Basic Wireless client troubleshooting should include the following tasks:
• Checking the client wireless NIC state
• Checking client settings
• Checking the state of the wireless client
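The elimination logic described above, where the number of affected clients determines where to look first, can be sketched as a simple Python function. The function name and return strings are hypothetical, and the case of a single client on the WLAN is treated as undetermined because the text does not cover it.

```python
def triage(failing_clients, total_clients):
    """Suggest where to start troubleshooting based on how many clients fail."""
    if total_clients <= 0 or not 0 <= failing_clients <= total_clients:
        raise ValueError("client counts are inconsistent")
    if failing_clients == 0:
        return "no fault reported"
    if failing_clients == 1 and total_clients > 1:
        # Every other client connects, so the APs and WLCs can be eliminated.
        return "troubleshoot the client"
    if failing_clients > 1:
        # Multiple or all clients fail, so the clients can be eliminated.
        return "troubleshoot the WLAN solution devices"
    return "undetermined"


print(triage(1, 25))
print(triage(25, 25))
```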
Checking the client wireless network connection includes tasks such as validating that the client
wireless network card is enabled and is functioning properly. Some wireless devices have a radio
button that can be toggled to enable and disable the wireless network connection. It
is not uncommon for this to be accidentally disabled, such as when a laptop is being removed from a
carrying case. In most cases, the Operating System will indicate that the radio is disabled by displaying a warning or other error message, or by using some kind of visual indicator, such as a red do-not-enter symbol. Figure 11-3 below shows the warning message
displayed on a Windows-based machine after the wireless radio is disabled:
Fig. 11-3. Client Wireless Radio Warning Indicator and Message
In addition to verifying whether the wireless network connection is enabled, it is important to
check whether the wireless network interface controller (NIC) has been installed correctly and is
operating as expected (i.e., there are no error or warning messages printed by the Operating System). It is also prudent to validate that the TCP/IP stack has been installed properly and
is working, and that the client is correctly configured to receive IP addressing information dynamically using the DHCP service. Keep in mind that these checks will vary depending on the platform.
After validating that the wireless NIC is functioning as it should be, the next logical step would be
to check the wireless settings. These include the SSID, security or authentication configuration (if
applicable), and station configuration. Once the AP has been discovered, the client must establish
an association. The AP may have some specific requirements that must be satisfied before allowing the STA to join the cell. For example, the AP may request a matching SSID, a supported 802.11
standard, or some form of authentication. It is important to ensure that these parameters match
on both the client and the AP; otherwise, the client will not be able to establish an association with
the AP.
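As an illustration only, the parameter comparison described above can be sketched as follows. The WirelessSettings class and its field names are hypothetical, and real clients and APs negotiate many more parameters than the three shown here. Note that the SSID comparison is deliberately case sensitive:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WirelessSettings:
    """Hypothetical summary of the settings compared during association."""
    ssid: str
    standards: frozenset  # e.g. frozenset({"802.11b", "802.11g"})
    auth: str             # e.g. "open", "wep", "wpa"


def association_mismatches(client, ap):
    """Return the names of the parameters that would prevent association."""
    problems = []
    if client.ssid != ap.ssid:                 # exact, case-sensitive comparison
        problems.append("ssid")
    if not (client.standards & ap.standards):  # no 802.11 standard in common
        problems.append("standard")
    if client.auth != ap.auth:
        problems.append("auth")
    return problems


client = WirelessSettings("Corp-WLAN", frozenset({"802.11g"}), "wpa")
ap = WirelessSettings("corp-wlan", frozenset({"802.11b", "802.11g"}), "wpa")
print(association_mismatches(client, ap))  # ['ssid']
```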
Checking the state of the client involves verifying whether the client is able to detect any wireless
networks, based on the assumption that the wireless NIC is working as it should be. Additional
client checks should include checking for signal interference, and verifying whether the client is
associated. Different Operating Systems and vendors have tools that can be used to troubleshoot
the client state. Check the vendor or manufacturer documentation for additional information on
using their utilities. As an example, Figure 11-4 below shows the Dell Wireless WLAN Card Utility
that can be used to troubleshoot client state issues on a machine installed with a Dell WLAN Card:
Fig. 11-4. Client Wireless State Troubleshooting Tools
NOTE: When troubleshooting client association, you should also check the station’s status on
the AP. AP troubleshooting is described later in this section.
WLC Configuration Issues
As is the case with all of the other technologies and protocols described in this guide, device misconfigurations are a common cause of problems. Some common WLC device misconfigurations
include the following:
• Mismatched Service Set Identifiers
• Security mismatches
• WLANs are disabled on the WLC
• Data rate mismatches
• Incorrect client or station filtering
• Unsupported features
• IP address assignment issues
• SSID Broadcast is disabled
NOTE: While the section that follows includes common WLC and AP misconfigurations, as well as
recommendations for resolving such problems, you are not expected to perform any WLC
troubleshooting in the current TSHOOT certification exam. Instead, emphasis should be placed
on understanding the potential root of the problem.
When multiple clients are unable to connect to the WLAN, a good starting point is to check the
configured SSID on the WLC. Just as the SSID could be misconfigured on the client, it could also
be misconfigured on the WLC. When verifying the configured SSID, it is important to
remember that the SSID is case sensitive.
Security mismatches are another common cause of WLAN problems. Security parameters must
match on the client and the WLC. If the authentication type is Static WEP, verify that the encryption key and key index on the WLC match those of the client. Alternatively, if the
authentication type is 802.1x or WPA, ensure that the authentication type and the encryption key
size match on both the client and the WLC. If these parameters are mismatched, they should be corrected so that the WLC and the client agree.
When a WLAN is configured on the WLC, it is important to ensure that it is enabled. By default,
the status of the WLAN is not enabled on the WLC. Instead, it must be enabled manually following its configuration. If the WLAN is disabled, clients will not be able to associate. Verify the
configuration of the WLC to ensure that all configured WLANs are enabled.
The Cisco Unified WLAN solution allows administrators to specify data rates for the AP radio.
Data rates can be specified as either mandatory or supported. If a data rate is specified as mandatory, the client must support it; otherwise, association will fail. If a certain data rate has been
configured as mandatory, verify that it is supported by the wireless client. To avoid situations such
as these, it is recommended that you set the lowest data rate on the WLC to mandatory and then
specify any other data rates as supported.
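The mandatory versus supported rule can be illustrated with a short sketch. The function names are hypothetical, and the rate lists (in Mbps) are example values rather than output from a real WLC:

```python
def missing_mandatory_rates(client_rates, mandatory_rates):
    """Return the mandatory rates (in Mbps) that the client does not support."""
    return sorted(set(mandatory_rates) - set(client_rates))


def can_associate(client_rates, mandatory_rates):
    """A client can associate only if it supports every mandatory rate."""
    return not missing_mandatory_rates(client_rates, mandatory_rates)


legacy_client = [1, 2, 5.5, 11]  # hypothetical 802.11b-only client

# Only the lowest rate is mandatory, as recommended above: association succeeds.
print(can_associate(legacy_client, mandatory_rates=[1]))      # True

# A high rate is also mandatory: the legacy client cannot associate.
print(can_associate(legacy_client, mandatory_rates=[1, 54]))  # False
```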
On the WLC, there is an option to disable the clients manually, which helps to prevent rogue clients
from trying to access the network. While such policies enhance WLAN security, misconfigurations
can result in legitimate clients being denied access to the WLAN. If your organization’s policy requires such security, check the WLC configuration to ensure that the client that cannot connect to
the WLAN is not included in the list of filtered client MACs.
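When checking a filter list by hand, it is easy to miss a match because MAC addresses are written in different notations. The following sketch, which assumes that the filtered MACs have already been copied out of the WLC configuration into a Python list, normalizes the notation before comparing. The function names are hypothetical:

```python
import re


def normalize_mac(mac):
    """Reduce a MAC address in any common notation to 12 lowercase hex digits."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return digits.lower()


def is_filtered(client_mac, filtered_macs):
    """Return True if the client MAC appears in the list of filtered MACs."""
    return normalize_mac(client_mac) in {normalize_mac(m) for m in filtered_macs}


# The same address written in colon notation and in Cisco dotted notation.
print(is_filtered("00:1A:2B:3C:4D:5E", ["001a.2b3c.4d5e"]))  # True
```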
Cisco WLCs include some proprietary features and functions that may not be supported by non-Cisco clients. Common proprietary features that are enabled by default include radio preambles
and Management Frame Protection (MFP). The radio preamble, which is sometimes called a header,
is a section of data at the head of a packet, which contains information that wireless devices need
when they send and receive packets.
MFP ensures the integrity of the 802.11 management frames by allowing the AP to add a Message
Integrity Check Information Element (MIC IE) to each frame. Any attempt by an intruder to copy, alter, or replay the frame invalidates the MIC, which causes any receiving AP that
is configured to detect MFP frames to report the discrepancy. Although enabled by default, these
features are not supported by all clients. When non-Cisco clients exist, it is recommended that such
features be disabled if they are not supported.
IP addressing issues are among the most commonly experienced WLAN issues. The WLC can be
configured as a DHCP relay agent or as a DHCP server itself. When the WLC operates as a DHCP
relay agent, it forwards DHCP messages from clients to the specified DHCP server(s). When the
DHCPOFFER comes back to the controller, the controller changes the DHCP server IP address to its own virtual
IP address, which is typically set to 1.1.1.1. When clients roam, the first thing they attempt to do
is contact the DHCP server to renew their IP address. By using its own virtual IP address as that
of the DHCP server, the WLC is able to intercept the client DHCPREQUEST packets. Given this
behavior, it is important to ensure that all WLCs (if more than one exists) are configured with the
same virtual IP address. Clients then believe that they are communicating with the same DHCP
server, which prevents them from beginning the entire DHCP process again each time they roam
between APs.
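A consistency check of this kind is simple to automate. The sketch below assumes that the virtual IP address of each controller has already been collected into a dictionary; the controller names and the function name are hypothetical:

```python
from collections import Counter


def mismatched_virtual_ips(virtual_ips):
    """Return the controllers whose virtual IP differs from the most common value."""
    if not virtual_ips:
        return {}
    expected, _count = Counter(virtual_ips.values()).most_common(1)[0]
    return {name: ip for name, ip in virtual_ips.items() if ip != expected}


controllers = {"WLC-1": "1.1.1.1", "WLC-2": "1.1.1.1", "WLC-3": "2.2.2.2"}
print(mismatched_virtual_ips(controllers))  # {'WLC-3': '2.2.2.2'}
```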
With Cisco 4400 series WLCs, by default, the Broadcast SSID parameter is disabled. This is a
problem for non-Cisco clients or other devices that perform only passive scans (i.e., those that do
not transmit probe requests) to locate an AP. In hybrid or in non-Cisco client environments, you
should enable the Broadcast SSID parameter so that any passive clients will be able to associate.
This action also allows any clients that do not have an SSID explicitly configured to associate.
AP Configuration Issues
While the Cisco Unified WLAN Solution includes both WLCs and APs, it is also possible to implement a Cisco WLAN solution with just APs running in autonomous mode. In such implementations, all APs must be configured individually, and it is important to ensure that configuration
parameters are consistent between the APs.
The most common issues with APs that are running in autonomous mode are service interruptions when clients are roaming. When implementing APs in autonomous mode it is important to
ensure that all of the APs are configured with the following parameters:
• The same Service Set Identifier
• The same IP subnet
• The same Layer 2 native VLAN
When a client roams from one AP to another, the client will discard 802.11 probe responses and
beacons received from APs unless they have matching SSID and encryption settings, which results
in connectivity issues as the client moves from one AP to another. When roaming, WLAN clients
first perform a Layer 2 roam as they move from one AP to another within the same subnet. However, if the APs are in different subnets, clients perform Layer 3 roaming, which requires the client to
acquire a new IP address, interrupting previously connected sessions. This behavior will adversely
impact wireless VoIP phones and similar services, as well as any other applications using the original IP address.
Finally, APs use the native VLAN to exchange information about roaming clients with
other APs. For this reason, it is important to ensure that a consistent native VLAN is used when
implementing multiple autonomous APs, as management traffic is sent and received across this
VLAN.
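The three consistency requirements for autonomous APs (same SSID, same IP subnet, and same native VLAN) can be checked with a sketch such as the following. The AutonomousAP class, its field names, and the sample values are all hypothetical; the subnet comparison relies on the standard ipaddress module:

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AutonomousAP:
    """Hypothetical record of the parameters that must match between APs."""
    name: str
    ssid: str
    ip_interface: str  # address with prefix length, e.g. "10.1.10.11/24"
    native_vlan: int


def inconsistent_parameters(aps):
    """Return the parameters that are not identical across all of the APs."""
    ssids = {ap.ssid for ap in aps}
    subnets = {ipaddress.ip_interface(ap.ip_interface).network for ap in aps}
    vlans = {ap.native_vlan for ap in aps}
    problems = []
    if len(ssids) > 1:
        problems.append("ssid")
    if len(subnets) > 1:
        problems.append("subnet")       # clients would need a Layer 3 roam
    if len(vlans) > 1:
        problems.append("native_vlan")
    return problems


aps = [
    AutonomousAP("AP-1", "corp-wlan", "10.1.10.11/24", 10),
    AutonomousAP("AP-2", "corp-wlan", "10.1.20.12/24", 10),
]
print(inconsistent_parameters(aps))  # ['subnet']
```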
AP and WLC Registration Issues
In order to troubleshoot LAP and WLC integration issues effectively, it is important to have a solid
understanding of the interaction between the LAP and the WLC, as was described earlier in this
chapter. One of the most common issues that prevent LAP registration with the WLC is forgetting to configure DHCP Option 43. DHCP Option 43 is required to provide the LAP with the IP
address(es) of the WLC in the DHCPOFFER message. If this option is not specified, the LAP is
unable to register with the WLC.
As stated earlier in this chapter, the LAP attempts to use the Layer 2 discovery method first and
then reverts to the Layer 3 discovery method if the Layer 2 method is not supported or fails.
When the LAP and the WLC reside on different subnets, the LAP uses Layer 3 discovery to locate
a WLC. In this mode, the LAP will broadcast a Layer 3 LWAPP discovery message on the local subnet. If the WLC resides on a remote subnet, the DHCP relay agent is required to ensure that these
messages are relayed to the WLC. When using the Cisco IOS DHCP relay agent, it is important to
remember that LWAPP discovery messages are not forwarded by default when the ip helper-address <address> interface configuration command is issued. Given this, it is important to ensure
that LWAPP Broadcasts, which use UDP port 12223, are forwarded by the DHCP relay agent
by adding the ip forward-protocol udp 12223 global configuration command to the Cisco IOS
DHCP relay agent configuration. Without this configuration command, the LAP will not be
able to communicate with the remote WLC.
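As a rough illustration, the presence of both commands can be verified by scanning a saved copy of the relay agent configuration. The sketch below assumes the configuration is available as plain text; the function name, the interface, and the addresses in the sample are hypothetical:

```python
def lwapp_relay_ready(config_text):
    """Return True if the config has both a helper address and UDP 12223 forwarding."""
    lines = [line.strip() for line in config_text.splitlines()]
    has_helper = any(line.startswith("ip helper-address ") for line in lines)
    has_forward = "ip forward-protocol udp 12223" in lines
    return has_helper and has_forward


# Hypothetical relay agent configuration; the addresses are examples only.
SAMPLE_CONFIG = """
ip forward-protocol udp 12223
!
interface Vlan10
 ip address 10.1.10.1 255.255.255.0
 ip helper-address 10.1.50.5
"""

print(lwapp_relay_ready(SAMPLE_CONFIG))  # True
```

A configuration that contains only the ip helper-address command would return False, which matches the failure described above.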
Infrastructure Issues
Infrastructure issues can also cause WLAN issues. As stated earlier in this chapter, the wireless
LAN relies on the wired LAN for connectivity. For this reason, it is important to ensure that switches and other intermediate devices are adequately configured to support the WLAN extension. For
example, when connecting a WLC to a switch, the port should be configured as a trunk link. Verify
the switch port configuration by checking the configuration or by using the show interfaces
<name> switchport command. Unlike WLC ports, the switch ports connected to LAPs should
not be configured as trunk links, but as access ports. These ports should then be assigned to an
active VLAN, which is typically the management VLAN. Again, verify switch configuration by
checking the configuration or by using the show interfaces <name> switchport command.
Additionally, ensure that PortFast is enabled on ports that are connected to LAPs so that these
ports transition to the Forwarding state immediately.
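The switch port expectations above (a trunk for the WLC; an access port in an active VLAN with PortFast for a LAP) can be expressed as a simple audit of an interface stanza taken from the running configuration. This is a sketch only: the function name and the role labels are hypothetical, and the check looks only for the literal commands shown:

```python
def audit_port(role, stanza_lines):
    """Return a list of problems for a switch port that connects a WLC or a LAP."""
    commands = {line.strip() for line in stanza_lines}
    issues = []
    if role == "wlc":
        if "switchport mode trunk" not in commands:
            issues.append("port is not configured as a trunk")
    elif role == "lap":
        if "switchport mode access" not in commands:
            issues.append("port is not configured as an access port")
        if not any(c.startswith("switchport access vlan ") for c in commands):
            issues.append("no access VLAN is assigned")
        if not any(c.startswith("spanning-tree portfast") for c in commands):
            issues.append("PortFast is not enabled")
    else:
        raise ValueError("role must be 'wlc' or 'lap'")
    return issues


lap_port = [" switchport mode access", " switchport access vlan 10"]
print(audit_port("lap", lap_port))  # ['PortFast is not enabled']
```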
Infrastructure devices such as Multilayer switches can also be configured as DHCP servers or
DHCP relay agents. When the device is configured as a DHCP server, it is important to verify that
DHCP Option 43 is included in the Cisco IOS DHCP server configuration. This is implemented
using the option 43 ascii "<address>" DHCP configuration command. Likewise, if the device
is configured as a Cisco IOS DHCP relay agent, ensure that the ip forward-protocol udp 12223
global configuration command is used if the LAP and WLC reside on different IP subnets.
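The text above shows the ASCII form of Option 43. As an assumption that goes beyond this guide, many LAP models instead expect the option as a hexadecimal type-length-value string, in which the type is 0xf1, the length is four bytes per controller, and the value is the list of WLC management addresses. Check the documentation for your AP model before relying on this. Under that assumption, the hex string can be built as follows (the function name and addresses are hypothetical):

```python
import ipaddress


def option43_hex(wlc_addresses):
    """Build a hex TLV string: type 0xf1, length 4 * n, then the n IPv4 addresses."""
    if not wlc_addresses:
        raise ValueError("at least one WLC address is required")
    value = b"".join(ipaddress.IPv4Address(a).packed for a in wlc_addresses)
    return (bytes([0xF1, len(value)]) + value).hex()


# Two hypothetical WLC management addresses.
print(option43_hex(["192.168.10.5", "192.168.10.20"]))  # f108c0a80a05c0a80a14
```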
Like Cisco IP phones, Cisco APs can use an external power source to draw their power, or they
can draw their power from the switch to which they are connected. This power is sent within the
Ethernet cable connecting the switch and the AP using either the IEEE 802.3af-2003 standard
or the Cisco Inline Power method. Because Power over Ethernet (PoE) is increasingly used in
today’s converged networks, it is important to calculate accurately the amount of power that will
be required by the devices that will draw power from the switch. Cisco provides an online power
calculator tool (requires login) that can be used to make this determination.
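The arithmetic behind such a calculation is straightforward. The following sketch is not the Cisco calculator; it simply sums the power allocated per port and compares the total with the power available for PoE. The function name, the interface names, and the 40 W budget are hypothetical values chosen for illustration:

```python
AF_MAX_PER_PORT_WATTS = 15.4  # maximum power supplied per port under IEEE 802.3af


def poe_headroom(available_watts, allocations):
    """Return (required, headroom) in watts; negative headroom means a shortfall."""
    required = sum(allocations.values())
    return required, available_watts - required


allocations = {
    "Gi2/1": 13.5,                   # IP phone
    "Gi2/2": AF_MAX_PER_PORT_WATTS,  # AP allocated the full 802.3af maximum
    "Gi2/3": AF_MAX_PER_PORT_WATTS,  # second AP
}
required, headroom = poe_headroom(40.0, allocations)
print(f"required {required:.1f} W, headroom {headroom:.1f} W")
# prints: required 44.3 W, headroom -4.3 W
```

A negative headroom, as in this example, indicates that at least one device will not receive the power it needs.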
An AP that is connected to a switch may be unable to draw sufficient power, perhaps because other
devices connected to the switch are consuming all available power, or because incorrect power
calculations by administrators have left the switch unable to supply all devices. In some cases, a message similar to the following will be logged on the console:
%CDP_PD-2-POWER_LOW: All radios disabled - LOW_POWER_CLASSIC inline
This message means that the AP has detected that the switch port (PSE) is not able to provide sufficient power and therefore has transitioned to low-power mode. In low-power mode, the AP will
disable all radios, effectively meaning that no stations or clients will be able to associate with that
AP. An important fact to remember is that if the AP is connected to both an external power injector
and a PoE switch port, and the switch is not able to provide the AP with sufficient power, the AP will
still log this message and disable the radios, even though the external power injector can provide
enough power. In essence, PoE information received via Cisco Discovery Protocol (CDP) takes
precedence. In such cases, the AP must be configured to ignore the CDP information and use the
external power injector.
NOTE: AP configuration is beyond the scope of the TSHOOT certification exam and will not
be described in any additional detail in this chapter or in the remainder of this guide.
On the switch side, you can use the show power suite of commands to verify available power for
the entire chassis, or on a per-module or per-interface basis. As an example, the
show power inline <interface> command shows how much power is drawn by the device connected
to the switch port. The following example shows the power used by a connected IP phone:
Cat-6500-1#show power inline GigabitEthernet2/1
Interface Admin  Oper       Power(Watts)          Device              Class
                            From PS    To Device
--------- ------ ---------- ---------- ---------- ------------------- -----
Gi2/1     auto   on         13.5       12.0       Cisco IP Phone 7945 3

Interface  AdminPowerMax
           (Watts)
---------- ---------------
Gi2/1      15.4
The following example shows the same output for a switch port connected to an AP. In this case,
the AP is using an external power injector and is not drawing power from the switch, implying the
AP has been appropriately configured to use the external power source:
Cat-6500-1#show power inline GigabitEthernet2/2
Interface Admin  Oper       Power(Watts)          Device              Class
                            From PS    To Device
--------- ------ ---------- ---------- ---------- ------------------- -----
Gi2/2     auto   off        0          0          cisco AIR-LAP1252AG n/a

Interface  AdminPowerMax
           (Watts)
---------- ---------------
Gi2/2      15.4
In addition to the previously described infrastructure checks and verifications, it is important to
perform basic additional infrastructure checks, such as verifying Layers 1 and 2. Because wireless networks
operate in a shared medium, more so than wired networks, it is not uncommon to see Cyclic Redundancy Check (CRC) or Physical Layer Convergence Protocol (PLCP) errors. While some of these
errors are normal, an excessive number may indicate wireless network issues, which include the following:
• Packet collisions due to densely populated clients
• Overlapping channels
• High multipath conditions due to bounced signals
• Other signals in the 2.4GHz band
Recall from the SWITCH guide that radio interfaces (WLANs in general) operate in half-duplex
mode because a single frequency is used to transmit and receive data. Therefore, while the 802.11
standard does have a mechanism for avoiding collisions (i.e., CSMA/CA), it is still possible for
collisions to occur in environments with a dense population of clients (STAs). This can adversely
impact WLAN performance and result in intermittent connectivity issues.
In cases where you do observe a large number of CRC errors, consider checking for possible radio
interference, antennas, and cabling, as well as the line of sight (LOS) between the transmitter and
the receiver, to ensure that the LOS is clear of possible interfering objects.
NOTE: WLAN problems caused by overlapping channels, multipath conditions, and the presence of other signals in the 2.4GHz band are described in the following sections.
In addition to checking for errors, it is also important to remember basic Ethernet fundamentals.
For example, if the WLAN is experiencing intermittent connectivity or connectivity with errors,
there may be a possibility that the cable length is greater than the recommended Ethernet segment lengths. This is applicable not only to APs but also to antenna cabling. When implementing
a WLAN solution, cable runs should be kept as short as possible to allow for optimum efficiency
and to prevent loss, which is likely if the cable runs are long. When connecting antennas, for example,
consider using Cisco antenna cables instead of standard cabling, such as traditional coaxial cable.
While they may be slightly more expensive than standard cables, they are
recommended for optimum efficiency of the overall WLAN solution.
Finally, if the LAP and WLC are located on remote subnets, ensure that no ACLs are blocking
communication between the two devices. In addition to these checks, ensure that configurations
on the infrastructure devices are correct. For example, if using the Cisco IOS DHCP relay agent
to forward LWAPP Broadcasts to the WLC, ensure that the ip forward-protocol udp 12223
global configuration command is included in the configuration, in addition to the ip helper-address <address> interface configuration command. Similarly, when using Cisco IOS DHCP
servers, ensure that DHCP Option 43 is also included in the configuration.
Antenna and Radio Frequency Issues
Antennas are an integral component of wireless implementations. Antennas provide the WLAN
system with three fundamental properties, which are gain, direction, and polarization. Gain is a
measure of increase in power and is used to describe the amount of increase in energy that an
antenna adds to a radio frequency (RF) signal. Direction is the shape of the transmission pattern.
Polarization is the physical orientation of the element on the antenna that actually emits the RF
energy. Understanding basic antenna functionality and operation is fundamental to understanding
the overall wireless solution. In addition, a solid understanding of these principles is necessary for
supporting and troubleshooting wireless problems.
Cisco wireless equipment supports different styles of antenna, each of which has different
coverage capabilities. The supported antenna types include omnidirectional and directional antennas. Omnidirectional antennas are designed to provide a 360-degree radiation pattern and
are commonly used when coverage in all directions from the antenna is required. Omnidirectional
antenna operation is illustrated in Figure 11-5 below. The coverage provided by this type of antenna is shown in gray:
Fig. 11-5. Omnidirectional Antenna Coverage
Directional antennas come in different styles and shapes. Directional antenna types include yagi
antennas, patch antennas, and parabolic dishes. Yagi antennas radiate only in a specific direction. Figure 11-6 below illustrates basic yagi antenna operation. The
coverage is shown in gray:
Fig. 11-6. Directional Yagi Antenna Coverage
Directional patch antennas are simply a type of flat antenna. Like directional yagi antennas, they
provide coverage in a specific direction. Figure 11-7 below shows a mounted patch antenna:
Fig. 11-7. Mounted Directional Patch Antenna
Finally, parabolic antennas are antennas that look like satellite dishes, with which we all
should be familiar. These antennas are also commonly referred to simply as dish antennas. Parabolic antennas have a very narrow RF energy path and are typically used only in outdoor wireless implementations.
Having discussed antenna basics, the remainder of this section will describe some of the more common sources of problems, which include, but are not limited to, the following:
• Radio power optimization
• Radio interference
• Electromagnetic Interference
• AP channel interference
• Multipath
• Antenna power issues
In some instances, when the AP and the clients that are associated with it are within close proximity, the clients may be disconnected from the AP. Although rare, this can result in poor WLAN performance and intermittent connectivity for clients associated with the AP(s). The recommended
solution for such issues is keeping clients away from the AP. This can be performed by installing the
AP in locations that, while still accessible to clients, are not within close proximity to the clients. For
example, instead of placing an AP on a conference room table, the AP should be placed or mounted
on a wall or ceiling in the conference room.
In some cases, it may not be possible to prevent clients from being within such close proximity to
the AP. For example, it may not be possible to mount the AP in a factory. If the AP and clients will
be in such close proximity, you can reduce the power of the AP to prevent clients that are too close
to the AP from being disconnected.
Radio interference is a common cause of WLAN problems due to the shared media. Proactively,
such issues can be avoided by performing site surveys prior to implementing the WLAN solution.
Radio interference issues are a common phenomenon because a license is not required to operate
radio equipment in the 2.4 GHz band, which is the same band in which Cisco Aironet WLAN equipment
operates. It is therefore possible for other devices, such as microwave ovens or wireless phones, to
be using the same band, resulting in interference. You can use a spectrum analyzer to determine
the presence of any other activity on your frequency. In the event that there is too much interference, consider changing frequencies, if possible.
Electromagnetic Interference (EMI), also referred to as Radio Frequency Interference (RFI), is a
disturbance that affects an electrical circuit due to either electromagnetic induction or electromagnetic radiation emitted from an external source. While EMI does not necessarily affect signal
transmissions, per se, it can affect the components of the transmitter, resulting in poor WLAN
performance and intermittent connectivity issues. To avoid the potential problems
caused by EMI, you should ensure that APs are placed away from any potential EMI sources, such
as fluorescent lights and high-voltage power lines. In some environments (e.g., factories), if you cannot completely isolate the AP from sources of EMI, such as power lines, you could
alternatively supply conditioned power to the WLAN equipment in order to lessen the effects of
EMI generated on those power circuits. However, the recommended solution would still be to isolate the equipment from such sources.
Channel interference, which is a direct result of a poor implementation, is also a common cause
of WLAN issues. As was stated earlier in this chapter, the 2.4-GHz band consists of 14 channels,
each 22 MHz wide. In North America, the FCC allows channels 1 through 11. Most of Europe can
use channels 1 through 13. Japan additionally permits channel 14, for 802.11b only. Taking this into consideration, when
installing APs, you should ensure that adjacent APs use non-overlapping channels.
Within the 2.4-GHz range, there are three channels that do not overlap. These channels are 1, 6,
and 11. Therefore, use these channels alternately when deploying APs in an ESS. Figure 11-8 below
illustrates a recommended AP deployment using these non-overlapping channels:
[Figure: AP #1 on Channel 1, AP #2 on Channel 6, and AP #3 on Channel 11, all attached to the Distribution System]
Fig. 11-8. Implementing APs Using Non-Overlapping Channels
Referencing Figure 11-8, three APs deployed within close proximity or with overlapping coverage areas
are configured to use non-overlapping channels to avoid RFI issues, which may lead to connectivity issues and poor throughput. If an additional AP is added, say AP #4, then that AP can be configured
to use channel 1. If yet another AP is added, say AP #5, this AP would be configured to use channel
6, and so forth.
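To see why channels 1, 6, and 11 do not overlap, recall that the centers of channels 1 through 13 are spaced 5 MHz apart (channel 1 is centered at 2412 MHz), while each channel is 22 MHz wide. The sketch below works through that arithmetic and reproduces the alternating assignment described above. The function names are hypothetical, and the overlap test is the simple rule that two channels overlap when their centers are less than one channel width apart:

```python
def center_mhz(channel):
    """Return the center frequency of a 2.4-GHz channel in MHz."""
    if channel == 14:
        return 2484                   # channel 14 does not follow the 5 MHz spacing
    if 1 <= channel <= 13:
        return 2407 + 5 * channel     # channel 1 = 2412 MHz, channel 6 = 2437 MHz
    raise ValueError("2.4-GHz channels are numbered 1 through 14")


def channels_overlap(a, b, width_mhz=22):
    """Two channels overlap if their centers are closer than one channel width."""
    return abs(center_mhz(a) - center_mhz(b)) < width_mhz


def channel_for_ap(ap_number):
    """Cycle through channels 1, 6, and 11 for AP #1, AP #2, AP #3, and so on."""
    return (1, 6, 11)[(ap_number - 1) % 3]


print(channels_overlap(1, 6))   # False: centers are 25 MHz apart
print(channels_overlap(1, 3))   # True: centers are only 10 MHz apart
print([channel_for_ap(n) for n in range(1, 6)])  # [1, 6, 11, 1, 6]
```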
Multipath is a common cause of WLAN problems due to the nature of the medium used. This
situation occurs when RF signals take different paths from a source to a destination. When Radio
Frequency signals are transmitted, they become wider as they are transmitted further. This increase
in width increases the likelihood of the RF signals running into objects that reflect, refract, diffract,
or interfere with the signal, such as furniture, walls, or coated glass.
When the RF signal is reflected off an object, this causes multiple, duplicate wavefronts to be created and propagated, resulting in multiple wavefronts being received by the receiver. The WLAN
multipath concept is illustrated in Figure 11-9 below:
[Figure: the signal from the AP reaches the STA directly and by reflection off the ceiling and a reflector object]
Fig. 11-9. Understanding WLAN Multipath
Referencing Figure 11-9, the AP transmits a signal. As the signal travels further, it widens. Part
of the signal goes straight to the destination, while other parts bounce off an obstruction, such as
the ceiling or any other reflector object, such as a steel cabinet, and then go on to the destination.
As a result of this obstruction or interference, the obstructed signals will encounter some delay
and travel a longer path to reach the same destination. This results in the client receiving multiple
wavefronts.
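The delay introduced by the longer path is easy to estimate, because RF signals travel at approximately the speed of light. The sketch below assumes free-space propagation, and the path lengths are hypothetical values chosen for illustration:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def multipath_delay_ns(direct_path_m, reflected_path_m):
    """Return how much later (in nanoseconds) the reflected wavefront arrives."""
    if reflected_path_m < direct_path_m:
        raise ValueError("the reflected path cannot be shorter than the direct path")
    extra_distance_m = reflected_path_m - direct_path_m
    return extra_distance_m / SPEED_OF_LIGHT_M_PER_S * 1e9


# A 10 m direct path and a 25 m path via the ceiling: about 50 ns of delay.
print(f"{multipath_delay_ns(10, 25):.1f} ns")
```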
When these different wavefronts combine, they distort the received waveform, resulting in
poor signal quality, even though the actual signal strength itself may be strong. The delay between wavefronts causes
the information symbols represented in 802.11 signals to overlap, which then affects the decoding
capability of the client (receiver) and results in poor performance and connectivity issues. The recommended solution is to implement diversity.
Diversity is the use of two antennas for each radio. Not only does this increase the probability of
receiving better signals from either one of the antennas, it also allows the radio to compensate for
errors due to RFI and provides relief to a wireless network in a multipath scenario. With diversity,
only a single antenna (the best antenna for transmitting to the receiver) is used at any given time. The antennas are
not used at the same time, to avoid introducing multipath issues themselves. Cisco APs
default to antenna diversity (i.e., to using dual antennas). Although not recommended, this default
behavior can be modified because only a single antenna is required to provide radio operations.
NOTE: An alternative to implementing dual antennas is to implement the 802.11a standard, which provides higher data rates than DSSS and minimizes the effects of multipath propagation on signal quality and throughput. However, keep in mind that 802.11a is not compatible with the more commonly used 802.11b and 802.11g standards, and typically costs
more to implement.
The final issue discussed in this section pertains to antenna power issues. Antenna gain ratings are
measured in decibels (dB), which express a ratio between two values. An antenna rating is typically expressed relative to the
gain of an isotropic (dBi) or dipole (dBd) antenna. The isotropic antenna is a theoretical antenna
that transmits equal power density in all directions. These antennas are used only as theoretical
(mathematical) references and do not exist in the real world; however, because the U.S. FCC uses
dBi for its calculations, this same standard is also used by most wireless equipment manufacturers
and vendors, such as Cisco.
Dipole antennas are more like real-world antennas. While some antennas are rated in dBd, most
of the ratings use dBi because all FCC calculations are based on the dBi measurement. Antenna
power is a major factor that should be taken into consideration when designing the WLAN, as incorrect calculations may result in poor WLAN performance, resulting in issues such as intermittent connectivity or even in an outright or complete loss of connectivity in some areas.
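The relationship between these units can be made concrete with a few conversions. The sketch below relies on two facts that are not stated in this chapter: a half-wave dipole has a gain of approximately 2.15 dBi, and a decibel value corresponds to a power ratio of 10 raised to the power of dB/10. The function names are hypothetical:

```python
import math

DIPOLE_GAIN_DBI = 2.15  # approximate gain of a half-wave dipole over an isotropic antenna


def dbd_to_dbi(gain_dbd):
    """Convert a gain rated against a dipole (dBd) to a gain rated in dBi."""
    return gain_dbd + DIPOLE_GAIN_DBI


def db_to_power_ratio(db):
    """Convert decibels to a linear power ratio."""
    return 10 ** (db / 10)


def power_ratio_to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10 * math.log10(ratio)


print(f"{dbd_to_dbi(3.0):.2f} dBi")     # a 3 dBd antenna is about 5.15 dBi
print(f"{db_to_power_ratio(3):.2f}x")   # 3 dB is roughly a doubling of power
print(f"{power_ratio_to_db(100):.1f} dB")
```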
When you are determining which antenna to use or when you are troubleshooting potential RF
power issues, keep in mind that as the gain of an antenna increases so does the signal strength and
directivity; however, this comes with a tradeoff in that the antenna’s coverage area is diminished.
Directivity measures the power density an antenna radiates in the direction of its strongest emission. To clarify this point further, consider Figure 11-10 below, which shows the coverage provided
by a low-gain antenna (LGA):
[Figure: AP antenna coverage versus distance (range) for STA 1 through STA 4]
Fig. 11-10. Understanding Low Gain Antennas
Referencing Figure 11-10, the LGA is able to provide a broad coverage area, which includes STA 2,
STA 3, and STA 4. However, because it is low gain, the antenna's coverage distance is limited and
STA 1 is not included in this range. If the gain were increased, then the antenna's signal strength
and directivity would be increased, allowing it to reach greater distances, as is illustrated in Figure
11-11 below:
[Figure: AP antenna coverage versus distance (range) for STA 1 through STA 4]
Fig. 11-11. Understanding High Gain Antennas
Referencing Figure 11-11, the LGA shown in Figure 11-10 has been replaced by a high-gain antenna (HGA), or the gain on the LGA was simply increased. Either way, the signal is stronger, more
directional, and travels a greater distance, which means that STA 1 now resides within the
coverage area. However, this comes at the expense of the area that is actually covered by the AP.
Previously, STA 3 was comfortably within the coverage area, while STA 1 was not; however, now
STA 3 is no longer within the coverage area, while STA 1 is. The recommended solution in this case
would be to reduce the adjusted gain on the AP and integrate another AP into the WLAN solution,
keeping in mind standard recommended configuration fundamentals, such as using non-overlapping
channels and ensuring that both APs are configured the same (e.g., SSIDs).
In summation, you should be aware of this tradeoff when deciding on antennas or adjusting the
gain values for antennas used in the WLAN solution. Careful consideration must be taken before
increasing or decreasing gain because this can result in issues such as path loss (the attenuation of the signal as it travels, which limits the distance over which it can be received), as well as a reduced coverage area.
CHAPTER SUMMARY
The following section is a summary of the major points you should be aware of in this chapter.
• Wireless networks use radio waves to transmit data and connect devices
• WLANs are meant to augment, not replace, wired LAN infrastructure
IEEE 802.11 Components
• IEEE 802.11 components include the following:
1. Client or Station (STA)
2. Access Point (AP)
3. Independent Basic Service Set (IBSS)
4. Basic Service Set (BSS)
5. Extended Service Set (ESS)
6. Distribution System (DS)
• The client is any appliance that interfaces with the wireless medium as an end user device
• The wireless AP functions as a bridge between the STAs and the network backbone
• An IBSS is a wireless network, consisting of at least two STAs
• The BSS is a cellular architecture which divides the wireless network (system) into cells
• The ESS is comprised of overlapping BSS sets (cells), usually connected by the DS
• The Distribution System (DS) allows for the interconnection of the APs of multiple cells
IEEE 802.11 Frames
• The 802.11 standard uses the following three types of frames:
1. Control Frames
2. Management Frames
3. Data Frames
• The 802.11 standard uses control frames to control device access to the wireless medium
• Management frames enable stations to establish and maintain communications
• Data frames are sent by any STA and contain higher layer protocol information or data
IEEE 802.11 Standards
• At the physical (PHY) layer, IEEE 802.11 defines a series of encoding and transmission schemes
for wireless communications, the most common of which are the Frequency Hopping Spread
Spectrum (FHSS), Direct Sequence Spread Spectrum (DSSS), and Orthogonal Frequency Division Multiplexing (OFDM) transmission schemes. Although Infrared (IR) also exists at this layer,
very little development of this standard has occurred due to line-of-sight limitations. The 802.11
standards described in this section are as follows:
1. IEEE 802.11 (original)
2. IEEE 802.11b
3. IEEE 802.11a
4. IEEE 802.11g
5. IEEE 802.11n
• The original IEEE 802.11 standard defined WLANs that provided up to 2 Mbps throughput
• The original 802.11 standard is now considered obsolete because the throughput is too slow
• 802.11b is an extension to 802.11 that operates in the same unlicensed 2.4 GHz band
• Devices operating on 802.11b use DSSS modulation for higher speeds
• The 2.4 GHz band consists of 14 channels, each 22 MHz wide
• In North America, the FCC allows channels 1 through 11
• Most of Europe can use channels 1 through 13. Japan additionally permits channel 14
• IEEE 802.11a is an extension to 802.11 that provides up to 54 Mbps throughput
• 802.11a uses OFDM and does away with spread spectrum
• 802.11a is not compatible with 802.11b or 802.11g and therefore is seldom used
• 802.11a equipment operates at 5 GHz
• The 802.11g standard also works in the same 2.4 GHz range as 802.11b
• IEEE 802.11g operates at a bit rate as high as 54 Mbps, but uses the S-Band ISM and OFDM
• 802.11g is compatible with 802.11b and can operate at the 802.11b bit rates using DSSS
• IEEE 802.11n improves on 802.11a and 802.11g maximum data rates
• The 802.11n standard includes several enhancements to the other 802.11 standards
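The channel figures in this summary explain why only a few 2.4 GHz channels can be used side by side. The following is a minimal sketch that assumes the usual channel plan, which this summary does not spell out: channels 1 through 13 are centered 5 MHz apart starting at 2412 MHz, and channel 14 is centered at 2484 MHz. Two channels are treated as overlapping when their centers are closer than the 22 MHz channel width. The function names are illustrative.

```python
CHANNEL_WIDTH_MHZ = 22  # channel width quoted in the summary


def center_mhz(channel):
    """Center frequency of a 2.4 GHz channel (assumed standard channel plan)."""
    if not 1 <= channel <= 14:
        raise ValueError("2.4 GHz channels are numbered 1 through 14")
    # Assumption: channels 1-13 are spaced 5 MHz apart; channel 14 is a special case.
    return 2484 if channel == 14 else 2407 + 5 * channel


def channels_overlap(a, b):
    """True when the two channels' 22 MHz bands intersect."""
    return abs(center_mhz(a) - center_mhz(b)) < CHANNEL_WIDTH_MHZ


def non_overlapping_plan(allowed):
    """Greedily pick channels, lowest first, that do not overlap any already chosen."""
    plan = []
    for channel in sorted(allowed):
        if all(not channels_overlap(channel, chosen) for chosen in plan):
            plan.append(channel)
    return plan
```

Calling `non_overlapping_plan(range(1, 12))` for the channels permitted in North America selects channels 1, 6, and 11, which matches the plan normally recommended for adjacent APs.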
The Cisco WLAN Solution
• The Cisco Wireless LAN solution consists of WLCs and their associated LAPs
• WLCs are responsible for system wide wireless LAN functions, such as the following:
1. Integrated Intrusion Prevention System (IPS)
2. Zero-Touch Deployment of Lightweight Access Points (LAPs)
3. Real-time Radio Frequency (RF) management
4. Wireless LAN Redundancy
5. Dynamic Channel Assignment for each LAP
6. Dynamic Client Load Balancing across LAPs
7. Dynamic LAP Transmit Power Optimization
8. Wireless LAN Security Management
• WLCs communicate with Controller-based APs over any Layer 2 or Layer 3 infrastructure
• WLCs communicate with LAPs using LWAPP or CAPWAP
• LWAPP is an IETF draft protocol
• CAPWAP, which is based on LWAPP, is a standard, interoperable protocol
• The following sequence of events must occur in order for an LAP to register to a WLC:
1. The LAP issues a DHCP Discovery Request to get an IP address (if no static address is used)
2. The LAP sends a Layer 2 LWAPP discovery request message to the WLC
3. If Layer 2 discovery fails or is not supported, then the Layer 3 discovery method is used
4. Any available WLC responds with an LWAPP Discovery Response
5. The LAP selects the WLC to join, which is typically the first WLC to respond to the LAP
6. The LAP then sends an LWAPP Join Request to the WLC
7. The WLC validates the LAP and then sends an LWAPP Join Response to the LAP
8. The LAP validates the WLC, which then completes the Discovery and Join process
9. The LAP registers with the WLC and can begin accepting client associations
• The Cisco WLAN supports the following three types of roaming:
1. Intra-controller Roaming
2. Inter-controller Roaming
3. Inter-Subnet Roaming
Troubleshooting Cisco WLAN Solutions
• Generally speaking, Wireless LAN issues fall into one of the following problem areas:
1. Wireless Station (STA) or Client Issues
2. WLC Configuration Issues
3. AP Configuration Issues
4. AP and WLC Registration Issues
5. Infrastructure Issues
6. Antenna and Radio Frequency Issues
• Wireless client troubleshooting should include the following tasks:
1. Checking the client wireless NIC state
2. Checking client settings
3. Checking the state of the wireless client
• Some common WLC device misconfigurations include the following:
1. Mismatched Service Set Identifiers
2. Security Mismatches
3. WLANs are disabled on the WLC
4. Data Rate Mismatches
5. Incorrect Client or Station Filtering
6. Unsupported Features
7. IP Address Assignment Issues
8. SSID Broadcast is Disabled
• For APs operating in autonomous mode, ensure that the following parameters are the same:
1. The same Service Set Identifier
2. The same IP subnet
3. The same Layer 2 native VLAN
• When using DHCP for AP address allocation, ensure that DHCP Option 43 is configured
• When using Cisco IOS DHCP relay agent, forward UDP Broadcasts for port 12223
• When using PoE, ensure that there is enough available power to power up all the APs
• Excessive CRC and PLCP errors may point to one or more of the following issues:
1. Packet collisions due to densely populated clients
2. Overlapping channels
3. High multipath conditions due to bounced signals
4. Other signals in the 2.4GHz band
• When troubleshooting Layer 1 and Layer 2 issues, check cabling for both APs and antennas
• Common sources of antenna and RF problems include, but are not limited to, the following:
1. Radio Power Optimization
2. Radio Interference
3. Electromagnetic Interference
4. AP Channel Interference
5. Multipath
6. Antenna Power Issues
CHAPTER 12
Troubleshooting Cisco VoIP
and Video Solutions
Converged networks are increasingly common. Present-day networks support integrated voice,
video, and data traffic. Carrying voice, video, and data traffic over a single transport infrastructure
requires a properly designed Quality of Service (QoS) implementation to ensure the required
level of service for all three traffic types. In addition to understanding how to configure Cisco Catalyst
switches to support these services, you are also expected to understand how to troubleshoot
switch configurations that may affect these services. The TSHOOT certification exam objectives
covered in this chapter are as follows:
• Troubleshoot switch support of advanced services (i.e., Wireless, VOIP, and Video)
• Troubleshoot a VoIP support solution
• Troubleshoot a video support solution
Cisco IP Telephony solutions are an integral component of Cisco Unified Communications. The
Cisco Unified Communications solution allows for the integration of voice, video, data, and mobile applications on fixed and mobile networks. Cisco IP Telephony solutions are comprised of call
processing solutions, such as Cisco Unified Communications Manager and IP phones.
Cisco Unified Video Advantage enhances the existing Cisco IP Telephony solution by providing
video telephony functionality to certain Cisco Unified IP phones. The Cisco Unified Videoconferencing solution allows for a wide range of customized as well as fully converged voice, video, and
data solutions. This chapter will be divided into the following sections:
• Cisco IP Telephony Fundamentals
• The Need for LAN and WAN Quality of Service
• LAN and WAN IPT QoS Implementation
• Cisco IP Video Fundamentals
• LAN and WAN Video QoS Implementation
• Troubleshooting Converged Networks
CISCO IP TELEPHONY FUNDAMENTALS
IP Telephony, also referred to as Voice over IP (VoIP), is a generic term that is used to describe the
transport of traditional communications services, such as voice and fax, over the Internet Protocol
(IP). Some common elements found in a typical Cisco IP Telephony (IPT) solution include the
following:
• One or more call agents
• IP phones
• Application servers
• Voice gateways
• Voice gatekeepers
Call agents, such as Cisco Unified Communications Manager (CUCM), are responsible for ordering and directing each step of call completion for the endpoints, which may include IP phones and
analog and digital ports on voice gateways. CUCM typically communicates with Cisco IP phones
using the Skinny Call Control Protocol (SCCP), sometimes simply referred to as Skinny. SCCP
is a proprietary network terminal control protocol that is built on a client/server model. Using
Skinny, end stations, which may be IP phones or even gateway ports, communicate with the call
agent (CUCM) and receive call setup and teardown instructions, among other things, from the
call agent.
In addition to using Skinny, the call agent can communicate with endpoints using the Media Gateway Control Protocol (MGCP). Like SCCP, MGCP also uses a client/server model where the call
agent is responsible for ordering and directing each step of call completion for the endpoints.
However, unlike Skinny, MGCP is an open standard that is supported by many different vendors.
In addition to controlling and directing endpoints, call agents also provide other services and
functions, which include bandwidth management, address translation, and Call Admission Control (CAC). CAC is used to protect the quality of voice calls by preventing call completion if there
are not enough resources available to support the call(s). It is not the same thing as QoS, which is
used to protect real-time traffic, such as voice and video, from data traffic that is also contending
for the same network resources.
NOTE: You are not required to go into any further detail on SCCP or MGCP in the TSHOOT
certification exam. Additionally, while QoS is a core requirement of the TSHOOT exam, you
are not required to go into detail on Call Admission Control (CAC). SCCP, MGCP, and CAC
are not described in any additional detail in the remainder of this final chapter.
An IP phone is simply a telephony endpoint that allows telephone calls to be made over an IP network connection instead of a traditional analog connection. Cisco IP Telephony implementations
typically include one or more application servers that are used to provide additional services, such
as voice mail (Cisco Unity, Unity Connection, and Unity Express); unified messaging (Cisco Unity
Unified Messaging); call routing and comprehensive contact management capabilities (Cisco Unified Contact Center Enterprise and Cisco Unified Contact Center Express); and user availability
and communications capabilities information (Cisco Unified Presence).
Voice gateways are an integral part of the VoIP network. In their most simple form, gateways are
used to connect dissimilar networks, such as the IP network to the traditional Public Switched
Telephone Network (PSTN). In essence, the primary function of a gateway is to convert data traveling through it to a format that the other side understands. However, Cisco IOS voice gateways can
perform additional tasks and functions, such as translations, CAC, and Quality of Service (QoS),
among other things. The call agent can communicate with Cisco IOS voice gateways using MGCP,
SCCP, H.323, or the Session Initiation Protocol (SIP). As is the case with MGCP and SCCP, you
are not required to go into any detail on either H.323 or SIP in the current TSHOOT certification
exam.
Gatekeepers are optional H.323 IP Telephony components that, when used, provide advanced
services, such as address translation, CAC, bandwidth control, and zone management. A zone is
a collection of all endpoints that are managed by the gatekeeper. A simple common example of
an endpoint would be a voice gateway. Figure 12-1 below shows the components described in this
section and how they would be integrated into an IP Telephony solution:
[Figure: Call Agents #1 and #2, Gatekeepers #1 and #2, Unity Servers #1 and #2, Gateways #1 and #2, Cisco IP Phone 7970 Series endpoints, and the PSTN]
Fig. 12-1. A Basic IP Telephony Implementation
Figure 12-1 shows the different VoIP components described in the previous section. It should be
noted that this diagram is reflective of a single site. In the event that there are multiple sites, sometimes the voice gateways may also be used to connect to the WAN, allowing remote IP phones to
communicate with the local call agent. Alternatively, additional dedicated routers can also be used
to connect to the WAN, providing extra network fault tolerance.
Prior to implementing an IPT or VoIP solution, it is imperative that the network (i.e., both the
LAN and the WAN) be adequately prepared or configured to support this traffic. The sections that
follow describe some core requirements, which include the following:
• Dynamic Host Configuration Protocol services
• Network Time Protocol services
• Trivial File Transfer Protocol services
• Cisco Discovery Protocol
• Voice or auxiliary VLAN
• Power over Ethernet
• LAN and WAN Quality of Service
Dynamic Host Configuration Protocol Services
Like most other endpoints, Cisco IP phones require the services of a Dynamic Host Configuration Protocol (DHCP) server to obtain the addressing information necessary for them to access
network services. However, in addition to basic addressing services, Cisco IP phones also need
DHCP servers to provide them with address information for the TFTP server from which they will
download their configuration file.
Within the DHCP server configuration, the IP address of the TFTP server should be configured
as either DHCP Option 150 (when specifying an IP address) or DHCP Option 66 (when specifying an FQDN). However, it should be noted that you can also specify an IP address when you use
DHCP Option 66. When configuring the DHCP pool for the IP phones, it is important to remember that these two options should never be configured together for the same pool. These options
are considered mutually exclusive (i.e., use one or the other). Using both of these options could
result in IP phones using the incorrect or less preferred TFTP server.
NOTE: If DHCP Option 66 is used and a name is specified, a Domain Name System (DNS)
server must be available to resolve the name to an IP address; otherwise, the IP phone will not
be able to contact the TFTP server to retrieve its configuration file. Because using Option 66
adds a dependency on a DNS server, this method is seldom used; DHCP Option 150
is most commonly used.
The following configuration example illustrates how to configure a Cisco IOS DHCP server for IP
phones and specify the IP address of the TFTP server using DHCP Option 150:
R1(config)#ip dhcp pool DEFAULT-IPT-POOL
R1(dhcp-config)#network 10.0.0.0 /24
R1(dhcp-config)#default-router 10.0.0.1
R1(dhcp-config)#lease 8 0 0
R1(dhcp-config)#option 150 ip 10.0.0.252
R1(dhcp-config)#exit
The following configuration example illustrates how to configure a Cisco IOS DHCP server for IP
phones and specify the name of the TFTP server using DHCP Option 66. Because a name is specified, the DHCP pool is also configured with the IP addresses of two DNS servers. This will allow
the IP phones to communicate with the named TFTP server:
R1(config)#ip dhcp pool DEFAULT-IPT-POOL
R1(dhcp-config)#network 10.0.0.0 /24
R1(dhcp-config)#default-router 10.0.0.1
R1(dhcp-config)#lease 8 0 0
R1(dhcp-config)#option 66 ascii cucmpub.howtonetwork.net
R1(dhcp-config)#dns-server 172.16.1.253 172.16.1.254
R1(dhcp-config)#exit
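When troubleshooting, the two rules above (never configure Option 150 and Option 66 in the same pool, and supply a DNS server when Option 66 carries a name) can be checked mechanically. The following is a minimal sketch, not a Cisco tool: it assumes the input is plain running-config style text without CLI prompts, that a pool ends at a line containing only `!` or `exit`, and the function name `audit_dhcp_pools` is hypothetical.

```python
import re


def audit_dhcp_pools(config_text):
    """Return {pool_name: [warnings]} for each 'ip dhcp pool' block in the text."""
    pools = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if line in ("!", "exit"):
            current = None  # assumed end-of-pool markers
            continue
        match = re.match(r"ip dhcp pool (\S+)", line)
        if match:
            current = match.group(1)
            pools[current] = {"options": set(), "dns": False, "opt66_name": False}
            continue
        if current is None:
            continue
        match = re.match(r"option (\d+) (\S+)", line)
        if match:
            pools[current]["options"].add(int(match.group(1)))
            if match.group(1) == "66" and match.group(2) == "ascii":
                pools[current]["opt66_name"] = True
        elif line.startswith("dns-server "):
            pools[current]["dns"] = True

    report = {}
    for name, info in pools.items():
        warnings = []
        if {66, 150} <= info["options"]:
            warnings.append("options 66 and 150 are both configured")
        if info["opt66_name"] and not info["dns"]:
            warnings.append("option 66 uses a name but no dns-server is configured")
        report[name] = warnings
    return report
```

A pool that passes returns an empty list, so any non-empty entry points at a configuration worth reviewing before looking at the phones themselves.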
NOTE: Additional information on Cisco IOS DHCP server and Cisco IOS DHCP relay agent
configuration can be found in the “Troubleshooting Cisco IOS DHCP and NAT” chapter in
this guide or within the “Branch Office and Teleworker Technologies” chapter in the ROUTE
guide, which is currently available online.
Network Time Protocol Services
Network Time Protocol (NTP) is an integral component of IPT implementation. NTP services are
applicable to both the Cisco IP phones and the CUCM. For Cisco IP phones, the NTP server can be
used to ensure that they display the correct time. However, this service may also be used for additional functions, such as restricting calls based on time and correct or accurate reporting, for example. This functionality is configured in the DHCP server, which provides this information to the IP
phones in conjunction with other addressing parameters. In Cisco IOS software, one or more NTP
servers can be specified for the DHCP pool using the option 42 <ip|ascii> [address|name]
DHCP pool configuration command.
Again, keep in mind that if a DNS name is provided, then the pool must also be configured with at
least one DNS server. The following example illustrates how to specify an NTP server for Cisco IP
phones when configuring a DHCP pool and using the Cisco IOS DHCP server feature:
R1(config)#ip dhcp pool DEFAULT-IPT-POOL
R1(dhcp-config)#network 10.0.0.0 /24
R1(dhcp-config)#default-router 10.0.0.1
R1(dhcp-config)#lease 8 0 0
R1(dhcp-config)#option 150 ip 10.0.0.252
R1(dhcp-config)#option 42 ip 172.16.1.1
R1(dhcp-config)#exit
In addition to Cisco IP phones, CUCM servers also use the services of an NTP server. During
installation, Cisco Unified CallManager prompts the administrator to specify the external NTP
server at publisher installation. If the server is not reachable, the installation fails. Additionally,
when installing additional servers as part of a cluster, if the first node is not synchronized with an
NTP server, the installation of the subsequent node will fail.
NOTE: The CUCM server can be configured to synchronize with a Cisco IOS router or switch
configured as an NTP server. Cisco IOS software NTP server configuration is described in the
“Network Monitoring and Maintenance” chapter in this guide.
TFTP Services
TFTP services are a critical component of the overall Cisco IP Telephony solution. Cisco IP phones
receive their configuration parameters from the specified TFTP server. In most instances, the
CUCM server is the TFTP server. However, Cisco IOS routers running Cisco Unified Communications Manager Express (CUCME) can also be configured as TFTP servers. The phone receives
TFTP server information via either DHCP Option 150 or Option 66, as described previously in the
DHCP Services section.
Cisco IOS TFTP server functionality is disabled by default but can be enabled using the tftp-server <location>:<filename> global configuration command. The following configuration
example illustrates how to configure a Cisco IOS software router running CUCME to function as
a TFTP server for IP phones. This configuration allows Cisco IP phones that use this device as the
TFTP server to download the correct firmware files:
R1(config)#$ flash:PHONE/7940-7960/P00308000500.bin alias P00308000500.bin
R1(config)#$h:PHONE/7940-7960/P00308000500.loads alias P00308000500.loads
R1(config)#$ flash:PHONE/7940-7960/P00308000500.sb2 alias P00308000500.sb2
R1(config)#$ flash:PHONE/7940-7960/P00308000500.sbn alias P00308000500.sbn
NOTE: The $ sign in the configuration above is simply due to the text being too long for a
single line and is therefore truncated. The configuration also assumes that the files specified
exist in the specified location (i.e., in Flash memory). Make sure that the relevant files do exist
prior to implementing such configurations.
Cisco Discovery Protocol
It is common (recommended) practice to disable Cisco Discovery Protocol (CDP) for security purposes, as it prevents any adjacent devices from gaining information about the router or switch.
However, CDP is an integral component of the Cisco IP Telephony solution. CDP is used to inform
Cisco IP phones of the voice or auxiliary VLAN. Additionally, CDP is also an integral component
of inline power. If you must disable CDP, do it on a per-port basis, ensuring that the switch ports
connected to Cisco IP phones have CDP enabled. If CDP is disabled on these ports, the IP phone
cannot exchange messages with the switch about the voice VLAN or power requirements.
Voice VLAN(s)
When connecting Cisco IP phones to the switch, it is recommended that the switch interface be
configured as a Multi-VLAN Access Port (MVAP), which allows the Cisco IP phone and connected
device (e.g., a laptop or workstation) to use different VLANs. This configuration will allow voice
traffic to receive priority over the data traffic sent by the connected device. While not mandatory,
the use of a separate VLAN for the voice traffic ensures that this delay-sensitive traffic is able to be
prioritized over normal user data traffic.
NOTE: When using an MVAP, from the perspective of the attached device, it is simply connected to a switch. In other words, the connected device (e.g., the workstation or laptop) is
completely unaware of the fact that it is actually connected to an IP phone.
The voice or auxiliary VLAN is configured in the same manner as any other VLAN. All normal
VLAN rules, such as Spanning Tree priorities and parameters, apply to this VLAN in the same
manner as they would for any standard data VLAN. On an MVAP, a native VLAN for data traffic
for the workstation connected to the IP phone is identified by the port VLAN identifier (PVID),
which is specified using the switchport access vlan <vlan> interface configuration command.
An auxiliary VLAN for voice service is identified by the voice VLAN identifier (VVID), which is
specified using the switchport voice vlan <vlan> interface configuration command. Following
this configuration, the switch communicates the VVID to the IP phone using CDP.
Frames sent by the PC connected to the Cisco IP phones will be sent in the native VLAN (PVID). These
frames will be untagged. Frames or packets sent by the Cisco IP phone will use the auxiliary VLAN
(VVID). These frames will include the 802.1Q tag. Within this tag, the User Priority field contains the
Quality of Service (QoS) information. QoS requirements for VoIP are described later in this section.
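To make the tag layout concrete, the sketch below packs and unpacks the 4-byte 802.1Q tag: a 16-bit tag protocol identifier of 0x8100, followed by a 3-bit User Priority field, a 1-bit drop eligible (formerly CFI) bit, and a 12-bit VLAN ID. The VLAN number 20 and the priority value 5 used in the usage note are assumptions chosen for illustration, not values taken from this guide.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier that marks an 802.1Q tag


def build_dot1q_tag(vlan_id, priority, dei=0):
    """Pack the 4-byte 802.1Q tag for the given VLAN ID and User Priority."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("User Priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI/CFI is a single bit")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_dot1q_tag(tag):
    """Unpack a 4-byte 802.1Q tag into its three fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "dei": (tci >> 12) & 1, "vlan_id": tci & 0x0FFF}
```

For example, `build_dot1q_tag(20, 5)` produces the tag a phone might add for a hypothetical voice VLAN 20 with priority 5, while frames from the attached PC carry no such tag at all.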
Power over Ethernet
Cisco IP phones can use an external power supply to draw their power or they can draw their
power from the switch to which they are connected, with the latter being the most commonly used
method. When power is drawn from the switch, the power is sent within the Ethernet cable connecting the switch and the IP phone. The following two methods provide inline power:
1. IEEE 802.3af-2003
2. Cisco inline power
The IEEE 802.3af-2003 Power over Ethernet (PoE) standard defines terminology to describe a port
that acts as a power source (PSE) to a powered device (PD); defines how a powered device is detected; and defines two methods of delivering PoE to the discovered PD. Cisco inline power (PoE)
is a proprietary approach. The IEEE 802.3af-2003 standard is actually based on this method of PoE,
which was available before PoE was standardized. Cisco has also added power management
extensions, using CDP negotiation, to Cisco IEEE 802.3af-compliant devices to optimize PSE power
management further. Cisco Catalyst switches support both inline power (ILP) and IEEE 802.3af-2003.
When choosing to use PoE, it is important to calculate accurately the amount of power that will
be required. You can use the Cisco Power Calculator available on the Cisco Web site to perform
such calculations. Incorrect power calculations can adversely impact IP Telephony implementation. When calculating total power requirements for the IP Telephony solution, you should also
factor in other solutions or devices that will draw power from the same switches. Such devices may
include wireless Access Points, for example.
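The arithmetic behind such a calculation is simple, as the following rough sketch shows. It is not a substitute for the Cisco Power Calculator: the four per-port draws are the non-zero "From PS" values that appear in the show power inline sample in this section, the 370 W switch budget is a made-up figure, and the 15.4 W per-port allowance is the 802.3af maximum that a PSE supplies.

```python
AF_MAX_PER_PORT_WATTS = 15.4  # maximum a PSE supplies per port under 802.3af


def poe_budget(draws_watts, budget_watts):
    """Summarize how much of the PoE budget the listed devices consume."""
    total = sum(draws_watts)
    return {
        "total": round(total, 1),
        "remaining": round(budget_watts - total, 1),
        "within_budget": total <= budget_watts,
    }


def max_additional_devices(draws_watts, budget_watts, per_device_watts=AF_MAX_PER_PORT_WATTS):
    """How many more devices fit if each is budgeted at per_device_watts."""
    remaining = budget_watts - sum(draws_watts)
    if remaining <= 0:
        return 0
    return int(remaining // per_device_watts)


# Per-port draws from the sample output; the 370 W budget is hypothetical.
summary = poe_budget([11.8, 13.5, 7.1, 14.5], 370)
```

Budgeting every new port at the worst-case 15.4 W is a deliberately conservative choice; lower-class phones draw less, so the real headroom is usually larger.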
You can use the show power suite of commands to verify PoE functionality. For example, you can
use the show power inline command to verify inline power status as follows:
Cat-6500-1#show power inline
Interface Admin  Oper       Power(Watts)          Device              Class
                            From PS    To Device
--------- ------ ---------- ---------- ---------- ------------------- -----
Gi1/1     auto   off        0          0          n/a                 n/a
Gi1/2     auto   off        0          0          Cisco AIR-LAP1252AG n/a
Gi1/3     auto   off        0          0          Cisco AIR-LAP1252AG n/a
Gi1/4     auto   off        0          0          n/a                 n/a
Gi1/5     auto   on         11.8       10.5       Cisco IP Phone 7937 3
Gi1/6     auto   on         13.5       12.0       Cisco IP Phone 7945 3
Gi1/7     auto   off        0          0          n/a                 n/a
Gi1/8     auto   on         7.1        6.3        Cisco IP Phone 7942 2
Gi1/9     auto   on         14.5       12.9       Cisco IP Phone 7961 3
...
[Truncated Output]
THE NEED FOR LAN AND WAN QUALITY OF SERVICE
Quality of Service (QoS) is a critical component of any IP Telephony solution. Converged networks
must provide secure, predictable, measurable, and sometimes guaranteed services. In order to ensure successful end-to-end business solutions, QoS is required to manage network resources. Several reasons why QoS is required when integrating real-time, delay-sensitive traffic, such as voice
and video, into the network include the following:
• Delay issues
• Bandwidth issues
• Jitter issues
• Packet loss issues
All packets in a network experience some kind of delay from the time the packet is first sent to
when it arrives at its intended destination. This total delay, from start to finish, is referred to as
latency. There are several types of delay that may be experienced by packets or frames. Some common causes of delay include, but are not limited to, the following:
• Serialization delay
• Queuing delay
• Processing delay
• Forwarding delay
Serialization delay refers to the amount of time that it takes to send bits serially (i.e., one bit at a
time) across the wire. Queuing delay is the delay experienced when packets wait for other packets
to be sent. Processing delay is the time taken by the digital signal processor (DSP) to compress a
block of pulse-code modulation (PCM) samples. This is also referred to as coder delay. Finally,
forwarding delay is the processing time from when a frame is received to when the packet has been
placed in the output queue.
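Serialization delay in particular is easy to estimate, because it depends only on the frame size and the link speed. The following is a minimal sketch; the frame sizes and the 64 kbps link speed in the usage note are illustrative assumptions rather than figures from this guide.

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time in milliseconds to clock a frame onto a link, one bit at a time."""
    if frame_bytes < 0:
        raise ValueError("frame size cannot be negative")
    if link_bps <= 0:
        raise ValueError("link speed must be positive")
    return frame_bytes * 8 / link_bps * 1000
```

For instance, `serialization_delay_ms(1500, 64000)` shows that a 1500-byte data frame occupies a 64 kbps link for 187.5 ms, whereas a 60-byte packet needs only 7.5 ms. A small voice packet queued behind one large data frame on a slow WAN link therefore inherits most of that delay, which is one reason QoS matters more on the WAN than on the LAN.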
NOTE: You are not required to go into any detail on DSPs or PCM sampling in the TSHOOT
certification exam. These will not be described in any further detail in the remainder of this
chapter.
While there are numerous types of delay, all delay types fall into one of two categories: fixed delay
and variable delay. Fixed delay components add directly to the overall delay on the connection.
Examples of fixed delay include serialization delay, processing delay, and packetization delay. Packetization delay is the time taken to fill a packet payload with encoded or compressed speech.
Variable delays, on the other hand, arise from queuing delays in the egress trunk buffers on the
serial port connected to the WAN. These buffers create variable delays, called jitter, across the
network. Variable delays are handled through the de-jitter buffer at the receiving router or gateway.
Jitter is described in additional detail later in this section.
Generally speaking, bandwidth refers to the number of bits per second (bps) that are expected to
be delivered successfully across some medium. Based on this definition, bandwidth is equal to the
physical link speed or clock rate of the interface. In switching terms, however, the term bandwidth
refers to the capacity of the switch fabric. Therefore, the bandwidth considerations for WAN connections, for example, are not necessarily the same for LAN connections.
Jitter is the variation in delay between consecutive packets and is caused by variable queuing delays.
For this reason, jitter is often referred to as delay variation. While such variations may be
acceptable for applications and data traffic, they can severely impact isochronous traffic, such as digitized voice, which requires that packets are transmitted in a consistent, uniform manner. The varying
arrival time of the packets can cause gaps in the recreation and playback of the original voice signal.
This is both undesirable and annoying to the listener. Jitter can be mitigated using de-jitter buffers.
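The definition above can be turned into a quick calculation. The sketch below is deliberately simplified: it measures jitter as the absolute change in one-way delay between consecutive packets (real RTP stacks use a smoothed estimator instead), and it models a de-jitter buffer by assuming playout is scheduled at the smallest observed delay plus the buffer depth. The function names and the sample delays in the usage note are illustrative.

```python
def interarrival_jitter_ms(delays_ms):
    """Absolute change in one-way delay between each pair of consecutive packets."""
    return [abs(later - earlier) for earlier, later in zip(delays_ms, delays_ms[1:])]


def late_packets(delays_ms, buffer_ms):
    """Indexes of packets that miss playout for a de-jitter buffer of buffer_ms.

    Simplifying assumption: the playout deadline is the smallest observed
    delay plus the buffer depth.
    """
    if not delays_ms:
        return []
    deadline = min(delays_ms) + buffer_ms
    return [index for index, delay in enumerate(delays_ms) if delay > deadline]
```

With one-way delays of 20, 22, 19, and 30 ms, the per-packet jitter is 2, 3, and 11 ms. A 5 ms buffer would discard the last packet as late, while a 15 ms buffer would absorb it at the cost of 15 ms of added fixed delay.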
Packet loss occurs when one or more packets traversing the network fail to reach their intended
destination. This may occur for several reasons, such as bit errors, lack of space in queues, and,
most commonly, congestion. While connection-oriented protocols, such as TCP, can generally recover
from loss by retransmitting, packet loss can cause major issues for real-time traffic, such as voice and streaming
video traffic. Packet loss can be mitigated using congestion management and/or congestion avoidance mechanisms. These are described later in this chapter.
LAN AND WAN IPT QOS IMPLEMENTATION
In order to understand QoS implementation, it is important to have an understanding of the three
different QoS models and how they are applicable when designing and implementing a QoS solution. The three Quality of Service models are as follows:
1. Best Effort Delivery (Default)
2. Integrated Services
3. Differentiated Services
As the name implies, the Best-Effort (BE) delivery model does not guarantee any level of service
and instead internetwork devices simply make their ‘best effort’ to deliver packets as quickly as
possible. The best-effort delivery model scales well but provides no difference in service for different traffic classes. In other words, when this model is used (which is the default), voice, video, and
data traffic are all treated as one and the same. This model requires no QoS implementation within
the internetwork and is not recommended.
Integrated Services (IntServ) performs admission control for each flow request. IntServ provides
a way to deliver end-to-end QoS for real-time applications by explicitly managing network resources to provide QoS to specific user packet streams (flows). RFC 1633 defines two components
to provide guarantees per flow: resource reservation and admission control. IntServ uses Resource
Reservation Protocol (RSVP) to signal the internetworking devices about how much bandwidth
and delay a particular flow requires. Intelligent queuing mechanisms can be used with RSVP to
provide the following two kinds of services:
• Guaranteed rate service
• Controlled load service
Guaranteed rate service allows applications to reserve bandwidth to meet their specific requirements. In Cisco IOS software, the guaranteed service rate can be provided using Weighted Fair
Queuing (WFQ) in conjunction with RSVP. Controlled load service allows applications to have low
delay and high throughput, even during times of congestion. This service can be provided using
RSVP and Weighted Random Early Detection (WRED). Finally, admission control is used to decide
when a reservation request should be rejected.
NOTE: WFQ and WRED are described in additional detail later in this chapter.
The primary issue with IntServ is that it scales very poorly, especially when many sources are attempting to reserve end-to-end bandwidth for each of their particular flows.
Unlike IntServ, the Differentiated Services (DiffServ) model requires no advance reservations and
therefore scales very well. DiffServ defines the concept of service classes. DiffServ also allows each
internetwork device to handle these packets on an individual (per hop) basis. This is referred to as
per-hop behavior (PHB). DiffServ markings apply at Layer 3. Layer 2 frames use Class of Service
(CoS) bits that are contained within the 802.1Q or ISL-encapsulated frame.
In addition to IP Precedence (IP Prec), DiffServ redefines the Type of Service (ToS) byte in the IP
packet header as the Differentiated Services (DS) field. The first 6 bits of this byte form the
Differentiated Services Code Point (DSCP), which replaces the 3-bit IP Precedence field while
remaining backward compatible with it. The last 2 bits of the ToS byte can now be used to perform
flow control, and are referred to as the Explicit Congestion Notification (ECN) bits.
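The bit layout described above can be sketched with a few shifts and masks. The function below is a minimal illustration (its name and return format are assumptions) that splits a ToS byte into the legacy IP Precedence value, the DSCP value, and the ECN bits.

```python
def decode_tos_byte(tos: int) -> dict:
    """Split the 8-bit ToS byte into IP Precedence, DSCP, and ECN values."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("the ToS byte must be between 0 and 255")
    return {
        "ip_precedence": tos >> 5,  # first 3 bits (legacy interpretation)
        "dscp": tos >> 2,           # first 6 bits
        "ecn": tos & 0b11,          # last 2 bits
    }


if __name__ == "__main__":
    # 0xB8 is the ToS byte of a packet marked DSCP EF with ECN bits cleared.
    print(decode_tos_byte(0xB8))
```

Because the DSCP occupies the same leading bits as IP Precedence, a device that only understands IP Precedence still reads a sensible value (5 for EF) from a DSCP-marked packet.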
DiffServ defines three sets of PHBs: Class Selector (CS), Assured Forwarding (AF), and Expedited
Forwarding (EF). The Class Selectors are DSCP values that are compatible with IP Prec values. The
Assured Forwarding (AF) PHB set is used for two functions: queuing and congestion avoidance.
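Queuing places the packets into the different software queues based on the QoS labels, while congestion avoidance uses the drop precedence of each AF value to decide which packets are dropped first. The last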
PHB set is the Expedited Forwarding (EF) set. This uses a single DSCP value (EF) to represent it.
EF packets are given premium service (above all other classes). This is the default value assigned to
voice media packets in Cisco IP Telephony solutions.
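The numeric DSCP values behind the three PHB sets follow a simple pattern, sketched below. The helper names are hypothetical; the arithmetic reflects the standard DiffServ encodings, in which a Class Selector places the IP Precedence value in the first 3 bits, and an AF value combines a class (1 to 4) with a drop precedence (1 to 3).

```python
EF_DSCP = 46  # the single Expedited Forwarding code point


def class_selector_dscp(precedence: int) -> int:
    """DSCP value of Class Selector CSn, where n is the IP Precedence (0-7)."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must be 0-7")
    return precedence << 3


def assured_forwarding_dscp(af_class: int, drop_precedence: int) -> int:
    """DSCP value of AFxy, where x is the class (1-4) and y the drop precedence (1-3)."""
    if not 1 <= af_class <= 4 or not 1 <= drop_precedence <= 3:
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return (af_class << 3) | (drop_precedence << 1)


if __name__ == "__main__":
    print("CS5  =", class_selector_dscp(5))
    print("AF41 =", assured_forwarding_dscp(4, 1))
    print("EF   =", EF_DSCP)
```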
NOTE: Additional detailed information on the three QoS models described in this section can
be found in the current SWITCH guide, which is available online at www.howtonetwork.net.
LAN QoS Solutions for IP Telephony Implementations
The primary purpose of LAN QoS is buffer management. Switches require buffering to avoid buffer
overflows. Buffer overflows occur when multiple ingress ports are contending for the same egress
port. Catalyst switch QoS is primarily based on the Layer 2 markings that are contained within a
frame (i.e., the CoS value). However, it can also be based on the Layer 3 markings contained within a
packet (i.e., the IP Precedence or DSCP values). IP Precedence and DSCP were described in the previous section. The CoS value is carried in the 3-bit priority field of the 802.1Q tag or in the user field of the ISL header.
On Cisco Catalyst switches, the configuration of a VVID configures the switch to send CDP packets to the Cisco IP phone, instructing the phone to send voice traffic in 802.1Q frames, tagged with
the specified VVID and a Layer 2 CoS value of 5. However, it should be noted that simply configuring a voice VLAN alone does not mean that this automatically occurs. Instead, QoS must be
enabled manually on the switch, and the switch port(s) must be configured to trust incoming QoS
markings.
In Cisco IOS Catalyst switches, QoS is enabled globally using the mls qos global configuration
command. Following this, the mls qos trust interface configuration command must be used to
configure the port to trust incoming CoS or DSCP markings. Alternatively, the switch port can simply be configured to trust frames/packets received from the attached Cisco IP phone. In addition to
trusting packets/frames received from the phone, Cisco IOS software also allows administrators to
trust or re-mark frames/packets received from the device that is also attached to the IP phone using
the switchport priority extend suite of commands.
LAN QoS solutions for IP Telephony integration are typically performed in the ingress direction
when configuring Catalyst switches. Ingress QoS mechanisms are applied to frames and packets
received by the switch in the inbound direction. Catalyst switches support the following ingress
QoS mechanisms:
• Traffic classification
• Traffic policing
• Marking
• Congestion management and avoidance
Classification is used to differentiate one stream of traffic from another so that different service
levels can be applied to different streams of traffic. Frames can be classified based on the incoming
CoS, DSCP, or even Access Control List (ACL) configuration. When the switch receives a frame or
packet with an already existing QoS value, it must decide whether to trust the received QoS value.
This is determined using the port trust setting. Trust settings are configured at the trust boundary,
which is the perimeter of the network, such as the access port to which a workstation or IP Phone
is connected. The traffic that is received from beyond the perimeter is considered untrusted, unless it is explicitly trusted using the switchport priority extend trust interface configuration command on the switch. The trust boundary itself is configured using the mls qos trust
[cos|device cisco-phone|dscp|ip-precedence] interface configuration command.
Policing is a process that is used to limit traffic to a prescribed rate. Policing is used to compare the
ingress traffic rate to a configured policer. The policer is configured with a rate and a burst. The rate
defines the amount of traffic that is sent per given interval. When that specified amount has been
sent, no more traffic is sent for that given interval. The burst defines the amount of traffic that can
be held in readiness for being sent. Traffic in excess of the burst can be either dropped or have its
priority setting reduced.
Traffic that conforms to the policing configuration is considered in profile and will be forwarded, as
configured, by the switch. However, traffic that does not conform to the policing configuration is
considered out of profile. Out of profile traffic can either be dropped, or marked down (i.e., remarked), with a lower QoS value.
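The in-profile and out-of-profile decision can be modeled with a single-rate token bucket, which is a common way to reason about policers. The class below is a simplified sketch under that assumption; it is not a description of how any particular Catalyst platform implements policing, and the class and method names are hypothetical.

```python
class Policer:
    """Single-rate token bucket: tokens are bytes that may be sent."""

    def __init__(self, rate_bps: int, burst_bytes: int):
        self.rate_bytes_per_s = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # the bucket starts full
        self.last_time = 0.0

    def check(self, now: float, packet_bytes: int) -> str:
        """Classify a packet arriving at time 'now' (seconds)."""
        # Refill tokens for the elapsed time, never exceeding the burst size.
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return "in-profile"
        # The caller decides whether to drop or mark down this packet.
        return "out-of-profile"


if __name__ == "__main__":
    policer = Policer(rate_bps=8000, burst_bytes=1500)
    for t, size in [(0.0, 1000), (0.0, 1000), (1.0, 1000)]:
        print(t, size, policer.check(t, size))
```

Returning a label rather than dropping the packet keeps the policing decision separate from the action taken, which mirrors the choice between dropping and marking down described above.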
Marking involves setting QoS bits inside the Layer 2 or Layer 3 headers, which allows the other internetwork devices to classify based on the marked values. Marking is typically used in conjunction
with traffic policing. For example, if the traffic is in profile, the switch will typically allow the packets to be passed through (i.e., will not change or reset the QoS settings in the packets). However,
if the traffic is out of profile, the switch may be configured to mark down this traffic with a lower
QoS value. However, marking can also be used in conjunction with classification. For example,
you could issue the switchport priority extend cos 3 command to mark all frames/packets
received from the device connected to the IP phone LAN port with a CoS value of 3, effectively
classifying this traffic.
Congestion management and avoidance comprises three elements: queuing, dropping, and scheduling. Queuing is used to place packets into different software queues based on the
QoS labels. After the traffic is classified and marked with QoS labels, it is assigned into different
queues based on the QoS labels.
Once the packets have been placed into the appropriate queue based on their QoS values, dropping is used to manage queues. Dropping provides drop priorities for different classes of traffic. Queues have drop thresholds that are used to indicate which packets can be dropped once
the queue has filled beyond a certain threshold. After ingress packets are placed into the queue, a
congestion avoidance mechanism will use a CoS-to-threshold map to determine what frames are
eligible to be dropped when a threshold is breached. This prevents the queues from filling up.
Catalyst switches typically have two ingress queues, one of which either is a priority queue or can
be configured as a priority queue. The ingress frames and packets received by the switch are placed
in a queue based on the ingress (received) CoS value. Voice traffic, for example, that is received with
CoS 5 or DSCP EF will be placed into the priority queue, while regular data traffic will be placed
into the normal queue.
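The interaction between queue selection and drop thresholds can be sketched as a lookup table. The map below is entirely hypothetical (actual CoS-to-queue and CoS-to-threshold defaults vary by platform and are not given in this guide); it only illustrates that the received CoS value selects both a queue and the fill level at which frames of that class become eligible for dropping.

```python
PRIORITY, NORMAL = "priority", "normal"

# Hypothetical map: CoS value -> (ingress queue, drop threshold as % of queue depth).
COS_TO_QUEUE_THRESHOLD = {
    0: (NORMAL, 40), 1: (NORMAL, 40),
    2: (NORMAL, 60), 3: (NORMAL, 60),
    4: (NORMAL, 100), 5: (PRIORITY, 100),
    6: (NORMAL, 100), 7: (NORMAL, 100),
}


def enqueue_decision(cos: int, queue_fill_percent: float):
    """Return (queue, action) for a frame, given how full its queue is."""
    queue, threshold = COS_TO_QUEUE_THRESHOLD[cos]
    action = "drop" if queue_fill_percent >= threshold else "enqueue"
    return queue, action


if __name__ == "__main__":
    print(enqueue_decision(5, 90))  # voice frame, priority queue 90% full
    print(enqueue_decision(0, 50))  # data frame, normal queue 50% full
```

With these assumed values, low-priority data starts being dropped while the queue still has room, leaving space for higher-priority frames.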
Scheduling refers to how the queues are serviced or emptied. If a priority queue is configured, it
only makes sense that this be serviced (emptied) before the normal queue. In other words, the
packets in the priority queue should be sent before the packets in the normal queue. Catalyst
switches use Shaped Round Robin (SRR) for ingress scheduling. SRR is beyond the scope of the
TSHOOT certification exam and will not be described in any further detail in this chapter.
WAN QoS Solutions for IP Telephony Implementations
The primary purpose of WAN QoS mechanisms is to make better use of bandwidth. However, in
converged networks, consideration should also be given to protecting voice traffic and ensuring
that voice quality is not impacted by other traffic types. Voice quality problems are one of the most
common issues experienced in converged networks. These problems are typically attributed to
packet drops, queuing delays, and link congestion. While implementing QoS will not guarantee
that there will never be any VoIP quality issues, it does allow you to optimize voice quality in networks by protecting this type of traffic. In Cisco IOS software, routers can be configured to use
strict priority queuing or Weighted Random Early Detection (WRED) mechanisms when implementing VoIP QoS on the WAN.
Strict priority queuing can be accomplished with Class-Based Weighted Fair Queuing (CBWFQ)
by using the IP RTP Priority, Frame Relay IP RTP Priority, or Low Latency Queuing (LLQ) features. The IP RTP Priority feature provides a strict priority queuing scheme for delay-sensitive
data, such as voice. Using this method, voice traffic is identified by its Real-Time Transport Protocol (RTP) port numbers and classified into a priority queue. The result is that voice is serviced as
strict priority in preference to other non-voice traffic types.
The Cisco IOS Frame Relay IP RTP Priority feature performs a similar function to the IP RTP Priority feature but is specific to Frame Relay. This feature provides a strict priority queuing scheme
on a Frame Relay Permanent Virtual Circuit/Data Link Connection Identifier (PVC/DLCI) and is
used in conjunction with Frame Relay map classes.
Low Latency Queuing (LLQ) provides strict priority queuing in conjunction with CBWFQ. LLQ
configures the priority status for a class within CBWFQ, in which voice packets receive priority
over all other traffic. This strict priority mechanism, like the others also described, reduces jitter
in voice conversations, improving overall voice quality. It should be noted that of the strict priority
mechanisms described, the Cisco Enterprise Solutions Engineering (ESE) group considers Low
Latency Queuing (LLQ) not only a recommendation but also best practice when implementing
voice QoS solutions on the WAN.
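The effect of strict priority can be illustrated with a very small scheduler model. In the sketch below, the priority queue is always emptied before any other class is served; serving the remaining class queues in simple list order is a deliberate simplification of CBWFQ, which in practice shares bandwidth by weight. The function name is hypothetical.

```python
from collections import deque


def llq_dequeue(priority_queue, class_queues):
    """Return the next packet to transmit, or None if all queues are empty."""
    # Strict priority: voice is always sent first when any is waiting.
    if priority_queue:
        return priority_queue.popleft()
    # Simplification: other classes are served in list order here,
    # whereas CBWFQ would schedule them according to their weights.
    for queue in class_queues:
        if queue:
            return queue.popleft()
    return None


if __name__ == "__main__":
    voice = deque(["voice-1", "voice-2"])
    data = deque(["data-1"])
    while (packet := llq_dequeue(voice, [data])) is not None:
        print(packet)
```

Because voice never waits behind data in this model, its queuing delay depends only on other voice packets, which is how strict priority reduces jitter.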
Weighted Random Early Detection is a queue management mechanism that provides differentiated performance characteristics for different classes of service. This allows for the preferential
handling of voice traffic under congestion conditions. WRED drops lower-priority traffic more aggressively than higher-priority traffic as the interface’s output queue begins to become congested.
In Cisco IOS software, WRED can be implemented in conjunction with Explicit Congestion Notification (ECN). The ECN bits are the last 2 bits of the ToS byte. These can now be used to perform
flow control by allowing routers and end hosts to use this marking as a signal that the network is
congested in order to slow down the sending of packets.
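The way WRED treats classes differently can be sketched as a drop-probability function. The model below assumes the usual description of WRED: no drops below a minimum threshold, a probability that rises linearly to 1 divided by the mark probability denominator at the maximum threshold, and all packets dropped above it. The function name and the sample thresholds are illustrative assumptions.

```python
def wred_drop_probability(avg_queue_depth: float, min_threshold: float,
                          max_threshold: float, mark_prob_denominator: int) -> float:
    """Return the probability of dropping an arriving packet."""
    if max_threshold <= min_threshold:
        raise ValueError("max_threshold must be greater than min_threshold")
    if avg_queue_depth < min_threshold:
        return 0.0
    if avg_queue_depth > max_threshold:
        return 1.0
    max_probability = 1 / mark_prob_denominator
    fraction = (avg_queue_depth - min_threshold) / (max_threshold - min_threshold)
    return max_probability * fraction


if __name__ == "__main__":
    # Same average queue depth, two classes with different (assumed) minimum thresholds.
    print("lower priority :", wred_drop_probability(30, 10, 40, 10))
    print("higher priority:", wred_drop_probability(30, 20, 40, 10))
```

Giving lower-priority traffic a lower minimum threshold means it starts being dropped earlier and more often at the same queue depth, which is the preferential handling described above.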
Configuring WAN QoS in Cisco IOS Software Routers
While LAN QoS configuration is described and illustrated in detail in the SWITCH study guide,
WAN QoS configuration has yet to be described or illustrated. While you are not required to go
into detail on any advanced WAN QoS configurations, you should have a basic understanding of
how to configure and verify the mechanisms described in the previous section.
In Cisco IOS software, the IP RTP Priority feature is configured under the WAN interface using
the ip rtp priority <lower bound UDP port number> <port range> <bandwidth> interface configuration command. When using this command, you can prioritize and allocate bandwidth to RTP UDP port numbers for voice, video, and whiteboard traffic using the ranges listed in
Table 12-1 below:
Table 12-1. RTP UDP Port Ranges
Application    Starting RTP UDP Port Number    Ending RTP UDP Port Number
Voice          16,384                          32,767
Whiteboard     32,768                          49,151
Video          49,152                          65,535
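The second argument of the ip rtp priority command is a port range rather than an ending port, so the last port covered is the starting port plus the range. The small sketch below (the function name is hypothetical) shows that arithmetic against the ranges in Table 12-1.

```python
def rtp_priority_port_span(first_port: int, port_range: int):
    """Return the (first, last) UDP ports covered by a start port and a range."""
    last_port = first_port + port_range
    if first_port < 1 or port_range < 0 or last_port > 65535:
        raise ValueError("ports must stay within 1-65535")
    return first_port, last_port


if __name__ == "__main__":
    for name, start in [("Voice", 16384), ("Whiteboard", 32768), ("Video", 49152)]:
        print(name, rtp_priority_port_span(start, 16383))
```

Each application range in the table therefore corresponds to a range value of 16383 from its starting port.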
The following configuration example illustrates how to configure the IP RTP Priority feature for
the entire voice RTP UDP port range while specifying a total maximum bandwidth of 512Kbps for
this type of traffic. The configuration is applied under a WAN interface:
R1(config)#interface Serial0/0
R1(config-if)#ip rtp priority 16384 16383 512
R1(config-if)#exit
This configuration can be validated by viewing the router configuration. The Frame Relay IP RTP
Priority feature is specified in a Frame Relay map class, which is then applied to one or more PVCs/
DLCIs. The following configuration example illustrates how to configure a Frame Relay map class,
enable the Frame Relay IP RTP Priority feature, and then apply the configuration to Frame Relay
DLCI 100, which is configured under the router Serial0/0 interface:
R2(config)#map-class frame-relay TSHOOT
R2(config-map-class)#frame-relay ip rtp priority 16384 16383 512
R2(config-map-class)#exit
R2(config)#interface Serial0/0
R2(config-if)#encapsulation frame-relay
R2(config-if)#frame-relay interface-dlci 100
R2(config-fr-dlci)#class TSHOOT
R2(config-fr-dlci)#exit
Again, this configuration can be validated by checking the router configuration. LLQ, as was stated
earlier in this chapter, is configured in conjunction with CBWFQ. CBWFQ is implemented using
the Cisco IOS Modular QoS CLI (MQC). MQC configuration in Cisco IOS software requires
three steps to be performed. These steps are listed and described below:
1. Define a class map, which is used to identify the ‘interesting traffic,’ using the class-map
[match-any|match-all] <name> global configuration command. In class-map configuration mode, match packets against an ACL, protocol, or other QoS parameters, such
as DSCP values, for example. To ensure successful matches, it is important that traffic is
marked and classified at the network edge. The match-any and match-all keywords allow
you to specify whether to match any one of the parameters matched in the class map (if
more than one is specified) or match all parameters in the class map. Using the match-all
keyword means that all specified conditions must be met for an actual match to be made.
2. Define a policy map, which is used to determine what to do with the different class maps
once the traffic has been matched. The policy map can be used to implement LLQ (strict
priority queuing), WRED, and other actions, such as re-marking packets before they are
transmitted across the WAN. The policy map is configured using the policy-map <name>
global configuration command. In policy-map configuration mode, explicitly configured
class maps are matched using the class <class-map-name> command. The default class,
which is used for all other traffic types that are not included in the explicitly configured class
maps, is configured using the class class-default policy-map configuration command.
3. Apply the policy map to interfaces, subinterfaces, PVCs, or DLCIs using the service-policy output <policy-map-name> configuration command. Although the policy map
can also be applied in the inbound direction, this configuration is beyond the scope
of the TSHOOT certification exam and will not be discussed in this chapter.
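The difference between the match-any and match-all keywords in step 1 can be sketched as a logical OR versus a logical AND over the match statements. The model below is illustrative only: packets are represented as plain dictionaries and match statements as functions, both of which are assumptions made for the example.

```python
def class_map_matches(mode: str, criteria, packet: dict) -> bool:
    """Evaluate a class map against a packet using match-any or match-all logic."""
    results = (criterion(packet) for criterion in criteria)
    if mode == "match-any":
        return any(results)  # one matching statement is enough
    if mode == "match-all":
        return all(results)  # every statement must match
    raise ValueError("mode must be 'match-any' or 'match-all'")


if __name__ == "__main__":
    # Two hypothetical match statements: DSCP EF and RTP protocol.
    criteria = [
        lambda pkt: pkt.get("dscp") == 46,
        lambda pkt: pkt.get("protocol") == "rtp",
    ]
    unmarked_rtp = {"dscp": 0, "protocol": "rtp"}
    print(class_map_matches("match-any", criteria, unmarked_rtp))
    print(class_map_matches("match-all", criteria, unmarked_rtp))
```

The example also suggests why marking at the network edge matters: with match-all, RTP traffic that arrives unmarked fails the DSCP condition and falls through to another class.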
The following configuration example illustrates how to implement strict priority queuing (LLQ)
in conjunction with CBWFQ on a router. The router is configured to match VoIP media traffic
(RTP) and assign it to the priority queue using LLQ. This traffic is allocated 128 Kbps of the available
bandwidth. This traffic is assigned a DSCP value of EF prior to being transmitted across the WAN.
The router is also configured to match video traffic (RTP) and assign this traffic at least 256 Kbps of
the available bandwidth. This traffic is assigned a DSCP value of AF41 prior to being transmitted
across the WAN. All other traffic (i.e., the default class) is assigned a default DSCP value of 0 prior
to being transmitted across the WAN via Serial0/0:
R1(config)#class-map match-any TSHOOT-VOICE
R1(config-cmap)#match protocol rtp audio
R1(config-cmap)#exit
R1(config)#class-map match-any TSHOOT-VIDEO
R1(config-cmap)#match protocol rtp video
R1(config-cmap)#exit
R1(config)#policy-map TSHOOT-MULTIMEDIA-QOS
R1(config-pmap)#class TSHOOT-VOICE
R1(config-pmap-c)#priority 128
R1(config-pmap-c)#set ip dscp ef
R1(config-pmap-c)#exit
R1(config-pmap)#class TSHOOT-VIDEO
R1(config-pmap-c)#bandwidth 256
R1(config-pmap-c)#set ip dscp af41
R1(config-pmap-c)#exit
R1(config-pmap)#class class-default
R1(config-pmap-c)#set ip dscp default
R1(config-pmap-c)#exit
R1(config-pmap)#exit
R1(config)#interface Serial0/0
R1(config-if)#service-policy output TSHOOT-MULTIMEDIA-QOS
R1(config-if)#exit
Following this configuration, you can use the show policy-map interface 