network troubleshooting tools and techniques

NETWORK TROUBLESHOOTING
TOOLS AND TECHNIQUES
SESSION NMS-2001
NMS-2001
9640_05_2004_c2
© 2004 Cisco Systems, Inc. All rights reserved.
Agenda
• Overview
• Troubleshooting Techniques
• Tools of the Trade
• Case Studies
Overview
• Troubleshooting complex networks
• Understand your network
• Gathering information
• Capturing changes
• Documenting problems
Today’s Complex Networks
[Diagram: Headquarters connects over leased lines, ATM, Frame Relay, dial/ISDN, the Internet, and IP-VPN to branch sites, remote sites, home offices, telecommuters, mobile users, and partners/customers]
Troubleshooting Is a Two-Part Process
• Know and understand your network
• Be prepared when problems arise
Documenting the Network
• Network topology
• Software versions
• Device configurations
• Inventory attributes
Gathering Network Statistics
• Collect stats for errors as well as utilization
• Look for trends and patterns
• Capture changes
• CPU, memory, environmental data
• Know what protocols and applications are running on the network
• Throughput vs. response times
So You Have a Problem
• Have your network baseline on hand
• Don’t panic
Gather All Information
• Listen to your users
“It’s taking forever to transfer this file”
“Is the server down?”
• Ask the right (open) questions
“What other files are transferring slowly?”
“What other servers have you tried?”
Capturing Changes
• Traps and syslog messages
• Changes in topology
• Configuration and software changes
• Hardware replacements
• Changes in network and device load
• Capturing historical data
• Looking for trends
Document the Problem
WHAT: Which devices, links, interfaces, hosts, or applications have the problem?
WHERE: Location. On what VLAN, subnet, or segment is the problem observed?
WHEN: Timing. When did we first see the problem? Is the problem reoccurring?
HOW MUCH: How big is the problem? Is the problem getting worse?
Document the Problem (Cont.)
• Also document where you are NOT seeing the
problem
• Look for differences between where you CAN and
where you CANNOT see the problem
• Look for changes that relate to those differences
Make Sure the Problem Cannot Reoccur
• How sure are we the problem cannot reoccur?
• What other devices need the same fix?
• Do we know the root cause of the problem?
• What can we do to prevent this problem from
reoccurring?
• What new problems could occur when we apply
this fix?
Develop a Plan of Attack
Bottom-Up or Top-Down?
Application
Network/Transport
Data-Link
Physical
Document Your Actions
[Diagram: Workstation, Switch 1, Router A (interfaces 0 and 1), B, Switch 2, NFS Server]
Possible Cause of Problem     Notes
Workstation                   No Connectivity to NFS Server
Switch 1                      Workstation Port Is Active, VLAN Number Is Correct
Router A, Interface 0         Workstation Can Ping Router A on the Local
Router A, Interface 1         Workstation Cannot Ping Interface 1 on Router A
Don’t Introduce New Problems!
Work as a Team
Agenda
• Overview
• Troubleshooting Techniques
• Tools of the Trade
• Case Studies
Tools of the Trade
• General tools
• Cisco-specific tools
Tools of the Trade: General
• Ping
• Traceroute
• Pchar
• Netcat
• Host (NSLookup or Dig)
• Packet Sniffers
Ping
• Everywhere you go, there’s ping
• Check end-to-end network connectivity
• Baseline network layer performance
• Find data-dependent problems
Ping Example: Success
ICMP ECHO_REPLY
ICMP ECHO_REQUEST
PING rtp-cse-181.cisco.com (64.102.55.45): 56 data bytes
64 bytes from 64.102.55.45: icmp_seq=0 ttl=250 time=0.667 ms
--- rtp-cse-181.cisco.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.667/0.667/0.667/0.000 ms
Ping Example: Packets Blocked
PING 205.218.121.150: 56 data bytes
ICMP 13 Unreachable from gateway 205.218.121.150
 for icmp from rtp-cse-181.cisco.com (64.102.55.45) to 205.218.121.150
[Diagram: 64.102.55.45 pings 205.218.121.150; the request is denied]
ICMP Types and Codes
ICMP Type 0: Echo Reply
  Code 0: None
ICMP Type 3: Unreachable
  Code 0: Network Unreachable
  Code 1: Host Unreachable
  Code 2: Protocol Unreachable
  Code 3: Port Unreachable
  Code 4: Fragment Needed and DF Bit Set
  Code 5: Source Route Failed
  Code 6: Network Unknown
  Code 7: Host Unknown
  Code 8: Source Host Isolated
  Code 9: Communication with Destination Network Is Administratively Prohibited
  Code 10: Communication with Destination Host Is Administratively Prohibited
  Code 11: Bad Type of Service for Destination Network
  Code 12: Bad Type of Service for Destination Host
  Code 13: Administratively Blocked by Filter
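The type and code table above lends itself to a simple lookup, which can help when reading ping errors such as "ICMP 13 Unreachable". The following is a minimal sketch in Python; the helper name describe_icmp is hypothetical, and only the types and codes listed on this slide are covered.

```python
# Destination Unreachable (ICMP type 3) codes, copied from the slide's table.
UNREACHABLE_CODES = {
    0: "Network Unreachable",
    1: "Host Unreachable",
    2: "Protocol Unreachable",
    3: "Port Unreachable",
    4: "Fragment Needed and DF Bit Set",
    5: "Source Route Failed",
    6: "Network Unknown",
    7: "Host Unknown",
    8: "Source Host Isolated",
    9: "Communication with Destination Network Is Administratively Prohibited",
    10: "Communication with Destination Host Is Administratively Prohibited",
    11: "Bad Type of Service for Destination Network",
    12: "Bad Type of Service for Destination Host",
    13: "Administratively Blocked by Filter",
}


def describe_icmp(icmp_type, icmp_code):
    """Return a readable label for an ICMP type/code pair (hypothetical helper)."""
    if icmp_type == 0:
        return "Echo Reply"
    if icmp_type == 3:
        reason = UNREACHABLE_CODES.get(icmp_code, "unknown code %d" % icmp_code)
        return "Unreachable: " + reason
    return "ICMP type %d (not covered by this table)" % icmp_type


print(describe_icmp(3, 13))
```

Code 13 is the one reported in the "Packets Blocked" example: a filter on the path refused the echo request.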
Ping Options
Ping Option: Repeat Count
  OS Availability: UNIX, Windows, Cisco IOS®
  Notes: Generate Extended Amounts of Network Traffic
Ping Option: Flood
  OS Availability: UNIX
  Notes: Stress-Test Response Time or Network Connectivity
         Generate Packets as Quickly as Possible
         Get an Idea of How Many Packets Are Being Dropped
         Due to Its Danger, Usually Only Available to Super-User
Ping Option: Data Pattern
  OS Availability: UNIX, Cisco IOS
  Notes: Change the Data Pattern to Test for Data-Dependent Problems Such As T1 Timing or Line Code Problems
Ping Option: Packet Size
  OS Availability: UNIX, Windows, Cisco IOS
  Notes: Increase Packet Size to Help Identify Data-Dependent Problems
Ping Option: Source Interface
  OS Availability: UNIX, Cisco IOS
  Notes: Useful for Network-Layer Packet Generation
         Verify Proper Routing
         Test that Services Like NAT Are Working Correctly
Ping Example: T1 Line Code Violation
B8ZS
AMI
10.1.1.1
10.1.1.2
Router#ping
Protocol [ip]:
Target IP address: 10.1.1.2
Repeat count [5]: 100
Datagram size [100]: 1500
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]:
Data pattern [0xABCD]: 0x0000
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 100, 1500-byte ICMP Echos to 10.1.1.2, timeout is 2 seconds:
Packet has data pattern 0x0000
!!!!!!!!!!!!!!!!!!!!.U.U..U.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.U.U..!!!!!!!!!!!!!!!!!!!!!!!..
U.U.U..!!!!!!!!!
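Cisco IOS prints one character per probe: "!" for a reply, "." for a timeout, and "U" for a destination-unreachable message. A result string like the one above can be tallied to quantify the loss. This is a minimal sketch; the function name summarize_ping and the short sample string are illustrative and are not taken from the router output above.

```python
from collections import Counter

# Per-probe result characters printed by Cisco IOS ping.
SYMBOLS = {
    "!": "reply received",
    ".": "timed out",
    "U": "destination unreachable",
}


def summarize_ping(result):
    """Count replies, timeouts, and unreachables in an IOS ping result string."""
    counts = Counter(ch for ch in result if ch in SYMBOLS)
    sent = sum(counts.values())
    ok = counts["!"]
    return {
        "sent": sent,
        "ok": ok,
        "timeouts": counts["."],
        "unreachable": counts["U"],
        "success_pct": round(100.0 * ok / sent, 1) if sent else 0.0,
    }


# Illustrative sample only: 16 probes with a burst of failures in the middle.
print(summarize_ping("!!!!.U.U..U.!!!!"))
```

Comparing the success rate for an all-zeros pattern against the default pattern is one way to confirm that the loss is data dependent.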
Ping Drawbacks
• Increases network load
• Uses artificially high TTL value
• Often routers lower the priority for ping
to prevent DoS attacks
• Only does network-layer checks
• Does not pinpoint network problems
Traceroute
• Uses IP TTL field to discover gateways
UDP probes sent to high ports
Elicits ICMP TIME_EXCEEDED from gateways
Elicits ICMP PORT_UNREACHABLE from destination
• Narrow down connectivity issues
• Baseline network layer performance on a
hop-by-hop basis
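The TTL mechanism can be illustrated without sending any packets. The sketch below simulates it in Python: each probe starts with a TTL one higher than the last, every gateway decrements the TTL, and the gateway that brings it to zero answers with TIME_EXCEEDED. The function name simulate_traceroute and the list-based path model are assumptions made for illustration; a real traceroute uses UDP probes and raw ICMP sockets.

```python
def simulate_traceroute(path, max_hops=30):
    """Simulate traceroute over `path`, an ordered list of gateways
    ending with the destination host. Returns (ttl, node, reply) tuples."""
    replies = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        for index, node in enumerate(path):
            if index == len(path) - 1:
                # The probe reached the destination, where nothing listens
                # on the high UDP port, so the host answers PORT_UNREACHABLE.
                replies.append((ttl, node, "PORT_UNREACHABLE"))
                return replies
            remaining -= 1  # each gateway decrements the TTL
            if remaining == 0:
                # TTL expired here: the gateway reveals itself.
                replies.append((ttl, node, "TIME_EXCEEDED"))
                break
    return replies


for hop in simulate_traceroute(["gw1", "gw2", "gw3", "server"]):
    print(hop)
```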
Traceroute Example
[Diagram: probes 1 through 4 elicit TIME_EXCEEDED from successive gateways; probe 5 elicits PORT_UNREACHABLE from the destination]
traceroute to nms-server2.cisco.com (172.18.124.33), 30 hop max, 40 byte packets
 1  rtp5-gw1.cisco.com (64.102.55.2)  3.06 ms  0.533 ms  0.584 ms
 2  rtp5-bb-gw1.cisco.com (10.81.254.73)  1.533 ms  0.393 ms  0.345 ms
 3  rtp7-lab-gw1.cisco.com (10.81.254.66)  1.482 ms  0.55 ms  0.518 ms
 4  172.18.127.134 (172.18.127.134)  5.224 ms  4.94 ms  4.427 ms
 5  nms-server2.cisco.com (172.18.124.33)  4.865 ms  5.565 ms  5.049 ms
Traceroute Example:
Destination: Cisco IOS®
traceroute to 10.29.4.1 (10.29.4.1), 30 hops max, 40 byte packets
1 nms-2511.rtp.cisco.com (10.29.100.2) 2.278 ms 2.088 ms 2.488 ms
2 nms-2610a.rtp.cisco.com (10.29.100.5) 3.363 ms 3.071 ms 3.153 ms
3 10.29.3.2 (10.29.3.2) 17.060 ms * 15.721 ms
Throttle
Traceroute Example:
Routing Loop
workstation# traceroute 10.29.254.1
1 10.29.100.2 2 ms 2 ms 2 ms
2 10.29.2.1 5 ms 2 ms 3 ms
3 10.29.1.1 4 ms 1 ms 1 ms
4 10.1.3.30 46 ms 51 ms 49 ms
5 10.29.1.1 52 ms 46 ms 30 ms
6 10.1.3.30 12 ms 26 ms 7 ms
Loop!
[Diagram: Workstation and routers A, B, C, D, and E, with hop labels rtrC, rtrD, rtrC, and rtrE]
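A loop shows up in traceroute output as an address that answers at more than one TTL. The check can be automated; the sketch below is a minimal Python version, with the hypothetical helper find_loop operating on a list of hop addresses already extracted from the output.

```python
def find_loop(hops):
    """Return (first_ttl, repeat_ttl, address) for the first address that
    answers at two different TTLs, or None if no address repeats."""
    first_seen = {}
    for ttl, address in enumerate(hops, start=1):
        if address == "*":
            continue  # unanswered probes carry no address to compare
        if address in first_seen:
            return first_seen[address], ttl, address
        first_seen[address] = ttl
    return None


# Hop addresses from the routing-loop example above.
hops = ["10.29.100.2", "10.29.2.1", "10.29.1.1",
        "10.1.3.30", "10.29.1.1", "10.1.3.30"]
print(find_loop(hops))
```

For the example above the first repeat is 10.29.1.1, seen at hops 3 and 5, which points at the two routers that are forwarding to each other.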
Traceroute Options
Traceroute Option: Probe Port Number
  OS Availability: UNIX
  Notes: Useful to Change if the Destination Host Is Listening on the Default Probe Port (Usually 33434)
Traceroute Option: Maximum Number of Hops
  OS Availability: UNIX, Windows
  Notes: Increase this if the Destination Host Is Further Away than the Default of 30 Hops
         If this Has to Go Above 64, there Is Usually a Routing Problem
Traceroute Option: Source Interface/Address
  OS Availability: UNIX
  Notes: Verify that Routing Works from the Given Address
         Verify Services Like NAT Are Working Correctly
Traceroute Availability
• Available for most platforms
• Source code downloadable from http://ee.lbl.gov
Traceroute Drawbacks
• ICMP messages may be filtered
• Different IP stacks respond differently to traceroute
• Latency figures may not be accurate with regard
to applications
Pchar
• Based on pathchar
(path characterization tool by Van Jacobson)
• Measures network performance on a
per-hop and a total path basis
• Supports IPv4 and IPv6
• Useful in isolating performance problems
Pchar Example: Hop Statistics
[Diagram: path over 128Kb ISDN, 10Mb Ethernet, and 100Mb Ethernet links]
0: 10.29.100.33 (nms-server2.rtp.cisco.com)
    Partial loss:      1 / 1472 (0%)
    Partial char:      rtt = 1.872604 ms, (b = 0.000876 ms/B), r2 = 0.998560
                       stddev rtt = 0.004053, stddev b = 0.000005
    Partial queuing:   avg = 0.001308 ms (1492 bytes)
    Hop char:          rtt = 1.872604 ms, bw = 9130.730297 Kbps
    Hop queuing:       avg = 0.001308 ms (1492 bytes)
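The reported hop bandwidth appears to follow from the slope b, the extra round-trip time per byte of probe size. Assuming that relationship (it is an interpretation of the output, not something stated on the slide), inverting b gives bytes per millisecond, and multiplying by 8 gives kilobits per second. A minimal check in Python:

```python
def bandwidth_kbps(ms_per_byte):
    """Convert a pchar slope (ms of delay per byte) into Kbps.

    1 / b is bytes per millisecond; times 8 is bits per millisecond,
    which is the same as kilobits per second.
    """
    return 8.0 / ms_per_byte


# b = 0.000876 ms/B is the rounded slope from the hop statistics above.
estimate = bandwidth_kbps(0.000876)
print("estimated bandwidth: %.1f Kbps" % estimate)
```

The estimate lands within a few Kbps of the 9130.730297 Kbps that pchar printed; the small gap is what one would expect from b being displayed to only six decimal places. Both figures are consistent with a 10Mb Ethernet first hop.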
Pchar Example: Path Statistics
[Diagram: path over 128Kb ISDN, 10Mb Ethernet, and 100Mb Ethernet links]
Path length:     4 hops
Path char:       rtt = 4.692853 ms r2 = 0.999996
Path bottleneck: 126.277240 Kbps
Path pipe:       74 bytes
Path queuing:    average = 0.003377 ms (1582 bytes)
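The "Path pipe" figure is consistent with a bandwidth-delay product: the bottleneck bandwidth multiplied by the path round-trip time. That reading is an assumption based on the numbers above rather than a statement from the slide. A minimal sketch of the arithmetic in Python:

```python
def pipe_bytes(bottleneck_kbps, rtt_ms):
    """Bandwidth-delay product in bytes.

    Kbps times milliseconds yields bits (the two factors of 1000 cancel),
    and dividing by 8 converts bits to bytes.
    """
    bits = bottleneck_kbps * rtt_ms
    return bits / 8.0


# Values taken from the path statistics above.
print(int(pipe_bytes(126.277240, 4.692853)), "bytes")
```

The bottleneck of roughly 126 Kbps matches the 128Kb ISDN link in the diagram, which is what makes this output useful for isolating the slow hop.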
Pchar Options
Pchar Flag   Notes
-c           Ignore Routing Changes
             Useful in Situations where Load-Balancing Is Used
-p           Specify the Protocol that pchar Uses
             This Can Be ipv4udp (Default), ipv4raw, ipv4icmp, ipv4tcp, ipv6icmp or ipv6udp
-T           Specify the Type of Service Bits
             This Is Useful for Forcing a Particular DiffServ Codepoint on Certain UDP Packets
-S           Do SNMP Queries at Each Hop to Determine Each Router’s Idea of what It Thinks the Next-hop Interface Characteristics Are
             This Option Requires the net-snmp Libraries from ftp://net-snmp.sourceforge.net
Pchar Drawbacks
• ICMP messages may be filtered
• Different IP stacks respond differently to pchar
• Latency figures may not be accurate with regard
to applications
Pchar Availability
• Pchar is not a Cisco IOS® command
• Pchar source code can be downloaded from:
http://www.employees.org/~bmah/Software/pchar
• Works on FreeBSD, NetBSD, OpenBSD, Linux,
Solaris, IRIX, Tru64, and AIX
Netcat
• Similar in operation to telnet
• Tests application connectivity
• Can test TCP and UDP services
Netcat Example:
Verify HTTP Connectivity
GET / HTTP/1.0
HTTP/1.1 200 OK
% nc -v -w 3 nms-server2.cisco.com 80
nms-server2.cisco.com [172.18.124.33] 80 (http) open
GET / HTTP/1.0
HTTP/1.1 200 OK
[HTML data]
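Where netcat is not installed, the same "is the port open" check can be approximated with a few lines of Python. This is a minimal sketch using the standard socket module; the function name tcp_port_open is hypothetical, and the 3-second default mirrors the -w 3 timeout used above.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within
    `timeout` seconds, False on refusal, timeout, or name failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup errors.
        return False


# Example usage (not executed here):
#   tcp_port_open("nms-server2.cisco.com", 80)
```

Unlike the netcat session, this only verifies that the TCP handshake succeeds; it does not send the GET request or confirm that the web server returns a sensible response.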
Netcat Example:
Verify Syslog Connectivity
AUTH.INFO: Hello World
Syslog Server
% echo '<38>Hello World' | nc -w 1 -u nms-server2 514
% tail -1 /var/log/messages
Apr 17 00:54:45 nms-server2 Hello World
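The <38> prefix in the test message is the syslog priority value: facility times 8 plus severity. Decoding it shows why the server logs the message as AUTH.INFO. The sketch below is a minimal Python decoder; the function name decode_pri is hypothetical, and the facility and severity names follow the conventional BSD syslog numbering.

```python
FACILITIES = ["kern", "user", "mail", "daemon", "auth", "syslog",
              "lpr", "news", "uucp", "cron", "authpriv", "ftp"]
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def decode_pri(pri):
    """Split a syslog priority value into (facility, severity) names."""
    facility, severity = divmod(pri, 8)
    if facility < len(FACILITIES):
        facility_name = FACILITIES[facility]
    elif 16 <= facility <= 23:
        facility_name = "local%d" % (facility - 16)
    else:
        facility_name = "facility%d" % facility
    return facility_name, SEVERITIES[severity]


print(decode_pri(38))
```

Changing the number in the echo command is a quick way to test whether the syslog server routes other facilities and severities to the expected files.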
Netcat Options
Use “nc -v -w 3” for Verifying TCP Services
Netcat Flag  Notes
-w           Change the Network Inactivity Timeout
             Changing This to at Least 3 Is Useful when Checking Web or Gopher Services
-u           Tell Netcat to Use UDP Instead of TCP
             Netcat Will Simulate a UDP “Connection”
-l           Cause Netcat to Listen at a Given Port (as Specified with the -p Flag)
             This Option Is Useful for Creating Mock Services to Test Throughput or Connectivity
             Use with the -u Flag to Create a UDP Server
-p, -s       When Netcat Is Run with the -l Flag, Use the Specified Port and IP Address Respectively
Netcat Drawbacks
• Does not measure network performance
• Does not attempt to isolate where the connectivity
problem lies in the network
Netcat Availability
• Source code can be downloaded from:
ftp://coast.cs.purdue.edu/pub/tools/unix/netutils/netcat/
• Windows binary available from:
http://www.atstake.com/research/tools/index.html
Host (NSLookup or Dig)
• Used to query Domain Name Service for
IP addresses and hostnames
• Client-side DNS failures give a false positive
for a connectivity problem
• Server-side DNS failures can cause sluggish
service connection times
Host Example:
Verify Name and Address Match
Query A nms-server2
Response A 172.18.123.33
% host nms-server2
nms-server2 has address 172.18.123.33
% host 172.18.123.33
33.123.18.172.IN-ADDR.ARPA domain
name pointer nms-server2.cisco.com
Queries Match
Host Example:
Address and Name Do Not Match
[Diagram: nms1 receives "SNMP Trap 10.1.1.1 Serial 3/0 Link Down"; rtrA 10.1.1.1 connects through Serial 3/0 to the Internet, and rtrB 10.1.1.2 connects through Serial 3/0 to the Test Lab]
Apr 25 11:37:04 INFO: Lab link on rtrB is down
nms1> host rtrA
rtrA has address 10.1.1.1
nms1> host 10.1.1.1
1.1.1.10.IN-ADDR.ARPA domain name pointer rtrB
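The two host lookups can be combined into a single consistency check: resolve the name to an address, resolve that address back to a name, and compare. The sketch below is a minimal Python version that takes the forward and reverse records as plain dictionaries so the logic can be followed without a live DNS server; the function name check_name_address and the first-label comparison rule are assumptions made for illustration. Against a real resolver, socket.gethostbyname and socket.gethostbyaddr could supply the same data.

```python
def check_name_address(name, forward, reverse):
    """Return (address, pointer, matches) for `name`.

    forward maps host names to addresses (A records);
    reverse maps addresses to host names (PTR records).
    Names are compared on their first label, ignoring case, so that
    "nms-server2" matches "nms-server2.cisco.com".
    """
    address = forward.get(name)
    pointer = reverse.get(address) if address is not None else None
    if address is None or pointer is None:
        return address, pointer, False
    same = pointer.split(".")[0].lower() == name.split(".")[0].lower()
    return address, pointer, same


# Records from the mismatch example above.
forward = {"rtrA": "10.1.1.1"}
reverse = {"10.1.1.1": "rtrB"}
print(check_name_address("rtrA", forward, reverse))
```

In the example, the stale PTR record is why the management station reported the trap from 10.1.1.1 as a problem on rtrB.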
Host Options
Use the -t <querytype> Command Line Flag to Test Different Kinds of Records
Query Type   Type of Record Returned
A            The Host’s IP Address
CNAME        The Canonical Name for an Alias
MX           The Mail Exchanger for the Given Domain
PTR          The Host Name if the Query Is an IP Address; Otherwise the Pointer to Other Information
SOA          Start of Authority: the Actual Domain Name Server that Hosts the Given Domain
Packet Sniffers
• Analyze what’s really happening on the wire
• Good for measuring performance and connectivity
• Helpful for establishing network baselines
Packet Sniffers:
Setup in a Switched Network
• Configure sniffer inline to monitor trunk ports
• Span a port, multiple ports, or a VLAN
[Diagram: one sniffer attached to the trunk through a passive splitter; a second sniffer attached to SPAN port 3/1 on the LAN switch]
Packet Sniffers: Configuration
in a Switched Network
• Cisco Catalyst® OS
Usage: set span enable
set span disable
set span <src_mod/src_ports...> <dest_mod/dest_port>
[rx|tx|both] [inpkts <enable|disable>]
set span <src_vlan> <dest_mod/dest_port>
[rx|tx|both] [inpkts <enable|disable>]
• Cisco IOS
Switch(config)# monitor session <session number> source
<interface|vlan> <interface name>
Switch(config)# monitor session <session number> destination
interface <interface name>
Packet Sniffers: Snoop
Sun Solaris
snoop
# snoop jclarke-sun and tcp port 23
Using device /dev/hme (promiscuous mode)
rtp-cse-181.cisco.com -> rtp-dogwood.cisco.com TELNET C port=55083
rtp-dogwood.cisco.com -> rtp-cse-181.cisco.com TELNET R port=55083 Using device /dev/hm
rtp-cse-181.cisco.com -> rtp-dogwood.cisco.com TELNET C port=55083
rtp-dogwood.cisco.com -> rtp-cse-181.cisco.com TELNET R port=55083 rtp-cse181.cisco.co
Packet Sniffers: tcpdump
# tcpdump host jclarke-sun and tcp port 23
tcpdump: listening on fxp0
12:21:37.298373 nms-server2.cisco.com.telnet > rtp-cse-181.cisco.com.51027: P
344418520:344418548(28) ack 2892313522 win 17520 (DF) [tos 0x10]
• Source code available from http://ee.lbl.gov
Packet Sniffers: Ethereal
Packet Sniffers: Ethereal
Follow TCP Stream
Packet Sniffers: Ethereal
• Reads traces from most commercial packet sniffers
• Reads snoop and tcpdump capture files
snoop -s 1518 -o outfile
tcpdump -s 1518 -w outfile
• Freely available from http://www.ethereal.com
URLs for Popular Tools
• Pchar—http://www.employees.org/~bmah/Software/pchar (source code)
• Netcat—ftp://coast.cs.purdue.edu/pub/tools/unix/netutils/netcat/
(source code), http://www.atstake.com/research/tools/index.html
(Windows binary)
• Tcpdump—http://ee.lbl.gov (source code)
• Ethereal—http://www.ethereal.com (source code and binaries)
• MRTG (Multi-Router Traffic Grapher)—useful for building baselines as
well as trending network performance over time—http://www.mrtg.org
• Cricket—similar to MRTG but offers more flexibility with polling and
reporting—http://cricket.sourceforge.net
• Smoke Ping—measures round-trip response time using a variety of
protocols; offers web-based reports similar to MRTG—
http://www.smokeping.org
• COSI (Cisco Open Source Initiative)—holds many useful tools for
troubleshooting and general network management—
http://cosi-nms.sourceforge.net
Tools of the Trade: Cisco
• Show commands
• Debugs
• Cisco Service Assurance Agent (SA Agent)
Show Commands
• show cam dynamic
• show cdp neighbor
• show ip route
• show ip traffic
• show ip cef
• show process cpu
• show process mem
• show system
show cam dynamic
* = Static Entry. + = Permanent Entry. # = System Entry. R = Router Entry.
X = Port Security Entry
VLAN  Dest MAC/Route Des    [CoS]  Destination Ports or VCs / [Protocol Type]
----  ------------------    -----  -------------------------------------------
18    00-10-0d-38-10-00            5/3 [ALL]
6     00-30-94-1c-46-ff            5/3 [ALL]
100   00-90-27-86-76-e2            5/1 [ALL]
18    00-00-0c-07-ac-12            5/3 [ALL]
100   00-04-de-a9-18-00            5/3 [ALL]
6     00-04-4e-f2-c8-00            5/3 [ALL]
19    00-10-0d-a1-18-80            5/3 [ALL]
• Cisco Catalyst OS command
• Shows MAC to port mapping
• The BRIDGE-MIB and Q-BRIDGE-MIB can be used to obtain
this information via SNMP
Cisco Discovery Protocol
• Uses Layer 2 multicast for advertisements
• Uses special multicast MAC address so that Cisco
devices will not forward CDP packets
Cisco Discovery Protocol (Cont.)
• Runs on virtually all Cisco devices
• Enabled by default on all broadcast interfaces
• Displays information about directly connected
neighbors
• Useful for debugging connectivity issues as well as
building topology maps
• The CISCO-CDP-MIB can be used to obtain CDP
information via SNMP
CDP Rules of Thumb
• Configure CDP only on links between Cisco devices
• Do not configure CDP on links you do not manage
show cdp neighbor
[Diagram: nms-2610a S1/0 connects to nms-3640a S3/0; nms-3640a E0/0 connects to nms-5000b port 3/5]
nms-3640a#show cdp neighbor
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater
Device ID        Local Intrfce     Holdtme    Capability  Platform  Port ID
009042675(nms-500Eth 0/0            152        T B S      WS-C5000  3/5
nms-2610a.rtp.cisSer 3/0            169         R         2610      Ser 1/0
show cdp neighbor detail
[Diagram: nms-2610a S1/0 connects to nms-3640a S3/0; nms-3640a E0/0 connects to nms-5000b port 3/5]
Device ID: 009042675(nms-5000b)
Entry address(es):
IP address: 14.32.4.10
Platform: WS-C5000, Capabilities: Trans-Bridge Source-Route-Bridge Switch
Interface: Ethernet0/0, Port ID (outgoing port): 3/5
Holdtime : 174 sec
Version :
WS-C5000 Software, Version McpSW: 5.4(4) NmpSW: 5.4(4)
Copyright (c) 1995-2000 by Cisco Systems
advertisement version: 2
VTP Management Domain: 'nms-remote'
Native VLAN: 4
Duplex: full
show ip route
Gateway of last resort is 14.32.3.1 to network 0.0.0.0
14.0.0.0/8 is variably subnetted, 27 subnets, 4 masks
O    14.32.22.0/24 [110/110] via 14.32.4.2, 12:09:27, Ethernet0/0
O IA 14.32.5.16/30 [110/160] via 14.32.4.2, 12:09:27, Ethernet0/0
O IA 14.32.19.0/24 [110/137] via 14.32.4.2, 12:09:27, Ethernet0/0
O IA 14.32.18.0/24 [110/137] via 14.32.4.2, 12:09:27, Ethernet0/0
O IA 14.32.5.20/30 [110/135] via 14.32.4.2, 12:09:27, Ethernet0/0
O IA 14.32.5.24/30 [110/160] via 14.32.4.2, 12:09:27, Ethernet0/0
O E2 14.32.7.0/24 [110/20] via 14.32.4.2, 12:09:27, Ethernet0/0
O IA 14.32.5.2/32 [110/110] via 14.32.4.2, 12:09:27, Ethernet0/0
O IA 14.32.6.0/24 [110/136] via 14.32.4.2, 12:09:27, Ethernet0/0
O IA 14.32.5.3/32 [110/135] via 14.32.4.2, 12:09:27, Ethernet0/0
O E2 14.32.5.0/28 [110/40] via 14.32.4.2, 12:09:27, Ethernet0/0
• Verify all routers have a route to the destination
• Verify that the route taken is optimal
• The RFC1213-MIB can be used to obtain this information
via SNMP
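Each route line carries two numbers in brackets: the administrative distance (110 is the default for OSPF) and the route metric. When comparing routing tables across several routers, it can help to pull these fields out programmatically. The sketch below is a minimal Python parser for lines in the format shown above; the regular expression and the function name parse_route are assumptions, and other route formats (connected or static routes, for example) are not handled.

```python
import re

# Matches lines such as:
#   O IA 14.32.5.16/30 [110/160] via 14.32.4.2, 12:09:27, Ethernet0/0
ROUTE = re.compile(
    r"^(?P<code>[A-Z](?: [A-Z][A-Z0-9]*)?)\s+"
    r"(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+) "
    r"\[(?P<distance>\d+)/(?P<metric>\d+)\] "
    r"via (?P<next_hop>\S+), (?P<age>\S+), (?P<interface>\S+)$"
)


def parse_route(line):
    """Parse one dynamic route line into a dict, or return None."""
    match = ROUTE.match(line.strip())
    if match is None:
        return None
    route = match.groupdict()
    route["distance"] = int(route["distance"])
    route["metric"] = int(route["metric"])
    return route


line = "O IA 14.32.5.16/30 [110/160] via 14.32.4.2, 12:09:27, Ethernet0/0"
print(parse_route(line))
```

Sorting the parsed routes by metric is one way to spot a path that is reachable but not optimal.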
show ip traffic
IP statistics:
Rcvd: 8586090 total, 713345 local destination
0 format errors, 0 checksum errors, 1764 bad hop count
0 unknown protocol, 0 not a gateway
0 security failures, 0 bad options, 0 with options
Opts: 0 end, 0 nop, 0 basic security, 0 loose source route
0 timestamp, 0 extended security, 0 record route
0 stream ID, 0 strict source route, 0 alert, 0 cipso, 0 ump
0 other
Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
0 fragmented, 0 couldn't fragment
Bcast: 2260 received, 0 sent
Mcast: 351508 received, 87579 sent
Sent: 475639 generated, 7818883 forwarded
Drop: 60 encapsulation failed, 0 unresolved, 0 no adjacency
0 no route, 0 unicast RPF, 0 forced drop
0 options denied
Drop: 0 packets with source IP address zero
• Focus on error trends
• Identify types of errors; for example input errors usually
indicate faulty hardware
show ip cef
Prefix              Next Hop            Interface
0.0.0.0/0           14.32.5.33          Serial5/0/0.2
                    14.32.5.1           Serial5/0/0.1
0.0.0.0/32          receive
14.32.1.0/24        14.32.5.33          Serial5/0/0.2
                    14.32.5.1           Serial5/0/0.1
14.32.2.0/24        14.32.5.33          Serial5/0/0.2
                    14.32.5.1           Serial5/0/0.1
14.32.3.0/24        14.32.5.33          Serial5/0/0.2
                    14.32.5.1           Serial5/0/0.1
14.32.4.0/24        14.32.5.33          Serial5/0/0.2
                    14.32.5.1           Serial5/0/0.1
14.32.5.0/28        attached            Serial5/0/0.1
14.32.5.0/32        receive
• Verify next hops and interfaces are correct for given
route prefixes
• Corrupted CEF tables can cause strange routing behaviors
show process cpu
CPU utilization for five seconds: 29%/6%; one minute: 8%; five minutes: 5%
 PID  Runtime(ms)   Invoked  uSecs    5Sec   1Min   5Min TTY Process
   1          880   1823462      0   0.00%  0.00%  0.00%   0 Load Meter
   2       572128   2351401    243   0.00%  0.00%  0.00%   0 OSPF Hello
   3            0       106      0   0.00%  0.00%  0.00%   0 RTR Scheduler
   4      6118648   1117174   5476   0.00%  0.08%  0.10%   0 Check heaps
   5            0         1      0   0.00%  0.00%  0.00%   0 Chunk Manager
   6            0         2      0   0.00%  0.00%  0.00%   0 Pool Manager
   7            0         2      0   0.00%  0.00%  0.00%   0 Timers
   8            0        36      0   0.00%  0.00%  0.00%   0 Serial Backgroun
   9            0         1      0   0.00%  0.00%  0.00%   0 OIR Handler
  10         1832       112  16357  30.01% 10.03%  7.44%  18 Virtual Exec
• Check to make sure overall system load is under control
• Use the process list to determine which process might be
misbehaving
• The CISCO-PROCESS-MIB can be used to obtain this
information via SNMP
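In the summary line, the five-second figure is written as two numbers: total CPU utilization, then the portion spent at interrupt level. The difference is the load from scheduled processes such as those in the list. The sketch below is a minimal Python parser for that line; the function name parse_cpu_summary and the dictionary keys are assumptions chosen for illustration.

```python
import re

CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_summary(line):
    """Extract utilization percentages from the first line of
    'show process cpu'. Raises ValueError if the line does not match."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization summary line")
    total, interrupt, one_min, five_min = (int(v) for v in match.groups())
    return {
        "five_sec_total": total,
        "five_sec_interrupt": interrupt,
        "five_sec_process": total - interrupt,  # scheduled-process share
        "one_min": one_min,
        "five_min": five_min,
    }


line = ("CPU utilization for five seconds: 29%/6%; "
        "one minute: 8%; five minutes: 5%")
print(parse_cpu_summary(line))
```

A five-second total well above the one-minute and five-minute averages, as in this output, suggests a short burst; here the Virtual Exec process running the show command itself accounts for most of it.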
show process mem
Total: 34074032, Used: 6937220, Free: 27136812
 PID TTY   Allocated       Freed  Holding  Getbufs  Retbufs Process
   1   0           0           0     6868        0        0 Chunk Manager
   2   0         188         188     3868        0        0 Load Meter
   3   0       12120     2512624    10980      780      320 OSPF Hello
   4   0           0           0     6868        0        0 Check heaps
   5   0        1544           0     8412        0        0 Chunk Manager
   6   0     9896152     6360264     6868  3092496  3772896 Pool Manager
   7   0         188         188     6868        0        0 Timers
   8   0      222880      137936     6868        0    56000 Serial Backgroun
   9   0           0           0     6868        0        0 ALARM_TRIGGER_SC
  10   0  1104690216  1544572760    29212        0        0 SNMP ENGINE
• Check to make sure no process is hogging memory
• Use the process list to determine which process might be
misbehaving
• The CISCO-MEMORY-POOL-MIB can be used to obtain this
information via SNMP
show system
PS1-Status PS2-Status
---------- ----------
ok         none
Fan-Status Temp-Alarm Sys-Status Uptime d,h:m:s Logout
---------- ---------- ---------- -------------- ---------
ok         off        ok         7,07:46:16     20 min
PS1-Type             PS2-Type
-------------------- --------------------
WS-CAC-1000W         none
Modem   Baud  Traffic Peak Peak-Time
------- ----- ------- ---- -------------------------
disable 9600  5%      7%   Thu May 3 2001, 10:06:00
PS1 Capacity: 852.60 Watts (20.30 Amps @42V)
System Name              System Location          System Contact           CC
------------------------ ------------------------ ------------------------ ---
"NMS Pod"
• Cisco Catalyst OS command
• Shows environmental stats as well as peak and current
traffic load
• The CISCO-STACK-MIB can be used to obtain this
information via SNMP
Debugs
• Debug ip packet detail
• Debug ip routing
Debug IP Packet Detail
1w2d: IP: s=172.18.124.189 (Serial5/0/0.2), d=10.29.8.2, len 425, rcvd 4
1w2d:     UDP src=47427, dst=161
1w2d: IP: s=172.18.124.189 (Serial5/0/0.2), d=10.29.8.2, len 416, rcvd 4
1w2d:     UDP src=47427, dst=161
1w2d: IP: s=172.18.124.189 (Serial5/0/0.2), d=10.29.8.2, len 415, rcvd 4
1w2d:     UDP src=47427, dst=161
1w2d: IP: s=172.18.124.189 (Serial5/0/0.2), d=10.29.8.2, len 417, rcvd 4
1w2d:     UDP src=47427, dst=161
1w2d: IP: s=172.18.124.189 (Serial5/0/0.2), d=10.29.8.2, len 424, rcvd 4
1w2d:     UDP src=47427, dst=161
1w2d: IP: s=172.18.124.189 (Serial5/0/0.2), d=10.29.8.2, len 424, rcvd 4
1w2d:     UDP src=47427, dst=161
• Useful for verifying packet throughput when a sniffer
is not available
• Can crash a busy router if not used carefully!
Debug IP Packet Detail <acl>
router# debug ip packet detail ?
  <1-199>      Access list
  <1300-2699>  Access list (extended range)
<cr>
Use an Access-List to Limit the Output!
Debug IP Routing
15w0d: RT: add 14.32.6.0/24 via 14.32.3.1, ospf metric [110/792]
15w0d: RT: add 14.32.18.0/24 via 14.32.3.1, ospf metric [110/792]
15w0d: RT: add 14.32.19.0/24 via 14.32.3.1, ospf metric [110/793]
15w0d: RT: add 14.32.41.0/24 via 14.32.3.1, ospf metric [110/792]
15w0d: RT: add 14.32.42.0/24 via 14.32.3.1, ospf metric [110/792]
15w0d: RT: add 14.32.100.0/24 via 14.32.3.1, ospf metric [110/791]
• See when routes are added and deleted from the routing table
• Encompasses all routing protocols
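Captured `debug ip routing` output can also be turned into structured records for later comparison. This is a minimal sketch that handles only the `RT: add` form shown above; other message forms are ignored, and the names `ROUTE_ADD_RE` and `parse_route_adds` are assumptions made for illustration. The bracketed pair is read as administrative distance and metric.

```python
import re

# Matches lines such as:
# "15w0d: RT: add 14.32.6.0/24 via 14.32.3.1, ospf metric [110/792]"
ROUTE_ADD_RE = re.compile(
    r"RT: add (?P<prefix>\d+\.\d+\.\d+\.\d+/\d+) "
    r"via (?P<next_hop>\d+\.\d+\.\d+\.\d+), "
    r"(?P<protocol>\w+) metric \[(?P<distance>\d+)/(?P<metric>\d+)\]"
)


def parse_route_adds(lines):
    """Return one dict per 'RT: add' line; all other lines are skipped."""
    routes = []
    for line in lines:
        match = ROUTE_ADD_RE.search(line)
        if match is None:
            continue
        routes.append(
            {
                "prefix": match["prefix"],
                "next_hop": match["next_hop"],
                "protocol": match["protocol"],
                "distance": int(match["distance"]),
                "metric": int(match["metric"]),
            }
        )
    return routes


# Two lines taken from the slide's sample output.
SAMPLE = [
    "15w0d: RT: add 14.32.6.0/24 via 14.32.3.1, ospf metric [110/792]",
    "15w0d: RT: add 14.32.19.0/24 via 14.32.3.1, ospf metric [110/793]",
]

if __name__ == "__main__":
    for route in parse_route_adds(SAMPLE):
        print(route)
```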
Debug IP Routing <acl>
router# debug ip routing ?
  <1-199>      Access list
  <1300-2699>  Access list (extended range)
  <cr>
• Use access-lists to limit the output and to focus on
specific routes
Cisco Service Assurance Agent (SA Agent)
• LAN/WAN troubleshooting
Measures hop-by-hop response time and availability
Evaluates thresholds and generates alarms
QoS aware
• Utilizes SA Agent embedded in Cisco IOS
No extra management hardware required
Leverage your existing Cisco routers
SA Agent Operations
[Diagram: a Cisco router running SAA measures a web server, file server, or end user, and a router or other network device; an SNMP trap goes to the management station]
SA Agent Example Use
• QoS
ToS settings
ICMP, UDP Echo port
Jitter
TCP Connect
HTTP
• Non QoS
Network Services
DNS
DHCP
• Two hour histories
• Real time
• Hop-by-hop path analysis
• Trap notification
How SA Agent Works
[Diagram: two Cisco IOS routers with SAA measuring across the network core]
Troubleshoot Using IPM
• SA agent benefits
No extra management hardware required
Proactive as well as reactive
Baseline network performance
Service-aware transactions closely match real-life service performance
• Metrics measured
Response time
Jitter
Packet loss
Availability
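To make the four metrics concrete, the sketch below derives them from a list of round-trip times. It is a simplified illustration and not SA Agent's own calculation: jitter is taken here as the mean absolute difference between consecutive answered probes, availability as the percentage of probes answered, and the function name `summarize_probes` is hypothetical.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results.

    rtts_ms is a list of round-trip times in milliseconds, with None
    standing for a probe that received no reply.
    """
    sent = len(rtts_ms)
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    lost = sent - len(answered)
    # Differences between consecutive answered probes (simplified jitter).
    deltas = [abs(later - earlier) for earlier, later in zip(answered, answered[1:])]
    return {
        "response_time_ms": mean(answered) if answered else None,
        "jitter_ms": mean(deltas) if deltas else 0.0,
        "packet_loss_pct": 100.0 * lost / sent if sent else 0.0,
        "availability_pct": 100.0 * len(answered) / sent if sent else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical samples: five probes, one of them lost.
    print(summarize_probes([20, 30, None, 25, 25]))
```

Collecting the same summary at regular intervals is one way to build the performance baseline mentioned above.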
Internetwork Performance Monitor
URLs for Cisco Tools and Commands
• Using Cisco IOS ping to troubleshoot T1/56K lines:
http://www.cisco.com/warp/public/471/hard_loopback.html
• Configuring Cisco Discovery Protocol (CDP):
http://www.cisco.com/univercd/cc/td/doc/product/software/ios121/121cgcr/fun_c/fcprt3/fcd301c.htm
• Cisco Express Forwarding (CEF) commands:
http://www.cisco.com/univercd/cc/td/doc/product/software/ios120/12cgcr/switch_r/xrcef.htm
• Important information on using debugging commands:
http://www.cisco.com/warp/public/793/access_dial/debug.html
• Using Cisco Service Assurance Agent (SA Agent):
http://www.cisco.com/univercd/cc/td/doc/product/software/ios121/121cgcr/fun_c/fcprt3/fcd301d.htm
• TAC Homepage: http://www.cisco.com/en/US/support/index.html
• TAC Troubleshooting tools:
http://www.cisco.com/kobayashi/support/tac/tools_trouble.shtml
(requires Cisco.com account)
• MIB Tools: http://www.cisco.com/go/mibs (requires Cisco.com account)
Agenda
• Overview
• Troubleshooting Techniques
• Tools of the Trade
• Case Studies
Case Study 1: Network Is Down
• Problem: Sam calls to tell you he can no longer
access the Internet, read mail, or use applications…
Know Your Topology
Ask the Right Questions: WHAT
• What other users are seeing the problem?
My colleague Tom cannot work either
• What users are NOT seeing the problem?
Cathy’s PC is working fine
Document the Problem
            Problem Seen    Problem Not Seen    Differences    Changes
What?       Sam, Tom        Cathy
Where?
When?
How Much?
CiscoWorks: UserTracking
CiscoWorks: UserTracking
Document the Problem
            Problem Seen    Problem Not Seen    Differences    Changes
What?       Sam, Tom        Cathy
Where?      Vlan124         Vlan999
When?
How Much?
Gather Information: WHEN
• When did you first see this problem?
At 3:00 PM
• When did you see this problem in the past?
Never
Document the Problem
            Problem Seen    Problem Not Seen    Differences    Changes
What?       Sam, Tom        Cathy
Where?      Vlan124         Vlan999
When?       Oct 7 15:00     Before that
How Much?
Ask the Right Questions: HOW MUCH
• Since the problem started, when has the network
operated correctly?
It has not operated correctly at any point since the problem began
Document the Problem
            Problem Seen    Problem Not Seen    Differences    Changes
What?       Sam, Tom        Cathy
Where?      Vlan124         Vlan999
When?       Oct 7 15:00     Before that
How Much?   100%            Sporadic
Look for Changes: WHAT and WHERE
• What is different about users Sam and Tom
compared to user Cathy?
Vlan
Subnet
Spantree
• What is different about Vlan124 compared to
Vlan999?
VTP Domain
VTP Server
Spantree
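The comparison between affected and unaffected users can be sketched as a small attribute diff. In the example below only the VLAN assignments come from the case study; the subnet and building values are hypothetical placeholders, and `distinguishing_attributes` is an illustrative name rather than a feature of any tool mentioned here.

```python
def distinguishing_attributes(affected, unaffected):
    """Return attributes shared by every affected user and by no unaffected user."""
    result = {}
    if not affected:
        return result
    for attr in affected[0]:
        values = {user.get(attr) for user in affected}
        if len(values) != 1:
            continue  # affected users disagree, so this attribute explains nothing
        value = next(iter(values))
        if value not in {user.get(attr) for user in unaffected}:
            result[attr] = value
    return result


# VLANs are from the case study; subnet and building are hypothetical values.
SAM = {"vlan": "Vlan124", "subnet": "10.0.124.0/24", "building": "A"}
TOM = {"vlan": "Vlan124", "subnet": "10.0.124.0/24", "building": "A"}
CATHY = {"vlan": "Vlan999", "subnet": "10.0.99.0/24", "building": "A"}

if __name__ == "__main__":
    print(distinguishing_attributes([SAM, TOM], [CATHY]))
```

Attributes that every user shares, such as the building here, drop out, which leaves only the candidates worth investigating.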
Document the Problem
            Problem Seen    Problem Not Seen    Differences            Changes
What?       Sam, Tom        Cathy               Vlan124, Subnet
Where?      Vlan124         Vlan999             VTPdomain, Spantree
When?       Oct 7 15:00     Before that
How Much?   100%            Sporadic
Look for Changes: WHEN
• What is different about Oct 7 15:00 compared
to before that?
• What events took place in the network around
Oct 7 15:00?
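Answering the second question usually means filtering collected traps or syslog messages down to a window around the reported start time. A minimal sketch follows; the year, the event texts, and the window sizes are hypothetical, and `events_near` is an illustrative helper name.

```python
from datetime import datetime, timedelta


def events_near(events, when, before=timedelta(hours=1), after=timedelta(minutes=15)):
    """Return (timestamp, message) pairs inside [when - before, when + after], oldest first."""
    start, end = when - before, when + after
    return [(ts, msg) for ts, msg in sorted(events) if start <= ts <= end]


# Hypothetical events; the slides do not give the year or the message text.
PROBLEM_START = datetime(2003, 10, 7, 15, 0)
EVENTS = [
    (datetime(2003, 10, 7, 9, 0), "routine configuration backup completed"),
    (datetime(2003, 10, 7, 14, 40), "configuration changed on an access switch"),
    (datetime(2003, 10, 7, 14, 58), "spanning tree topology change reported"),
    (datetime(2003, 10, 7, 16, 30), "user opened a trouble ticket"),
]

if __name__ == "__main__":
    for ts, msg in events_near(EVENTS, PROBLEM_START):
        print(ts.isoformat(sep=" "), msg)
```

The window is wider before the start time than after it because the events of interest are the ones that precede the first report.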
Changes Before Oct 7 15:00: Traps
Changes Before Oct 7 15:00: Syslog
Document the Problem
            Problem Seen    Problem Not Seen    Differences            Changes
What?       Sam, Tom        Cathy               Vlan124, Subnet
Where?      Vlan124         Vlan999             VTPdomain, Spantree
When?       Oct 7 15:00     Before that         ?                      ?
How Much?   100%            Sporadic            ?                      ?
Changes
• What has changed about Vlan124?
Changes: Vlan124
Changes: Vlan124
Changes: Vlan124
Changes: Vlan124
What Has Changed About Vlan124?
Before
Now
Document the Problem
            Problem Seen    Problem Not Seen    Differences            Changes
What?       Sam, Tom        Cathy               Vlan124, Subnet
Where?      Vlan124         Vlan999             VTPdomain, Spantree    Spantree Changed
When?       Oct 7 15:00     Before that         ?
How Much?   100%            Sporadic            ?
Develop a Plan of Attack
• Based on information collected so far, we can
assume the immediate cause of this network outage is
a spanning tree loop
• We will confirm the immediate cause by further
analyzing the topology and network statistics, and
compare our findings to our known good baseline
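Comparing current statistics against the known good baseline can be reduced to a simple check per counter. The sketch below is one possible rule, not a recommended threshold: a counter is flagged when it has grown by at least `min_delta` and to at least `factor` times its baseline. All counter names and values are hypothetical.

```python
def find_regressions(baseline, current, factor=10.0, min_delta=100):
    """Return {counter: (baseline_value, current_value)} for counters far above baseline."""
    flagged = {}
    for name, now in current.items():
        before = baseline.get(name, 0)
        grew_enough = now - before >= min_delta
        # max(before, 1) avoids treating a zero baseline as "any growth is infinite"
        if grew_enough and now >= factor * max(before, 1):
            flagged[name] = (before, now)
    return flagged


# Hypothetical per-port error counters from a baseline and from today.
BASELINE = {"in_errors": 3, "crc_errors": 1, "collisions": 10}
CURRENT = {"in_errors": 48211, "crc_errors": 47900, "collisions": 12}

if __name__ == "__main__":
    for name, (before, now) in find_regressions(BASELINE, CURRENT).items():
        print(f"{name}: baseline {before}, now {now}")
```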
Document Your Actions
[Diagram: Zob port 2/1 connected to la-surf port Fa0/48]
Action: Analyze port state details
Notes: Comparing port details shows physical topology has not changed
Action: Analyze port statistics, and compare current data with known good data
Notes: Error rate has increased tremendously on the Zob to la-surf link
Confirm Immediate Cause
Confirm Immediate Cause
Confirm Immediate Cause
Before
Now
Find the Root Cause
• What Caused this Spanning Tree Loop?
Find the Root Cause
What Configuration Changes Were Made?
Find the Root Cause
What Configuration Changes Were Made?
Find the Root Cause
Zob> sh po 2/1
Port  Name               Status     Vlan       Level  Duplex Speed Type
----- ------------------ ---------- ---------- ------ ------ ----- ------------
 2/1                     connected  124        normal   full   100 10/100BaseTX

Port  AuxiliaryVlan AuxVlan-Status InlinePowered PowerAllocated
----- ------------- -------------- ----- ------ -------- ----- --------
 2/1  none          none

Port  Security Violation Shutdown-Time Age-Time Max-Addr Trap     IfIndex
----- -------- --------- ------------- -------- -------- -------- -------
 2/1  disabled shutdown              0        0        1 enabled       14

Port  Num-Addr Secure-Src-Addr   Age-Left Last-Src-Addr     Shutdown/Time-Left
----- -------- ----------------- -------- ----------------- ------------------
 2/1         0

Port  Flooding on Address Limit
----- -------------------------
 2/1  Enabled

Surf>sh int fast0/48
FastEthernet0/48 is up, line protocol is up
  Hardware is Fast Ethernet, address is 00d0.ba96.82b0 (bia 00d0.ba96.82b0)
  Description: lab01
  MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive not set
  Half-duplex, Auto Speed (100), 100BaseTX/FX
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of "show interface" counters never
  Queueing strategy: fifo
  Output queue 0/40, 0 drops; input queue 0/75, 0 drops
Document Your Actions
[Diagram: Zob port 2/1 connected to la-surf port Fa0/48]
Surf(config)#int Fa0/48
Surf(config-if)#duplex full
Action: Hard-coded the duplex of Fa0/48 to full on the la-surf switch
Notes: This eliminated port errors and resets, and stabilized the network
Problem Solved!
• Now that the port duplex is consistent on both
sides of the link, the la-surf port is no longer
resetting due to excessive errors; the network has
been restored to its previous working state, and all
users have connectivity again
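The duplex comparison that exposed the root cause can be automated once port settings are collected from both ends of each link. The sketch below uses the duplex and speed values reported in this case study; the `PortSettings` type and the `link_mismatches` function are hypothetical names for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PortSettings:
    device: str
    port: str
    duplex: str  # "full" or "half"
    speed_mbps: int


def link_mismatches(a, b):
    """Return a list of human-readable mismatches between two ends of a link."""
    problems = []
    if a.duplex != b.duplex:
        problems.append(
            f"duplex mismatch: {a.device} {a.port} is {a.duplex}, "
            f"{b.device} {b.port} is {b.duplex}"
        )
    if a.speed_mbps != b.speed_mbps:
        problems.append(
            f"speed mismatch: {a.device} {a.port} is {a.speed_mbps} Mbps, "
            f"{b.device} {b.port} is {b.speed_mbps} Mbps"
        )
    return problems


# Values as reported by the two show commands in the case study.
ZOB = PortSettings("Zob", "2/1", "full", 100)
SURF = PortSettings("la-surf", "Fa0/48", "half", 100)

if __name__ == "__main__":
    print(link_mismatches(ZOB, SURF))
    # After hard-coding full duplex on la-surf, the link is consistent.
    print(link_mismatches(ZOB, replace(SURF, duplex="full")))
```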
Case Study 2:
Troubleshooting Mobile IP
• Problem: Mobile nodes are taking a long time to get
registered with the home agent
How It Should Work
[Diagram: message flow between the mobile node (MN), foreign agent (FA), and home agent (HA)]
IRDP: Agent Advertisement: Lifetime, Type, Services
IRDP: Agent Solicitation: Lifetime, Services
Registration
Know Your Topology
[Diagram: MN, FA, Internet, HA, and TACACS+ Server]
• Mobile node registration comes in to the home agent via
the foreign agent
• The home agent checks the remote TACACS+ authentication
database to verify that the mobile node is allowed
to register
What Are the Symptoms? WHAT
• Mobile nodes retry multiple times before
successfully registering with home agent
• Sometimes the mobile node needs to be rebooted
before it will successfully register
Document the Problem
            Problem Seen        Problem Not Seen    Differences    Changes
What?       All Mobile Users    N/A
Where?
When?
How Much?
Ask the Right Questions:
WHERE and WHEN
• When does this happen?
When a large number of mobile nodes try to register at the
same time
• What other errors appear on the home agent when these
failures are occurring?
We see “insufficient resources” errors incrementing when
this problem occurs
• What is the CPU usage on the HA when this
problem is occurring?
CPU stays relatively low on the HA
Document the Problem
Problem
Seen
What?
Where?
N/A
N/A
HA
N/A
N/A
Changes
FA
Early Evening
How Much? Sporadic
Differences
All Mobile
Users
TACACS+
Server
Early Morning
When?
Problem
Not Seen
During the Day More Nodes
Registering
Late at Night
During Rush
Hours
Sporadic
Develop a Plan of Attack
• Mobile node registrations time out when a large
number of registrations take place at once
Assumption: Somewhere we are hitting a bottleneck
• “Insufficient resources” errors are incrementing
when failures occur
Assumption: This supports our bottleneck assumption
• CPU remains stable on the home agent
Assumption: The HA itself is not the bottleneck
• Next step: Analyze the paths between the HA
and the FA and between the HA and the
TACACS+ server
Document the Problem
Problem
Seen
What?
Where?
When?
Problem
Not Seen
All Mobile
Users
N/A
FA—HA
N/A
Changes
N/A
HA—TACACS+
TACACS+
Server
Early Morning
During the Day
Early Evening
Late at Night
How Much? Sporadic
Differences
More Nodes
Registering
During Rush
Hours
Sporadic
Document Your Actions
[Diagram: MN, FA, Internet, HA, and TACACS+ Server, with links labeled 100Mbit and 40Mbit]
Action: Tested link between HA and FA with pchar
Notes: pchar reports bottleneck of 100 Mbit
Action: Tested link between HA and TACACS+ server with pchar
Notes: pchar reports bottleneck of 40 Mbit
Action: Used IPM to configure an SAA operation on the HA to test overall application-layer latency between the HA and the TACACS+ server
Notes: We need to determine if the sum of network-layer latency plus TACACS+ application-layer latency is our overall bottleneck
Configuring an Operation
Configuring a Collector
Analyzing the Results
Determine the Immediate Cause
• Ran lab tests with the new latency data from
IPM/SAA
• Discovered that when many mobile nodes tried to
register at once, latency spiked to unacceptable
levels between the HA and the TACACS+ server
• This increase caused some mobile nodes to
time out when trying to register with the
home agent
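The link between simultaneous registrations and timeouts can be illustrated with a deliberately simple queueing model. The sketch assumes the authentication server handles one request at a time, so the k-th simultaneous request completes after k service times plus a fixed network latency; real TACACS+ servers may process requests concurrently, and every number below is hypothetical.

```python
def count_timeouts(simultaneous, service_ms, timeout_ms, base_latency_ms=0.0):
    """Count requests that finish after the timeout when served strictly one by one."""
    timed_out = 0
    for position in range(1, simultaneous + 1):
        # The request in this queue position waits for everyone ahead of it.
        completion_ms = base_latency_ms + position * service_ms
        if completion_ms > timeout_ms:
            timed_out += 1
    return timed_out


if __name__ == "__main__":
    # Hypothetical values: 50 ms per authentication, 1 second registration timeout.
    for nodes in (10, 20, 100):
        print(nodes, "nodes ->", count_timeouts(nodes, 50, 1000), "timeouts")
```

In this model a faster server shortens the per-request service time and a better path lowers the fixed latency, so either change raises the number of nodes that can register at once without timing out.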
Determine the Root Cause
• Upgrade the path between the HA and the
TACACS+ server
• Upgrade the TACACS+ server’s hardware, allowing
more mobile nodes to register simultaneously
without timing out
Problem Solved!
• Now that the hardware and network capacity have
been increased, mobile customers are registering
quickly, and timeouts have dropped to an
acceptable level
Summary and Tips
• Don’t panic!
• Understand your network
• Develop network baselines
• Gather the right information so you can
analyze all aspects of the problem
(WHAT, WHEN, WHERE, HOW MUCH)
• Work methodically and document all
your actions
Summary and Tips
• Learn the tools
• Figure out which tools and which options work
for each problem
• Use access-lists when enabling debug commands
Summary and Tips
• Develop network baselines
Other Network Management Sessions
• NMS-1N01: Introduction to Network Management (Networkers Online)
• NMS-1N02: Introduction to SNMP and MIBs (Networkers Online)
• NMS-1N03: Accurate Time Synchronization (Networkers Online)
• NMS-1N04: Introduction to Service Assurance Agent (Networkers Online)
• NMS-1N41: Introduction to Performance Management (Networkers Online)
• NMS-1011: Principles of Fault Management
• NMS-1101: Understanding DNS and DHCP
• NMS-2021: Large Scale Deployments of CiscoWorks
• NMS-2031: Traffic Accounting Scenarios
• NMS-2032: NetFlow for Accounting, Analysis and Attack
• NMS-2042: Performance Measurement with Cisco Devices
• NMS-2051: Securely Managing Your Network
• NMS-2102: Deploying and Troubleshooting NAT
• NMS-2101: DNS Deployment and Operation
• NMS-3011: Getting the Right Events from Network Elements
• NMS-4012: MPLS Embedded Management Tools
• NMS-4043: Advanced Service Assurance Agent
• NMS-2T00: Network Management Best Practices (Techtorial)
Recommended Reading
• Continue your
Networkers learning
experience with further
reading for this session
from Cisco Press.
• Check the
Recommended
Reading flyer for
suggested books.
Available on-site at the Cisco Company Store
Complete Your Online Session Evaluation!
WHAT: Complete an online session evaluation and your name will be entered into a daily drawing
WHY: Win fabulous prizes! Give us your feedback!
WHERE: Go to the Internet stations located throughout the Convention Center
HOW: Winners will be posted on the onsite Networkers Website; four winners per day