Ericsson SSR 8000 R15 System Troubleshooting
STUDENT BOOK
LZT1381712 R1A

DISCLAIMER

This book is a training document and contains simplifications. Therefore, it must not be considered as a specification of the system. The contents of this document are subject to revision without notice due to ongoing progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

This document is not intended to replace the technical documentation that was shipped with your system. Always refer to that technical documentation during operation and maintenance.

© Ericsson AB 2015

This document was produced by Ericsson.
• The book is to be used for training purposes only and it is strictly prohibited to copy, reproduce, disclose or distribute it in any manner without the express written consent from Ericsson.

This Student Book, LZT1381712, R1A supports course number LZU1082262.

Table of Contents

1 CLI TOOLS FOR TROUBLESHOOTING
  1 REVIEW FUNDAMENTAL CONCEPTS
    1.1 CONTEXT, INTERFACES, & BINDINGS ARCHITECTURE
    1.2 TERMINOLOGY
  2 COMMAND LINE INTERFACE (CLI) STRUCTURE
    2.1 INTRODUCTION
    2.2 MANEUVERING THROUGH THE CLI
  3 MONITORING WITH CLI
    3.1 CLI INTRODUCTION AND THE PROMPT STRUCTURE
    3.2 CONTEXT MONITORING
    3.3 CLI HELP
    3.4 CLI FOR THE FAST PEOPLE
  4 LAB ENVIRONMENT
    4.1 CONNECTING TO ERICSSON TRAINING LABS
  5 CONFIGURE MANAGEMENT INTERFACE
    5.1 REFERENCE FOR THIS MODULE
    5.2 CONFIGURE MANAGEMENT INTERFACE
    5.3 VALIDATING THE CONFIGURATION
    5.4 BINDING INFORMATION
    5.5 EXERCISE 1: MANAGEMENT CONFIGURATION
  6 TROUBLESHOOTING PREPARATION COMMANDS & TOOLS
    6.1 TROUBLESHOOTING PREPARATION
    6.2 REMOTE TERMINAL SESSION TIMEOUT
    6.3 WHO IS LOGGED INTO THE SSR
    6.4 COMMAND HISTORY
    6.5 TROUBLESHOOTING BY SEARCHING
    6.6 COMMAND LINE INTERFACE & EMACS
    6.7 COMMAND LINE INTERFACE & EMACS
    6.8 GREP: GLOBAL REGULAR EXPRESSION PARSER
    6.9 EXTENDED GREP
    6.10 OTHER SEARCHING TOOLS
    6.11 REGULAR EXPRESSIONS
    6.12 REGULAR EXPRESSION: EXAMPLES WITH GREP
  7 ALIASES AND MACROS
    7.1 INTRODUCTION TO ALIAS
    7.2 INTRODUCTION TO MACRO
    7.3 VARIABLES IN MACROS
  8 EXERCISE 2: INTRODUCTION, SEARCHING AND FILTERING
    8.1 EXERCISE 2: SEARCHING AND FILTERING
    8.2 EXERCISE 2, REVIEW (1-4)
    8.3 EXERCISE 2, REVIEW (2-4)
    8.4 EXERCISE 2, REVIEW (3-4)
    8.5 EXERCISE 2, REVIEW (4-4) (OPTIONAL)
  9 CHAPTER SUMMARY
2 OPERATIONAL HEALTH OF THE SSR SYSTEM
  1 TROUBLESHOOTING PROCEDURE
    1.1 SYSTEM HARDWARE HEALTH
    1.2 OVERVIEW: HARDWARE STATUS
    1.3 RETRIEVING HARDWARE DETAILS LINE CARDS
    1.4 RPSW HARDWARE INFORMATION
    1.5 ALSW HARDWARE INFORMATION
    1.6 FINDING HARDWARE ALARMS (1-2)
    1.7 FINDING HARDWARE ALARMS (2-2)
    1.8 SYSTEM HARDWARE CHECKS
    1.9 SYSTEM ALARMS
    1.10 SYSTEM ALARM WITH OPTIONS
    1.11 EXAMPLE: INITIATING MAJOR SYSTEM ALARM
    1.12 EXAMPLE: INITIATING CRITICAL SYSTEM ALARM
    1.13 SYSTEM HARDWARE LED
    1.14 CARD POWERED DOWN
    1.15 SYSTEM STORAGE VERIFICATION
    1.16 SYSTEM STORAGE
    1.17 SYSTEM STORAGE VERIFICATION
    1.18 SYSTEM STORAGE VERIFICATION: EXAMPLE
  2 CHAPTER SUMMARY
3 FUNDAMENTAL CONCEPT OF PROCESSES ARCHITECTURE ON THE SYSTEM
  1 PROCESS ARCHITECTURE
    1.1 RPSW PROCESSES
    1.2 PROCESS SCHEDULING
    1.3 RPSW PROCESSES VERIFICATION
    1.4 FINDING CPU INTENSIVE PROCESSES
    1.5 SINGLE PROCESS VERIFICATION
    1.6 SINGLE PROCESS IN DETAIL
    1.7 SINGLE PROCESS VERIFICATION - ISM
    1.8 SINGLE PROCESS VERIFICATION - OSPF
    1.9 MAXIMUM CRASHES ALLOWED
    1.10 PROCESS CRASH
    1.11 SOFTWARE PROCESS FAILURE SCENARIO
    1.12 SYSTEM STOPPED PROCESSES
    1.13 OLD CORE FILE ON RP
    1.14 CORE FILES – COPIED BETWEEN RP
    1.15 CORE DUMP FILES ON STANDBY RP
  2 EXERCISE 3: INTRODUCTION
    2.1 EXERCISE 3: SYSTEM PROCESSES
      2.1.1 EXERCISE 3: REVIEW
  3 CHAPTER SUMMARY
4 UNDERSTAND THE SSR SYSTEM REDUNDANCY ISSUES
  1 RP REDUNDANCY
    1.1 RP REDUNDANCY DETAILS
    1.2 INVESTIGATING REDUNDANCY ISSUES
    1.3 SHOW SYSTEM REDUNDANCY
  2 ANALYZING PROBLEMS OF STANDBY RP
    2.1 ACTIVE OR STANDBY RP
    2.2 CONNECTING TO STANDBY RP WITHOUT CONSOLE
    2.3 SEARCHING FOR RESTART REASON
    2.4 REPEATING COMMANDS ON STANDBY RP
    2.5 VERIFY PROCESSES ON STANDBY RP
    2.6 COPY FILES FROM STANDBY RP
  3 RP FAILOVER MANAGEMENT
    3.1 MANAGING RELOADS AND RP SWITCH-OVER
    3.2 MANUAL RP SWITCHOVER
  4 CHAPTER SUMMARY
5 ISSUES RELATED WITH BOOT PROBLEM
  1 BOOT PROBLEMS
    1.1 ENTERING BOOT ROM INTERFACE
    1.2 EXAMPLE: ENTERING BOOT ROM INTERFACE
    1.3 DIAGNOSTICS COMMAND
    1.4 RUNNING DIAGNOSTICS
  2 TROUBLESHOOTING SCENARIOS
    2.1 TROUBLESHOOTING SCENARIOS
    2.2 SYSTEM UPTIME
    2.3 SYSTEM STORAGE VERIFICATION
    2.4 EXERCISE 4: INVESTIGATE BOOT PROBLEMS
  3 CHAPTER SUMMARY
6 ACTIVE AND HISTORY LOGS IN SSR
  1 SYSTEM LOGGING INTRODUCTION
    1.1 LOGGD PROCESS
    1.2 SYSTEM LOG COMMANDS
    1.3 EVENT SEVERITY LEVELS IN LOG MESSAGES
    1.4 LOGS FROM CARDS
    1.5 SHOW LOG AND TIME
    1.6 LOG FILES
      1.6.1 CUSTOM LOG FILES AND FILTERS
      1.6.2 LOG FILES LOCATION
      1.6.3 DISPLAY LOG FILES
    1.7 FILTER BASED ON FACILITY
      1.7.1 FILTER BASED ON FACILITY EXAMPLE
    1.8 PM PROCESS LOGS
    1.9 CSM PROCESS LOGS
    1.10 ISM PROCESS
    1.11 FILTER BASED ON FACILITY ON CARD
    1.12 LOGGER VERIFICATION
    1.13 SHOW LOGGING CARD INFORMATION
    1.14 LOGGING DISPLAY INFO
    1.15 LOGGING DEBUG
    1.16 LOGGING DEBUG
    1.17 LOG FILE COLLECTION
  2 SYSLOG CONFIGURATION
    2.1 SYSLOG SERVER
    2.2 EXERCISE 5: LOGGING & SYSLOG
      2.2.1 EXERCISE REVIEW: CONFIGURE SYSLOG & DEBUG
      2.2.2 EXERCISE REVIEW: SYSLOG SERVER ENVIRONMENT
      2.2.3 EXERCISE REVIEW: SAVE AND DISPLAY THE LOGS
  3 CHAPTER SUMMARY
7 USE AND IMPACT OF DEBUGGING ON THE SSR SYSTEM
  1 DEBUG INTRODUCTION
    1.1 DEBUG COVERAGE
    1.2 HOW TO RECOGNIZE A DEBUG FUNCTION
    1.3 DEBUGGING WITHIN CONTEXT LOCAL
    1.4 DEBUGGING IN DIFFERENT CONTEXTS
    1.5 DEBUG RELATIONSHIP WITH CONTEXTS
    1.6 SEND DEBUG OUTPUT TO SCREEN
    1.7 ADMINISTRATOR PRIVACY
    1.8 DEBUGGING AND IMPACT
    1.9 EXERCISE 6: DEBUGGING ON SSR
  2 CHAPTER SUMMARY
8 TROUBLESHOOTING FOR TRAFFIC FLOW THROUGH PORTS, CIRCUITS AND INTERFACES
  1 TROUBLESHOOTING BASIC CHECKS
    1.1 INTERFACE & PORT STATES
    1.2 VERIFYING INTERFACE STATUS
    1.3 IDENTIFYING INTERFACE PROBLEMS: UNBOUND STATE
    1.4 PORT STATUS: ADMIN STATE AND LINE STATE
    1.5 CIRCUIT STATUS
  2 TROUBLESHOOTING TRAFFIC
    2.1 TROUBLESHOOTING TRAFFIC PROBLEMS
    2.2 PORT COUNTERS – OVERVIEW
    2.3 LIVE PORT COUNTERS
    2.4 PORT COUNTERS
    2.5 TROUBLESHOOTING CIRCUITS
    2.6 CIRCUIT COUNTERS
    2.7 VLAN CIRCUIT STATISTICS
    2.8 CLEARING COUNTERS
    2.9 IP TROUBLESHOOTING TOOL
    2.10 TRAFFIC TROUBLESHOOTING EXERCISE: INTRODUCTION
    2.11 TRAFFIC TROUBLESHOOTING EXERCISE: PREPARATION
    2.12 EXERCISE 7: TRAFFIC TROUBLESHOOTING
      2.12.1 CONTEXT TOPOLOGY FOR TRAFFIC TROUBLESHOOTING EXERCISE
      2.12.2 TRAFFIC TROUBLESHOOTING EXERCISE REVIEW
  3 CHAPTER SUMMARY
9 ACRONYMS AND ABBREVIATIONS
10 INDEX
11 TABLE OF FIGURES

1 CLI Tools for Troubleshooting

Chapter Objectives

After this course the participant will be able to:
› Identify the CLI tools for troubleshooting
› Describe grep and its options
› Understand the use of CLI command aliases as shortcuts
› Use CLI command macros to execute multiple commands with a single command

Figure 1-1: Chapter Objectives

1 Review Fundamental Concepts

Figure 1-2: Review Fundamental Concepts

1.1 Context, Interfaces, & Bindings Architecture

Figure 1-3: Context, Interfaces, & Bindings Architecture

1.2 Terminology

Context

A context is an instance of a virtual router. A context has its own management domain, Authentication, Authorization, and Accounting (AAA) name space, IP address space, and routing protocols. You create and delete contexts with configuration commands. Contexts share common resources, such as memory and processor cycles, but each context functions independently of all other contexts configured on the router.

Every configuration includes a local context, which cannot be deleted. In single-context configurations, the local context is the only context.

All Ericsson IP Operating System features are implemented per context: the Command-Line Interface (CLI), Simple Network Management Protocol (SNMP), troubleshooting features (such as ping, traceroute, debug, and system logging), IP addresses, interfaces, and access control lists. Likewise, each context has its own complete implementation of IP routing protocols, including Border Gateway Protocol (BGP), Open Shortest Path First (OSPF), Intermediate System-to-Intermediate System (IS-IS), and the complete IP multicast routing protocol suite. Each BGP instance has its own autonomous system number, policies, and import and export properties. Each context can contain any mix of Interior Gateway Protocol (IGP) routing protocols.
Each context has its own IP address space, which can overlap with the address space of other contexts. Any physical port and circuit can be associated with a context through configuration commands and the binding process.

A context can have its own unique set of CLI administrators, each with their own (possibly overlapping) administrator names and passwords, and each authenticated through their own set of AAA databases. Each context can have its own SNMP community strings. This support allows VPN customers visibility into their own routing context for debugging and troubleshooting purposes.

Interface

The concept of an interface in the operating system differs from that in earlier networking devices. In earlier devices, the term interface was often used synonymously with port, channel, or circuit, which are physical entities, to mean the point on a router where a physical port is bound to a physical communication line, either fiber-optic or copper. In the operating system, an interface is the managed object that provides higher layer protocol and service information (such as Layer 3 addressing) to a context at the point where a physical or logical circuit interfaces with the context. The decoupling of the interface from the physical layer entities enables many of the advanced features offered by the router.

An interface is a logical entity that provides higher layer protocol and service information, such as Layer 3 addressing. Interfaces are configured as part of a context and are independent of physical ports and circuits. For higher layer protocols to become active, you must bind a physical port or circuit to an interface.

When you add interfaces to a context, the following restrictions apply:
• A context can have only one interface per subnet.
• The host portion of an IP address assigned to an interface cannot be 0.
• An IP address assigned to an interface cannot be the broadcast address of its subnet.
• For an unnumbered interface, the borrowed IP address must be in the same context as the unnumbered interface.

Port

A port is a physical entity handling encapsulation and bits on the wire (e.g., ATM, Eth, PoS). Ports in the router provide the physical connections to communication lines. Ethernet ports are the simplest type of circuits provided by the router.

Circuit

In general telecommunications use, a circuit is a communications path between two or more points. However, in the Ericsson IP Operating System, the term circuit refers to the endpoint of any segment of a communications path that terminates on a node in a network.

An 802.1Q Permanent Virtual Circuit (PVC) or VLAN is a separate, administratively defined subgroup of a bridged LAN. Bridged LANs and 802.1Q encapsulation are described in the 802.1Q IEEE Standard for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks specification, which defines the architecture and bridging protocols for partitioning a bridged LAN into VLANs.

The router supports several types of circuits:
• Ethernet ports, single-tagged 802.1Q Permanent Virtual Circuits (PVCs) (VLANs), and double-tagged 802.1ad tunnels
  All of these circuit types are also supported as aggregated circuits in 802.1AX link-groups.
• Layer 2 service instances
  Service instances are subinterfaces of a LAN that accept one or more Layer 2 (802.1Q) services for transport across local physical ports or a provider backbone network. However, because packets flow through service instances when in cross-connections and in bridge configurations, they are also Ericsson IP OS circuits.
• Generic Routing Encapsulation (GRE) tunnel circuits
• Layer 2 Tunneling Protocol (L2TP) tunnel circuits
• CLIPS virtual circuits
• PPPoE virtual circuits

SSR configuration example

The figure shows context ABC containing interface test. Port eth 1/1 carries the circuit (Cct) dot1q pvc 100, and the binding between the circuit and the interface is created by the operator:

    context ABC
    !
     interface test
      ip address 1.1.1.1/24
    !
    port eth 1/1
     dot1q pvc 100
      bind interface test ABC
    !

› Context: a ‘virtual router’ containing its own routing info, addresses, subscribers, VPNs etc.
› Interface: a logical IP entity residing in the context (NOT the same as port or circuit)
› Port: a physical entity handling encapsulation and bits on the wire (e.g., ATM, Eth, PoS)
› Circuit: the same as port, except at a more specific level (e.g., ATM PVC or Ethernet VLAN)
› Binding: a virtual ‘patch cable’ connecting the port or circuit to the interface, in the context

Figure 1-4: Terminology

Binding

A binding forms the association between a port or circuit and an interface (in a given context) on which a Layer 3 or higher protocol is configured. Data cannot flow on a port or circuit until it is bound to the interface and the higher layer protocol is configured and enabled.

Binding an Ethernet Port to an Interface

Bindings associate particular ports or circuits with the higher layer routing protocols configured for a context. No user data can flow on a port or circuit until a higher layer service is configured and associated with it. After a port or circuit is bound to an interface, traffic flows through the context as it would through any IP router.

Unlike other IP operating systems that use implicit binding throughout, the operating system uses explicit binding; that is, the interface and the circuit exist as separate objects and become bound to each other only through an explicit command, which either associates a static circuit with an interface or dynamically associates a higher-layer protocol session with an interface.
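The relationship between contexts, interfaces, circuits, and explicit bindings can be illustrated with a small model. The following is a minimal Python sketch, not SSR software: the `Context` class, its method names, and the circuit label strings are hypothetical and exist only for illustration. It mirrors the configuration example (context ABC, interface test, 1.1.1.1/24) and checks the interface addressing restrictions listed under Interface.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Context:
    """Toy model of a context: its interfaces and its circuit bindings."""
    name: str
    interfaces: dict = field(default_factory=dict)  # interface name -> IPv4Interface
    bindings: dict = field(default_factory=dict)    # circuit label -> interface name

    def add_interface(self, name: str, cidr: str) -> None:
        addr = ipaddress.IPv4Interface(cidr)
        net = addr.network
        # Simplification: /31 and /32 addresses are rejected by these checks too.
        if addr.ip == net.network_address:
            raise ValueError(f"{cidr}: host portion cannot be 0")
        if addr.ip == net.broadcast_address:
            raise ValueError(f"{cidr}: address cannot be the subnet broadcast")
        if any(other.network == net for other in self.interfaces.values()):
            raise ValueError(f"{cidr}: context already has an interface in {net}")
        self.interfaces[name] = addr

    def bind(self, circuit: str, interface: str) -> None:
        # Explicit binding: the interface must already exist as its own object.
        if interface not in self.interfaces:
            raise KeyError(f"no interface {interface!r} in context {self.name!r}")
        self.bindings[circuit] = interface

    def is_bound(self, circuit: str) -> bool:
        # In this model, data can flow on a circuit only once it is bound.
        return circuit in self.bindings


abc = Context("ABC")
abc.add_interface("test", "1.1.1.1/24")
circuit = "port eth 1/1 dot1q pvc 100"
print(abc.is_bound(circuit))   # False: interface exists, but no binding yet
abc.bind(circuit, "test")
print(abc.is_bound(circuit))   # True

# A second context may reuse the same address: address spaces are per context.
xyz = Context("XYZ")
xyz.add_interface("test", "1.1.1.1/24")
```

The point of the sketch is that the interface and the circuit are separate objects, and nothing connects them until `bind` is called, which corresponds to the operator entering the bind command.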
2 Command Line Interface (CLI) Structure

Figure 1-5: Command Line Interface (CLI) Structure

2.1 Introduction

By factory default, the SSR configuration is empty and none of the features and functions are enabled or configured.

To access the software and its CLI, use either of the following methods:
• Connect to the console port: you can connect a terminal to this port, either directly or through a terminal server.
• Connect to the Ethernet management port: you can connect a terminal to the system over a LAN if remote access using SSH or Telnet has been enabled.

› By factory default, the SSR configuration is empty and none of the features and functions are enabled or configured
› The SSR platform is configured through the Command Line Interface (CLI)
› The Command Line Interface is intuitive and there are many cool tricks you can use to make life even easier

Figure 1-6: Introduction

› Before you can use the CLI to configure the SSR you need to be connected.
› Factory default means you can only configure the SSR using the console port
› On the RPSW there is a console port for configuration purposes
› Initially you can connect to the console port and start a serial terminal session (9600, N, 8, 1, no flow control).
  – In the training lab, we have a terminal server connected to the console port to allow remote access to the system.
  – The lab web page connects to the console port via the terminal server IP address and port number connected to specific SSRs.

(Figure labels: Console Port, Ethernet Port)

We assume power is connected and the system is running.

The management port is the 10/100/1000 Ethernet port located on the controller card and is designated for system management. The management port is configured in the local context.
Figure 1-7: Factory Default System: Step one

2.2 Maneuvering through the CLI

In the CLI, the two primary modes are exec and global configuration. When a session is initiated, the CLI is set to the exec mode by default. The exec mode allows you to examine the state of the system and perform most monitoring, troubleshooting, and administration tasks using a subset of the available CLI commands.

Exec mode prompts can take one of the following forms, depending on the user privilege level:

    [local]hostname#
    [local]hostname>

In this example, local is the context in which commands are applied and hostname is the currently configured hostname of the router. When you exit exec mode using the exit command, the entire CLI session ends.

The figure walks through the mode levels, numbered 1 to 6:

    Connected
    1  [local]Ericsson>
    2  [local]Ericsson> enable
    3  [local]Ericsson#
       [local]Ericsson# config
    4  [local]Ericsson(config)#
       [local]Ericsson(config)# context local
    5  [local]Ericsson(config-ctx)#
       [local]Ericsson(config-ctx)# interface test
    6  [local]Ericsson(config-if)#
       [local]Ericsson(config-if)# exit
    5  [local]Ericsson(config-ctx)#
       [local]Ericsson(config-ctx)# end
    3  [local]Ericsson#

    1  Operator Monitoring
    2  Enable
    3  Administrator Monitoring
    4  Global Config
    5  Context, Port, QoS
    6  Sub-modes (labels in the figure: Subscriber, password, Interface, Access-lists, OSPF, AAA, ATM PVC, Bind)

Figure 1-8: Maneuvering through the CLI

Global configuration mode is the top-level configuration mode; all other configuration modes are accessed from this mode. You can configure the system through the CLI in the configuration modes, or create and modify a configuration file offline by entering configuration commands using any text editor. After you have saved the file, you can then load it to the operating system.

To access global configuration mode, enter the configure command in exec mode.
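The prompts shown in Figure 1-8 follow a regular pattern: the context in square brackets, the hostname, an optional mode name in parentheses, and a closing # or > marker. As a rough illustration of that pattern, the following Python sketch splits a prompt string into those parts. The regular expression and the `parse_prompt` function are assumptions made for this example; they are not part of the SSR software and only cover the prompt forms shown in this section.

```python
import re

# Pattern for the prompt forms shown in this section (an illustrative assumption).
PROMPT_RE = re.compile(
    r"^\[(?P<context>[^\]]+)\]"      # context in brackets, e.g. [local]
    r"(?P<hostname>[^(#>]+)"         # hostname, e.g. Ericsson
    r"(?:\((?P<mode>[^)]+)\))?"      # optional mode name, e.g. (config-ctx)
    r"(?P<marker>[#>])$"             # closing marker: '#' or '>'
)


def parse_prompt(prompt: str) -> dict:
    """Split a CLI prompt into context, hostname, mode, and marker."""
    match = PROMPT_RE.match(prompt.strip())
    if match is None:
        raise ValueError(f"not a recognised prompt: {prompt!r}")
    return match.groupdict()


for p in ("[local]Ericsson>", "[local]Ericsson#", "[local]Ericsson(config-ctx)#"):
    print(parse_prompt(p))
```

For an exec mode prompt the `mode` entry is `None`, because no mode name appears in parentheses. A parser of this kind can be handy when reading captured session logs, for instance to see in which context and mode a command was entered.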
Configuration mode prompts take the following form:

    [local]hostname(mode-name)#

In the example, local is the context in which commands are applied, hostname is the currently configured hostname of the router, and mode-name is a string indicating the name of the current configuration mode.

The prompt in global configuration mode, assuming the factory default hostname of Ericsson and the local context, is:

    [local]Ericsson(config)#

Command modes exist in a hierarchy. You must access the higher-level command mode before you can access a lower-level command mode in the same chain.

Exit: Exits the current configuration mode and returns to the next highest level configuration mode. At the exec prompt, closes an active terminal or console session, and terminates the session.

End: Exits the current configuration mode and returns to exec mode.

Navigate the CLI

› All configuration commands are stored in a transaction database
› But none of the commands are actually activated
› It is like writing your configuration on a sticky note
› Activate your configuration:
  – [local]Ericsson(config)# commit   (right away, but do not leave configuration mode)
  – [local]Ericsson(config)# end      (right away and leave configuration mode)
  – [local]Ericsson(config)# exit     (only if you jump out of configuration mode)
› Throwing away your sticky note during configuration:
  – [local]Ericsson(config)# abort
› Undo a single command:
  – [local]Ericsson(config-ctx)# no interface test

Notes in the figure:
• Configuration will be committed automatically when you leave the configuration mode using exit or end
• You lose configuration when you get disconnected during configuration, or when you type abort during configuration

Figure 1-9: If you are configuring…

Commit: Commits an outstanding configuration database transaction.
End: Use the end command to exit the current configuration mode and return to exec mode. When this command is entered, all commands entered since the beginning of the configuration session, or since the last abort or commit command in configuration mode, are committed to the database.

Exit: Use the exit command to exit the current configuration mode, return to exec mode, or close an active terminal or console session. Entering this command in any configuration mode exits the current configuration mode and returns to the next highest level configuration mode.

Abort: Use the abort command to delete an outstanding database transaction, which includes all configuration commands entered since the beginning of the configuration session or since the latest abort or commit command.

Exiting Command Modes

The following example exits global configuration mode and returns to exec mode:

    [local]Ericsson(config)#exit
    [local]Ericsson#

The following example exits a CLI session:

    [local]Ericsson#exit

The following example exits context configuration mode and returns to exec mode:

    [local]Ericsson(config-ctx)#end
    [local]Ericsson#

3 Monitoring with CLI

Figure 1-10: Monitoring with CLI

3.1 CLI Introduction and the prompt structure

The primary administrator interface to the operating system is the CLI. It is an intuitive text-based command interpreter from which you can efficiently operate, configure, verify and monitor the system by entering different commands. The system parses each command entered and either changes system parameters or outputs a result.
The example prompt in Figure 1-11 is read here from right to left. At the far right is the command that was entered, in this case a show command. The next part is the configuration mode indicator. Here it indicates that we are configuring a context. We will come back to this later in this course. Next is the system hostname. By default the hostname is set to Ericsson. We will show you later how to change this.

The left-most part is the context monitoring mode indicator. It indicates the name of the context that is currently monitored. All contexts in the system have unique names. The default administrative context called "local" is always there. This part of the prompt indicates that only the information for this context is displayed when typing show commands. However, context "local" is an exception: show commands entered while monitoring from context local can output information about all contexts.

    [local]Ericsson(config-ctx)# show ...

    [local]         Currently monitored context
    Ericsson        System hostname (default)
    (config-ctx)    Configuration mode indicator
    show ...        Enter commands

Figure 1-11: CLI Introduction and the prompt structure

3.2 Context Monitoring

Separate routers (Router A and Router B):

To check configuration on Router A:
1. Connect wire to Router A

To check configuration on Router B:
1. Disconnect wire from Router A
2. Connect wire to Router B

Ericsson SSR (Context A and Context B): connect wire to Ericsson SSR.

To check configuration on Context A:
1. Switch to context A using:

    [local]Ericsson# context A
    [A]Ericsson#

To check configuration on Context B:
1. Switch to context B using:

    [A]Ericsson# context B
    [B]Ericsson#

Figure 1-12: Context monitoring

3.3 CLI Help

To access the online Help for the CLI:
• Use the ? command when entering a command to display the options available at the current state of the command syntax.
• Use the help command to display how to use the ? character to obtain help.

› Within the CLI there is help available!
› Use it, since it is very intuitive
› "?" lists commands at current level

    [local]Ericsson# show ip ?
      access-list     Display access list(s)
      all-host        Display static and dynamic IP hosts to name mappings
      dynamic-host    Display dynamic IP hosts to name mappings
      host            Display static IP hosts to name mappings
      . . .

› "command ?" lists options for that command

    [local]Ericsson(config-ctx)#router ?
      ancp    Access Node Control (GSMP)
      bfd     Bidirectional Forwarding Detection (BFD)
      bgp     Border Gateway Protocol (BGP)
      . . . .
    [local]Ericsson(config-ctx)# router bgp ?
      1..4294967295    Autonomous system (AS) number
      nn:nn            AS number in nn:nn format

Figure 1-13: CLI Help

3.4 CLI for the fast people

Use the Tab key in any mode to complete a command. Partially typing a command name and pressing the Tab key causes the command to be displayed in full up to the point where it is no longer unique and a further choice has to be made.

› The CLI will accept partially submitted commands, as long as they are unique:
– Rather than typing "show configuration" you can also type "sho conf"
› The CLI will complete a command if you press the TAB key. This is nice for the purists amongst us.
› Use arrow keys to scroll through previous commands.
› Use arrow keys to scroll through command
› Full Emacs editing support
› Press Enter at any time; parser reads the entire line

Figure 1-14: CLI for the fast people

4 Lab Environment

Figure 1-15: Lab environment

4.1 Connecting to Ericsson Training Labs

All labs are accessible via SSH from any location
• There are SDSL lines for the labs
• Each lab is connected via dedicated SDSL link
  o There is a backup link from other provider for each line
• Lab firewalls convert public IP addresses of SDSL lines to IP addresses of ssh servers
• SDSL link is mapped to ssh1 server while ssh2 works as backup
• Students access a lab through a gateway – ssh server
• ssh server performs multiple functions
  o Verifies user's credentials
  o Handles telnet connections between student's PC and lab equipment (telnet is tunneled inside ssh session)
  o Provides http proxy functionality (needed to access CPE web interface)
  o And more but not related to student access

Figure 1-16: Connecting to Ericsson Training labs

5 Configure Management Interface

Figure 1-17: Configure Management Interface

5.1 Reference for this Module

Diagram labels:
Management side: Management, context local, Radius
Subscriber side: Subscriber
Backbone side: Backbone
Centre: Ericsson SSR
Circuit:
• ATM PVC
• FR DLCI
• VLAN
• Pseudo circuit (session)
Session could be:
• DHCP (CLIPS)
• PPPoE/A/oAoE

Figure 1-18: Reference for this module
5.2 Configure Management Interface

X = group number [1-5]

General Tasks: 1) System 2) Context 3) Interface 4) Port 5) Binding 6) Commit

    system hostname Train                    (1) System
    system location Training
    system contact GroupX
    context local                            (2) Context
      administrator GrX password GrX
      enable password ericsson
      interface mgmt                         (3) Interface
        ip address yy
    port ethernet management                 (4) Port (Ethernet management)
      no shutdown
      bind interface mgmt local              (5) Binding
    commit                                   (6) Commit

Remember:
1. Names are case sensitive
2. Names are just variables and you determine their value
3. Context local is always there! Why?
4. How does a binding know which interface to select and which context?

Figure 1-19: Configure Management interface

system hostname
Specifies the system hostname.

system location
Configures the system location information.

system location text
text: Text that explains the physical location of the system. This argument can be any alphanumeric string, including spaces, from 1 to 39 characters.

system contact
Identifies the system contact. By default, no system contact information is configured. Use the system contact command to configure the system to identify the person or department to contact regarding system information.

port ethernet management
Configures the Ethernet management port on the controller card. Use the port ethernet management command to create an Ethernet management port for administrative access to the router. The slot is assigned automatically. Use the no form of the command to remove an Ethernet management port.

bind interface
Statically binds a port, permanent virtual circuit (PVC), 802.1Q tunnel, a link group, Generic Routing Encapsulation (GRE) tunnel or tunnel circuit, or IP-in-IP tunnel to a previously created interface in the specified context.

context local
The special context local is always present and has unique qualities.
Only an administrator authenticated in the local context can configure the system. Administrators authenticated in the local context can observe any portion of the system, regardless of context. Administrators authenticated in other contexts are restricted to the portion of the system relevant to that context. Contexts are independent name spaces and data spaces. For example, a routing process in one context can share routing information with a routing process in another context through inter-context interfaces, just as physical routers are connected together by physical cables.

administrator admin-name [encrypted encrypt-type password | password password]
Creates an administrator logon account, or selects an existing one for modification, and enters administrator configuration mode.

admin-name
Alphanumeric string representing a new or existing administrator.

encrypted encrypt-type password
Required only when configuring a new administrator account. Alphanumeric string representing an encrypted type 1 or type 2 password for the administrator account. Enter an already encrypted password to define the password.

password password
Required only when configuring a new administrator account. Alphanumeric string representing an unencrypted password for the administrator account.

enable password ericsson
Use the enable password command to configure a password for the specified privilege level that the system encrypts. The router supports privilege levels 0 to 15 for both administrators and commands. Privilege levels are enabled on a per-context basis. If password authentication is enabled, the system prompts for the password when the administrator enters the privilege level using the enable command in exec mode. By default, local password authentication is enabled (see the enable authentication command).
To protect your passwords, the system does not store or display this command. Instead, the system stores and displays the password in an encrypted form. When displaying the configuration, the system uses the enable encrypted command in context configuration mode. Use the no form of this command to delete the password for the specific privilege level.

interface ip address <ip-address>
Use the ip address command to assign a primary IP address and, optionally, one or more secondary IP addresses to an interface. This assignment enables IP services on an interface. Use the ip-addr argument and either the netmask or /prefix-length construct to assign the interface a primary IP address and netmask or prefix length. For nonloopback interfaces, use the bind interface command in port configuration mode to bind a circuit to the interface on which IP services are enabled.

commit
Use the commit command to commit an outstanding configuration database transaction. You can use the at or in keywords to schedule the transaction to be committed later. You can also associate a comment with the transaction.

5.3 Validating the Configuration

› Check if you are in the right context
– [local]Train-1# show bind
– [local]Train-1# show ip interface brief
– [local]Train-1# ping 10.1.1.3
– [local]Train-1# show port
– [local]Train-1# show port counter management
– [local]Train-1#
– ….
– Connect Computer and start Telnet session to management interface address
– ….
› Save Configuration
– [local]Train-1# save config /md/Ericsson_GrX.cfg

Figure 1-20: Validating the configuration

show bindings
Use the show bindings command to display information on the configured bindings of one or more ports or permanent virtual circuits (PVCs) on the system.

show ip interface brief
Displays information about interfaces, including the interface bound to the Ethernet management port on the controller card. Use the show ip interface command to display information about all interfaces, including those on the controller card. Use this command without optional syntax to display detailed information on all configured interfaces.

if-name
Optional. Name of the interface to be displayed.

all-context
Optional. Displays interface information for all contexts.

brief
Optional. Displays the name, IP address, and other information, in brief, for all configured interfaces in the current context or, if the optional all-context keyword is used, all contexts.

ping <ip-address>
Tests whether the host is reachable.

show port
Displays a list of ports that are present or configured in the system.

show port counters
Use the show port counters command to display counters associated with system ports. The values shown are accumulated since the counters were last cleared with the clear port counters command in exec mode or since the line card was last reloaded. Use the persistent keyword to display counter values accumulated since the system was last reloaded. If you specify the optional slot or port argument, the display shows counter information for the specified line card or port.
By default, this command displays only summary counter information for all ports with their last known values, which are cached and updated every 60 seconds. Use the live keyword to read and display the current values for the summary counters.

5.4 Binding Information

    [local]Train-1# show bindings
    Circuit       State  Encaps     Bind Type   Bind Name
    management    Up     ethernet   interface   management@local
    Summary:
      total: 1
      up: 1          down: 0
      bound: 1       unbound: 0
      auth: 0        interface: 1     subscriber: 0    bypass: 0
      no-bind: 0     atm: 0           chdlc: 0         dot1q: 0
      ether: 1       fr: 0            gre: 0
      mpls: 0        ppp: 0           pppoe: 0
      clips: 0       vpls: 0          ipip: 0
      ipsec: 0       ipv6v4-man: 0    ipv6v4-auto: 0
    [local]Train-1#

› Circuit information
› Binding state
› Encapsulation applicable for circuit / binding
› Binding type (bind interface)
› Binding Reference / name (interface management, context local)

Figure 1-21: Binding information

Use the show bindings command to display information on the configured bindings of one or more ports or permanent virtual circuits (PVCs) on the system. The example in Figure 1-21 displays all bindings in the current context (local).

5.5 Exercise 1: Management Configuration

› Please move to the exercises book.

Figure 1-22: Exercise 1: Management configuration

6 Troubleshooting Preparation Commands & Tools

Figure 1-23: Troubleshooting Preparation Commands & Tools

6.1 Troubleshooting Preparation

Before you begin troubleshooting, gather the evidence of what has been happening on your router. Collect the output of the show tech-support command, and optionally, other show commands and macros for specific problems. Collect this evidence before beginning to troubleshoot, because some troubleshooting techniques destroy or modify already stored data. If you need to escalate your problem to customer support, you must include troubleshooting data with your support request.
› It is very useful to prepare the system for more efficient and structured troubleshooting
› The following slides present recommended commands and tools to use while troubleshooting

Figure 1-24: Troubleshooting Preparation

6.2 Remote Terminal Session Timeout

› By default SmartEdge disconnects administrator's sessions after 10 minutes of inactivity
› This is inconvenient when you are troubleshooting, so let's disable this function

    [local]Train-1# configure
    Enter configuration commands, one per line, 'end' to exit
    [local]Train-1(config)# timeout session idle 0
    [local]Train-1(config)# end
    [local]Train-1#exit
    Connection closed by foreign host.
    [student@ssh1-Gothenburg2] ~ $ telnet 10.1.1.106
    Trying 10.1.1.106...
    Connected to 10.1.1.106.
    Escape character is '^]'.
    Redback login:

Notes from the slide:
– The timeout can also be configured per administrator
– Refresh the terminal (log out and reconnect, as shown above)

Figure 1-25: Remote terminal session timeout

6.3 Who is logged into the SSR

› It might be useful to know who is logged in at the moment and when they connected

    [local]Train-1# show administrators
    TTY        START TIME                 REMOTE HOST    ADMINISTRATOR
    -------------------------------------------------------------------------------
    pts/24     Tue Dec 15 07:54:07 2015   10.1.1.1:tel   admin@local
    * console  Tue Dec 15 07:54:07 2015   (null)         redback@local
    [local]Train-1#

    * (star): your current session

› Note! You need to refresh your connection to see new administrator sessions in the above output!

Figure 1-26: Who is logged into the SSR?

The show administrators command is used to display all administrator sessions on a system. Use the active keyword to limit the display to active sessions. With the active keyword, the admin-name argument can also be used to specify the sessions corresponding to a particular administrator.
In the display, the asterisk (*) character denotes the administrator session in which this command was entered.

6.4 Command History

› There is a CLI history log for the "monitoring mode" as well as the "configuration mode"

    [local]train-1# show history ?
      configuration    Display the session configuration command history
      |                Output Modifiers
      <cr>

History log of "monitoring commands":

    [local]train-1# show history
    en case-0
    show circuit qount 3/7 queue
    show circuit count 3/7 queue
    show config qos
    show hardware config
    show hardware
    show hardware detail
    [local]train-1#

History log of "configuration commands":

    [local]train-1# show history configuration
    system hostname train-1
    end
    context xyz
    exit
    show history
    [local]train-1#

    [local]train-1(config)# show history
    system hostname train-1
    end
    context xyz
    exit
    show history
    [local]train-1(config)#

Figure 1-27: What did you type before?

The show history command is used to display the command history for the current session. The history log contains up to 40 commands. To restrict the history to only the configuration commands entered during the session, use the optional configuration keyword, which is only available in exec mode.

Troubleshooting usually involves some unexpected behaviour, which may be the result of user configuration. During configuration activities, or if multiple people are logged in, it may be useful to see what commands have been executed on the SSR. This is helpful in backtracking the cause of an event.

The show history command displays a history of commands executed in the current operation mode. The operation mode we are in can be seen by examining the prompt: we are either in configuration mode or in User Executive (exec) mode. To see the history of configuration commands, run the show history command from configuration mode. Alternatively, use the show history configuration command from User Executive mode.
6.5 Troubleshooting by Searching

In a lot of cases system troubleshooting will involve interpreting the output of system show commands. In many cases you may need to search within the output generated by these show commands.

› One very useful tool for troubleshooting is searching and limiting the output
› SSR provides following features in the command line interface:
– Emacs
– Regular expressions
– Aliases
– Macros

Figure 1-28: Troubleshooting by searching and limiting the output

You can search the whole output by using special characters and keys according to EMACS text interpretation.

Another option for searching through the CLI output is using GREP. The system has some powerful built-in GREP features that are useful with many show commands. GREP filters the output and displays only the rows which include a string of characters matching the search pattern. GREP is a powerful text interpretation tool, and there are also extended options available where you can use complex regular expressions.

Macros are similar to aliases, but they allow multiple commands to be executed sequentially when the macro is called. A command macro is an extended alias that allows you to define a sequence of commands to run with the macro name instead of entering each command separately.

6.6 Command Line Interface & Emacs

› Output with more than 24 lines is considered large output and the screen will pause after 24 lines (auto more function)
› If you want to abort the output, just type q when output shows ---more---
› Searching through the output can be done as well, very powerful

    !
    enable encrypted 1 $1$........$4qhlVuh2HDOCu/EbYfbM6.
    !
    !
    administrator redback encrypted 1 $1$........$4qhlVuh2HDOCu/EbYfbM6.
    ---(more)---

– Default 24 lines displayed
– Output paused at ---(more)---
– Press h or H for help

    ---(CLI More Help)---
    Display this help:                         h or H
    Move down half display:                    d, or ^D
    Move down one line:                        Enter, e, ^E, j or ^N
    Move down one page:                        Space, f, ^F or ^V
    Move to bottom of output:                  G, >, or ESC->
    Move to top of output:                     g, < or ESC-<
    Move up half display:                      u or ^U
    Move up one line:                          y, ^Y, k, ^K or ^P
    Move up one page:                          b, ^B, or ESC-v
    Quit automore:                             q, Q, or ZZ
    Redraw display:                            ^L, r or ^R
    Repeat last search:                        n
    Repeat last search in reverse direction:   N
    Search backwards through the output:       ?<string>
    Search forwards through the output:        /<string>
    ---(End of CLI Help)---

It's sometimes useful to change terminal output length:

    [local]Train-1#show terminal
    terminal name = /dev/ttyp0
    terminal width = 80
    terminal length = 24
    terminal monitor = disabled
    [local]Train-1#

To set the terminal output with no pause:

    [local]Train-1#terminal length 0
    [local]Train-1#

Figure 1-29: Command Line Interface & Emacs

You can search the whole output by using special characters and keys according to EMACS text interpretation. For example, to search the output for the string "abc", type a slash followed by abc (/abc) and press Enter. If the CLI finds a match, it moves to that line. To repeat the previous search, press lowercase n. Press uppercase N to repeat the search in the reverse direction. A number of other keys can also be used: lowercase g brings you to the top of the output, uppercase G brings you to the bottom of the output, lowercase b brings you up one page, and the space bar brings you down one page.

6.7 Command Line Interface & Emacs

    Building configuration...
    Current configuration:
    !
    ! Configuration last changed by user '%RCM%' at Thu Feb 8 07:57:05 2011
    !
    service multiple-contexts
    !
    ! Bridge global configuration
    !
    context local
    !
     no ip domain-lookup
    !
     interface 1
      ip address 10.1.1.105/24
     logging console
    !
     enable encrypted 1 $1$........$4qhlVuh2HDOCu/EbYfbM6.
    !
     administrator redback encrypted 1 $1$........$4qhlVuh2HDOCu/EbYfbM6.
    ---(more)---

– Default 24 lines displayed
– Searching is very powerful for many show commands
– The matched value will be the first line displayed

    /abc         Search for a match on the characters "abc"
    n            Repeat the previous search in forward direction
    N            Repeat the previous search in reverse direction
    g            Top of output
    G            Bottom (end) of output
    b            Move up one page
    Space bar    Move down one page

Figure 1-30: Command Line Interface & Emacs

6.8 GREP: Global Regular Expression Parser

GREP is a filter tool which searches for a match in a line. When a match is found, GREP displays the complete line containing the match.

› It is a great tool to limit the output from almost any command within the SmartEdge
› Example (displays all the lines containing "port"):

    [local]Train-1# show configuration | grep port
    card ge-40-port 3
     port ethernet 3/1
      bind interface port27 abc
     port ethernet 5/1
     port ethernet management
    [local]Train-1#

Figure 1-31: GREP, Global Regular Expression Parser

6.9 Extended GREP

› Extended GREP enables more options when using GREP.

    [local]Train-1# show configuration | grep options '-E'

› Extended GREP supports Regular Expressions (presented in next slides)
› Other GREP command options:
› '-c' count
› '-i' ignore case
› '-An', '-Bn', '-Cn' show n lines After, Before, or of Context around each match

Figure 1-32: Extended GREP

6.10 Other Searching tools

    [local]Train-1# show configuration | ?
      append        Append the output to the file
      begin         Include lines beginning with the pattern
      count         Count the number of lines
      exclude       Exclude lines with the pattern
      grep          Plain grep
      include       Include lines with the pattern
      join-lines    Join lines of a logical record for subsequent pattern matching
      save          Save the output to the file

Figure 1-33: Other searching tools

The following example displays all lines from the output for the show configuration command (in any mode) beginning with the line before the first line that contains the word (pattern), ospf, and including the 6 lines after the first occurrence of the pattern.

6.11 Regular Expressions

A regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching.

› Can be used in EMACS and GREP within the SmartEdge
› Following are some examples of special characters used to build regular expressions (extended grep):

    [local]Train-1# show configuration | grep option '-E' '^hello'

› ^ = match expression at the start of a line, as in ^hello
› $ = match expression at the end of a line, as in hello$
› \ = turn off the special meaning of the next character, as in \^
› [ ] = match any one of the enclosed characters, as in [aeiou12]. Use hyphen "-" for a range, as in [0-9]
› [^ ] = match any one character except those enclosed in [ ], as in [^0-9]
› . = match a single character of any value, except end of line
› * = match zero or more of the preceding character or expression
› ( ) = group function
› {n,m} = repeat previous n to m
› | = logical or, 'abc|ABC'

Figure 1-34: Regular expressions

^
Matches the starting position within the string. In line-based tools, it matches the starting position of any line.

$
Matches the ending position of the string or the position just before a string-ending newline.
In line-based tools, it matches the ending position of any line.

\
The backslash character (\) in a regular expression indicates that the character that follows it either is a special character, or should be interpreted literally.

\n
Matches what the nth marked subexpression matched, where n is a digit from 1 to 9.

[]
A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z".

[^ ]
Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z".

.
Matches any single character.

*
Matches the preceding element zero or more times. For example, ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on.

()
Defines a marked subexpression.

{n,m}
Matches the preceding element at least n and not more than m times. For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa".

|
The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. For example, abc|def matches "abc" or "def".
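The operators described above can be tried out off-box before using them on a live system. The following is a minimal sketch in Python, used here only as a stand-in: Python's re module is not the SSR grep, but these basic operators behave the same way in both. The sample lines and the grep_lines helper are hypothetical and exist only for this illustration.

```python
import re


def grep_lines(pattern, lines):
    """Return the lines containing a match, the way a line filter would."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


# Hypothetical sample lines, loosely modelled on configuration output.
SAMPLE = [
    "port ethernet 3/1",
    " bind interface port27 abc",
    "port ethernet management",
    " ip address 10.1.1.105/24",
    "say hello",
    "hello world",
]

at_start = grep_lines(r"^port", SAMPLE)      # ^  anchors at start of line
at_end = grep_lines(r"hello$", SAMPLE)       # $  anchors at end of line
literal_dot = grep_lines(r"\.", SAMPLE)      # \  removes the special meaning of "."
either = grep_lines(r"abc|world", SAMPLE)    # |  matches either alternative

# ( ) groups "digits plus dot", {3} repeats the group, [0-9]{1,3} is a range
# with a repeat count: together they pick out an IPv4-looking address.
ip = re.search(r"([0-9]{1,3}\.){3}[0-9]{1,3}", SAMPLE[3]).group(0)

# {n,m}: at least n and at most m repetitions (whole-string match here).
repeat_ok = [s for s in ["aa", "aaa", "aaaaa", "aaaaaa"]
             if re.fullmatch(r"a{3,5}", s)]

# *: zero or more of the preceding element.
star_ok = [s for s in ["ac", "abc", "abbbc", "abd"]
           if re.fullmatch(r"ab*c", s)]

# [^ ]: any single character not listed in the brackets.
non_digits = re.findall(r"[^0-9]", "3/1")

if __name__ == "__main__":
    for name in ("at_start", "at_end", "literal_dot", "either",
                 "ip", "repeat_ok", "star_ok", "non_digits"):
        print(name, "=", globals()[name])
```

Note that the line " bind interface port27 abc" is not matched by ^port because it begins with a space; the anchor applies to the very first character of the line.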
6.12 Regular Expression: Examples with GREP

› Grep example: Looking for connected subscribers with any IP address whose last octet begins with 2 (group: ( ), range: [0-9], repeat: {1,3} and {3}):

    [local]Train-1# show sub act all | grep option '-E -B6' '([0-9]{1,3}\.){3}2'
    user2@VeryNiceService
       Circuit 3/1 pppoe 32
       Internal Circuit 3/1:1023:63/6/2/32
       Current port-limit 1
       port-limit 1 (applied)
       ip pool (applied from sub_default)
       ip address 100.1.1.2 (applied from pool)
    [local]Train-1#

› EMACS example: Searching config for lines with any IP address whose last octet begins with 1 (for example 1xx). Enter this search pattern at --more--:

    /([0-9]{1,3}\.){3}1

  Result:

    ! Bridge global configuration
    !
    !
    context local
    !
     no ip domain-lookup
    !
     interface management
      ip address 10.1.1.101/24        (this line matches the search)
     logging console
    !
     enable encrypted 1 $1$........$4qhlVuh2HDOCu/EbYfbM6.
    !
    !
     administrator dj encrypted 1 $1$..

Figure 1-35: Regular expressions, examples with GREP

› Use the repeat to search for a match at a specific location within the line

    [local]Train-1# show sub all | grep option '-E' '^.{35}user'
    pppoe  3/1 pppoe 32                user2@VeryNiceServ NiceServi Feb 14 20:11:37
    [local]Train-1#
    [local]Train-1#show sub all | grep option '-E' '^.{35}user|--|TYPE'
    TYPE   CIRCUIT                     SUBSCRIBER         CONTEXT   START TIME
    --------------------------------------------------------------------------------
    pppoe  3/1 pppoe 32                user2@VeryNiceServ NiceServi Feb 14 20:11:37
    --------------------------------------------------------------------------------
    [local]Train-1#

› Use the range and repeat to show processes with load equal to or higher than 10%

    [local]Train-1# show proc | grep option '-E' '[1-9][0-9]{1,2}\...%|NAME'
    NAME    PID   SPAWN   MEMORY   TIME          %CPU     STATE   UP/DOWN
    aaad    288   3       6464K    00:00:00.12   11.00%   run     00:00:01
    [local]Train-1#

Figure 1-36: Regular expressions, examples with GREP

7 Aliases and Macros

Figure 1-37: Aliases and Macros

7.1 Introduction to Alias

› Alias allows short notations of complex command string

    [local]Train-1(config)# alias exec test2 show config port 3/1
    [local]Train-1# test2
    [0] (test)# show config port 3/1
    Building configuration...
    Current configuration:
    !
    card ether-12-port 3
    !
    port ethernet 3/1
     no shutdown
     encapsulation pppoe
     bind authentication chap
    !
    end

Name length is limited to 15.

Be careful: Creating an alias name that matches an existing command name will override the command. To restore the command, the alias configuration needs to be removed.

Figure 1-38: Introduction to Alias

7.2 Introduction to Macro

The same name limitations as for aliases apply.

› Macro allows multiple command lines to be grouped together

    [local]Train-1(config)# macro exec hallo
    [local]Train-1(config-macro)# seq 10 show ip interface brief
    [local]Train-1(config-macro)# seq 20 show bindings
    [local]Train-1# hallo
    [10] (hallo)# show ip interface brief
    Thu Dec 15 12:23:13 2015
    Name    Address          MTU    State   Bindings
    mgmt    10.1.1.102/24    1500   Up      ethernet 7/1
    [20] (hallo)# show bindings
    Circuit       State   Encaps     Bind Type   Bind Name
    management    Up      ethernet   interface   mgmt@local
    -- cut --

Figure 1-39: Introduction to macro

7.3 Variables in Macros

› Macros can include variables $1…$10:

    [local]Train-1# ping atm channel end-to-end 13/1 vpi 0 vci 100 count 10
    [local]Train-1(config)# macro exec atm-ping
    [local]Train-1(config-macro)# seq 10 show port $1/$2
    [local]Train-1(config-macro)# seq 20 ping atm channel end-to-end $1/$2 vpi $3 vci $4 count $5
    [local]Train-1# atm-ping 13 1 0 100 10

  In the last command, the arguments 13, 1, 0, 100 and 10 are passed as $1, $2, $3, $4 and $5.

Figure 1-40: Variables in Macros

8 Exercise 2: Introduction, Searching and Filtering

› Exercise to learn searching and filtering using:
– Regular expressions
– EMACS
– GREP with macros
› Part of the exercise is about connected subscribers
– We need to use emulation for "show subscribers" since your system does not have any subscribers:

    [local]Train-1# show configuration subs_all        (emulates "show subscriber all" command)
    ...
    [local]Train-1# show configuration subs_active     (emulates "show subscriber active" command)
    ...

Figure 1-41: Exercise 2: Introduction, Searching and Filtering

8.1 Exercise 2: Searching and Filtering

› Please move to the exercises book.

Figure 1-42: Exercise 2: Searching and Filtering

8.2 Exercise 2, Review (1-4)

› Exercise 2.1, Save filtered output to file:

    [local]Train-1# show log | grep fail | save /flash/log_fail_2015.txt
    [local]Train-1# dir
    Contents of /flash/
    ...
    -rw-r--r-- 1 root 0  413 Dec 15 07:40 log_fail_2015.txt
    -rw-r--r-- 1 root 0 3327 Dec 15 08:24 redback.bin
    -rw-r--r-- 1 root 0  986 Dec 15 08:24 redback.cfg
    ...

› Exercise 2.2, Searching the output using EMACS:

    [local]Train-1# sh configuration subs_active
    0016CED62A70@internet
       Circuit 12/1:1 vpi-vci 30 381 pppoe 1292
       Internal Circuit 12/1:1:63/2/2/34
       Interface bound 192.168.166.0
       ...
       qos-metering-policy marking-qos-1 (applied from sub_default)
    00173391DE24@internet
    ---(more)---

  We type "/" and the pattern to match at the ---(more)--- prompt. See next slide …

Figure 1-43: Exercise 2, review (1-4)

8.3 Exercise 2, Review (2-4)

› Exercise 2.2, Searching the output using EMACS:

       ...
qos-metering-policy marking-qos-1 (applied from sub_default) 00173391DE24@internet /192\.168\.162\.105 › Result: ip address 192.168.162.105 (applied from pool) atm profile UBR-608 (applied) qos-queuing-policy PQ (applied) qos-metering-policy marking-qos-1 (applied from sub_default) 001733AE712C@internet ---(more)--- › Type “u” for up half page to check the subscriber username and “n” for next match: – Username: instructor@internet – Only one subscriber uses address 192.168.162.105 Figure 1-44: Exercise 2, review (2-4) 8.4 Exercise 2, Review (3-4) › Exercise 2.3, Macro for searching domains › Create Macro “subs_domain”: [local]Train-1(config)# macro exec subs_domain [local]Train-1(config-macro)# seq 10 show config subs_all | grep opt '-E -c -i' 'ericsson|redback' [local]Train-1(config-macro)# end › Execute Macro: [local]Train-1# subs_domain [10] (subs_domain)# show config subs_all | 2505 [local]Train-1# grep opt '-E -c -i' 'ericsson|redback' Figure 1-45: Exercise 2, review (3-4) - 48 - © Ericsson AB 2015 LZT1381712 R1A CLI Tools for Troubleshooting 8.5 Exercise 2, Review (4-4) (optional) › Exercise 2.4: Macro for searching with dates is optional (students that have more time). › Number of subscribers logged in between Oct 20th – 29th: [local]Train-1# show conf subs_all | grep option '-E' 'Oct 2[0-9]' | count 1245 [local]Train-1# › Subscribers logged in on October 8th (17 matches): [local]Train-1# show conf subs_all | grep option '-E' 'Oct {1,2}8' pppoe 12/1:1 vpi-vci 34 340 pppo user2@customer2.co internet Oct 8 11:56:18 pppoe 12/2:1 vpi-vci 31 322 pppo q4gL2@redback.com internet Oct 8 17:52:47 pppoe 12/2:1 vpi-vci 31 233 pppo kS9qO@provider2.co internet Oct 8 17:54:45 pppoe 12/2:1 vpi-vci 31 419 pppo BoKcC@customer1.co internet Oct 8 17:54:28 pppoe 12/3:1 vpi-vci 30 426 pppo jQaeQ@redback.com internet Oct 8 23:05:09 ... 
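The two date patterns from Exercise 2.4 can be tried offline before they are used on a live system. The following is a minimal sketch in Python, assuming that the extended regular expressions behave the same way in Python's re module for these simple patterns. The first sample line is taken from the output above; the other three lines and the function name count_matches are hypothetical and exist only for illustration.

```python
import re

# Patterns copied from Exercise 2.4.
RANGE_20_29 = re.compile(r"Oct 2[0-9]")   # Oct 20 .. Oct 29
DAY_8 = re.compile(r"Oct {1,2}8")          # "Oct 8" with one or two spaces


def count_matches(pattern, lines):
    """Count the lines that contain the pattern (like grep -c)."""
    return sum(1 for line in lines if pattern.search(line))


sample = [
    # Taken from the exercise output.
    "pppoe 12/1:1 vpi-vci 34 340 pppo user2@customer2.co internet Oct 8 11:56:18",
    # Hypothetical lines, added only to exercise the patterns.
    "pppoe 12/2:1 vpi-vci 31 322 pppo a@example.com internet Oct  8 17:52:47",
    "pppoe 12/3:1 vpi-vci 30 100 pppo b@example.com internet Oct 18 09:00:00",
    "pppoe 12/3:1 vpi-vci 30 101 pppo c@example.com internet Oct 28 09:00:00",
]

print(count_matches(DAY_8, sample))        # single- and double-space "Oct 8"
print(count_matches(RANGE_20_29, sample))  # only the "Oct 28" line
```

The {1,2} quantifier applies to the space before the 8, so the pattern accepts a day that is padded with one or two spaces but does not match Oct 18 or Oct 28.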
Figure 1-46: Exercise 2, review (4-4) (optional)

9 Chapter Summary

After this course the participant should be able to:
› Identify the CLI tools for troubleshooting
› Describe grep and its options
› Understand the use of CLI command aliases as shortcuts
› Use CLI command macros to execute multiple commands with a single command

Figure 1-47: Chapter Summary

2 Operational Health of the SSR System

Chapter Objectives

After this course the participant will be able to:
› Understand the operational health of the SSR system
› Describe the basic RPSW health checks
› Explain the details of the control plane interface
› Understand system storage

Figure 2-1: Chapter Objectives

1 Troubleshooting Procedure

Before you begin troubleshooting, gather the evidence of what has been happening on your router. Collect the output of the show tech-support command, and optionally, other show commands and macros for specific problems. Collect this evidence before beginning to troubleshoot, because some troubleshooting techniques destroy or modify already stored data.
If you need to escalate your problem to customer support, you must include troubleshooting data with your support request.

System Hardware Health Check
› System alarms
› Hardware Status
System storage verification
› Internal
› External (optional)
System processes
› Processes verification
› Finding CPU intensive processes
› Process crash
› Manual Coredumps
Redundancy
› XCRP / Alarm Card redundancy
› Switch-over
Boot Problems
› Hardware related
› Software related
System Logging
› Active Logs, Log files
› Syslog
Debugging (Last Resort)
› Start debug
› Display debug
› Clear debug
Document cases in a database

Figure 2-2: Recommended Troubleshooting Procedure

In the next sections we present the recommended procedure for troubleshooting, using the tools presented earlier. We strongly recommend that every case is documented in a database which can be used for later troubleshooting. This can reduce the time required for troubleshooting, and therefore the system downtime and the OPEX.

1.1 System Hardware Health

System hardware health checks are a good starting point for system troubleshooting. The SSR has many components, and when troubleshooting it is important to verify the status of the hardware before investigating problems with system processes, routing, packet processing and so on.

8x Control cards
  - Switch Fabric
  - Alarm
  - Route Processor
20x Line/Service cards
  - Line cards: 40x1G, 10x10G, 2x40G, 1x100G
  - Smart Services Cards: EPG, BNG, CDN, Service Management

Figure 2-3: System Hardware Health

1.2 Overview: Hardware Status

This section describes how to troubleshoot hardware problems. We can also look at hardware status by typing ‘show hardware’. This command displays information about each hardware component.
Note that in the output shown some of the power module outputs have been omitted for simplicity. Each component has relevant information shown, including Slot, Type of Hardware, Serial Number, Revision and Manufacture Date. Some hardware components also include Payload, which indicates the status of the hardware. In some cases this attribute is not used and is listed as N/A (not applicable).

[local]Train-1# show hardware
Slot   Type           Serial No     Rev      Mfg Date      Payload
-----  -------------  ------------  -------  ------------  -------
N/A    backplane      CF90000C81    R2G      02-FEB-2012   N/A
FT1    ft             ce510004an    r2c      25-NOV-2011   N/A
FT2    ft             ce510004ah    r2c      25-NOV-2011   N/A
PM1    pm             BR81691974    R2B      05-NOV-2011   N/A
PM2    pm             BR81691990    R2B      05-NOV-2011   N/A
--- cut ---
PM8    pm             BR81691991    R2B      05-NOV-2011   N/A
RPSW1  rpsw           Unavailable   Unavailable            OK
RPSW2  rpsw           CF90000AY0    R2H      06-DEC-2011   OK
ALSW1  alsw           CF90000B4V    R2N      08-DEC-2011   OK
ALSW2  alsw           CF90000B4X    R2N      08-DEC-2011   OK
SW1    sw             CF90000BPF    R2M      13-DEC-2011   OK
SW2    sw             CF90000B8K    R2M      11-unknown    OK
SW3    sw             CF90000BN6    R2M      12-DEC-2011   OK
SW4    sw             CF90000BKY    R2M      12-DEC-2011   OK
3      ge-40-port     CF90000AJQ    R2F      04-NOV-2011   OK
5      10ge-10-port   CF90000AG3    R2D      03-NOV-2011   OK
[local]Train-1#
(Overview list of hardware)

Figure 2-4: Overview: Hardware Status

To check the hardware status of your router, use the show hardware command.

[local] Ericsson# show hardware ?
  backplane      Display backplane hardware information
  card           Display hardware information for a specific card
  daughter-card  Display daughter-card hardware information
  detail         Display detail hardware information for all cards
  fantray        Display fantray hardware information
  power-module   Display power-module hardware information
  thermal        Display hardware thermal information for all cards
  |              Output Modifiers
  <cr>
[local]Train-1# show hardware card ?
  1..20          Slot number
  ALSW1..ALSW2   Slot number
  RPSW1..RPSW2   Slot number
  SW1..SW4       Slot number

Figure 2-5: More detailed hardware info

These are the options for looking at hardware information related to cards (Line Cards, Route Processor Switch Cards, Switch Cards and Alarm Cards) and to other hardware components of the chassis, such as fan trays and power modules.

1.3 Retrieving Hardware Details Line Cards

We can get detailed information for a particular hardware component by using the keyword ‘detail’ at the end of the ‘show hardware’ command. As an example, let us look at the detailed information available for a Line Card.

An important check that can be done here is the Line Card temperature. It is important that the temperature of a Line Card is not too high. If the temperature is too high, or near the edge of the acceptable range, the card may flap between up and down, causing problems. Voltage values can also be examined here. They should be within 5% of the expected values.
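The 5% voltage rule can also be checked mechanically on captured output. The following is a minimal sketch in Python, not an Ericsson tool. It assumes that each reading has the form "Voltage <nominal>V : <measured> (<deviation>%)", as in the detailed card output in this section. The function name out_of_tolerance and the 1.080 reading in the sample are hypothetical.

```python
import re

# Matches e.g. "Voltage 3.300V : 3.335 (+1%)" and captures nominal and measured.
VOLTAGE = re.compile(r"Voltage\s+(\d+\.\d+)V\s*:\s*(\d+\.\d+)")


def out_of_tolerance(text, limit_pct=5.0):
    """Return (nominal, measured, deviation %) for readings beyond limit_pct."""
    flagged = []
    for nominal, measured in VOLTAGE.findall(text):
        nominal_v, measured_v = float(nominal), float(measured)
        deviation = (measured_v - nominal_v) / nominal_v * 100.0
        if abs(deviation) > limit_pct:
            flagged.append((nominal_v, measured_v, round(deviation, 1)))
    return flagged


captured = """
Voltage 3.300V : 3.335 (+1%)   Voltage 1.200V : 1.207 (+1%)
Voltage 5.000V : 5.140 (+3%)
Voltage 1.000V : 1.080 (+8%)
"""  # the last reading is hypothetical, added to show a violation

print(out_of_tolerance(captured))
```

The deviation is recomputed from the nominal and measured values instead of being read from the rounded percentage that the CLI prints, so readings close to the limit are not misjudged because of rounding.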
[local]Train-1#show hardware card 1 detail
Slot             : 1                  Type             : 1-10ge-20-4-port
Serial No        : D290092314         Hardware Rev     : R5H
Mfg Date         : 02-APR-2014
Activated Time   : 178 h
WLCC-W024        : 12
Fluffy-W024      : 16
FEX-W024         : 6
Voltage 3.300V   : 3.335 (+1%)        Voltage 1.200V   : 1.207 (+1%)
Voltage 3.300V   : 3.341 (+1%)        Voltage 1.100V   : 1.107 (+1%)
Voltage 1.000V   : 1.014 (+1%)        Voltage 1.800V   : 1.818 (+1%)
Voltage 1.500V   : 1.512 (+1%)        Voltage 1.000V   : 1.010 (+1%)
--omitted
Voltage 5.000V   : 5.140 (+3%)
Inlet Temp       : Normal (28 C)      Card Temp Status : Normal
Payload Status   : OK                 OSD Status       : Passed
POD Status       : Passed
Failed LED       : Off                IS LED           : On
Standby LED      : Off                Swap LED         : Off
Ejector Switch   : 1 (Locked)
Last Payld Reset : Power On
Active Alarms    : NONE

Notes on the output:
› Temperature and voltage are within the normal range
› Power-on diagnostics are positive (POD Status: Passed)
› LED status is shown
› Good news: no card alarms

Figure 2-6: Retrieving hardware details Line cards

Each line card has one or more FPGAs. Each software release has a supported FPGA version, and all FPGA images are bundled with the line card image. FPGAs are upgraded automatically. The line cards have the following FPGAs:

• WLCC: Line card control. Terminates the Connection Manager (CM) bus between the line card and RPSW (Controller) card and turns on the power sequence on the board.
• WXFP/WSFP: Configuration status. Muxes system clocks and aggregates interrupts (for example, SFP/XFP/Phy interrupts).
• WLCFAP: Connected to the FAP on the LC and has a PCI bridge. Collects statistics from FAP. Programmed every time the FPGA is turned on. The bus to the WSFP FPGA is used for programming.

1.4 RPSW Hardware Information

We can see similar information for an RP card too. The Phalanx version used by the RP is one piece of information that may be useful from the output of this command. Phalanx is a CPLD chip which is not field upgradable, unlike all other FPGAs on the system, which are automatically upgraded when we upgrade IPOS on the RP. Ensure that the Phalanx version is the same on both RP cards. In addition, voltage levels and temperature values are listed, as for the line card.

[local] Ericsson# show hardware card rpsw1 detail
Slot             : RPSW1              Type             : rpsw
Serial No        : CF90000BSH         Hardware Rev     : R2H
CLEI Code        : IPUCA272AA         Product Code     : W006
Mfg Date         : 22-DEC-2011
Activated Time   : 22 h
Phalanx          : 3.0.13
Spanky           : 02.02
Voltage 54.000V  : 54.032 (+0%)       Voltage 12.000V  : 12.031 (+0%)
Voltage 1.050V   : 1.054 (+0%)        Voltage 1.500V   : 1.500 (+0%)
Voltage 1.000V   : 1.000 (+0%)        Voltage 1.800V   : 1.800 (+0%)
Voltage 1.200V   : 1.199 (-0%)        Voltage 1.000V   : 0.999 (-0%)
Voltage 1.000V   : 1.000 (+0%)        Voltage 0.900V   : 0.900 (+0%)
Inlet Temp       : Normal (31 C)      Card Temp Status : Normal
Payload Status   : OK                 OSD Status       : Passed
POD Status       : Passed
Failed LED       : Off                IS LED           : On
Standby LED      : Off                Swap LED         : Off
Ejector Switch   : 1 (Locked)
Last Payld Reset : Admin
Active Alarms    : NONE

Figure 2-7: RPSW hardware information

1.5 ALSW Hardware Information

Detailed hardware information for the Alarm Switch Card is shown here. The output displays similar information as for an RP card. Notice the status of the Alarm LEDs we mentioned previously, which indicate the existence of system alarms.
[local] Ericsson# sh hardware card alsw1 detail
Slot               : ALSW1            Type             : alsw
Serial No          : CF900009WK       Hardware Rev     : R2H
Mfg Date           : 07-OCT-2011
Activated Time     : 6 h
Farquaad           : 09
Shiba              : 03.06
Voltage 54.000V    : 53.287 (-1%)     Voltage 12.000V  : 12.000 (+0%)
Voltage 3.300V     : 3.299 (-0%)
Inlet Temp         : Normal (24 C)    Card Temp Status : Normal
Payload Status     : OK               OSD Status       : Not Run
POD Status         : Passed
Failed LED         : Off              IS LED           : On
Standby LED        : Off              Swap LED         : Off
Ejector Switch     : 1 (Locked)
Last Payld Reset   : Reset Button
Active Alarms      : NONE
Power LED          : On
Fan LED            : Off
Critical Alarm LED : On
Major Alarm LED    : Off
Minor Alarm LED    : On

Note! There are two sets of LEDs on the ALSW card:
› ALSW local card LEDs
› SSR System LEDs
› Alarms will be covered in later slides

Figure 2-8: ALSW hardware information

1.6 Finding Hardware Alarms (1-2)

› You can quickly see all alarms across the chassis using grep

[local] Ericsson# show hardware detail | grep option -E 'Alarm|Slot'
Slot : N/A   Type : backplane
Active Alarms : N/A
Slot : FT1   Type : ft
Active Alarms : NONE
Slot : FT2   Type : ft
Active Alarms : NONE
Slot : PM1   Type : pm
Active Alarms : Input Failure - Feed B
Slot : PM2   Type : pm
Active Alarms : Input Failure - Feed B
Slot : PM3   Type : pm
Active Alarms : Input Failure - Feed B
Slot : PM4   Type : pm
Active Alarms : Input Failure - Feed B
Slot : PM5   Type : pm
Active Alarms : Input Failure - Both Feeds
Slot : PM6   Type : pm
Active Alarms : Input Failure - Both Feeds
--More--
(Not all output is displayed)

Figure 2-9: Finding hardware alarms (1-2)

In this case note that the output is quite long and is only partially shown, but you should see the status of active alarms of each hardware component.
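When the filtered output has been saved to a file, the Slot and Active Alarms lines can be paired to list only the slots that actually report an alarm. The following is a minimal sketch in Python. It assumes the line layout shown in Figure 2-9 and one Active Alarms line per slot, which may not hold for every release. The function name slots_with_alarms is hypothetical.

```python
import re

SLOT = re.compile(r"Slot\s*:\s*(\S+)")
ALARM = re.compile(r"Active Alarms\s*:\s*(.+)")


def slots_with_alarms(text):
    """Map slot name to alarm text, skipping NONE and N/A entries."""
    alarms = {}
    current_slot = None
    for line in text.splitlines():
        slot_match = SLOT.search(line)
        if slot_match:
            current_slot = slot_match.group(1)
        alarm_match = ALARM.search(line)
        if alarm_match and current_slot is not None:
            value = alarm_match.group(1).strip()
            if value not in ("NONE", "N/A"):
                alarms[current_slot] = value
    return alarms


# Excerpt of the grep output shown in Figure 2-9.
captured = """
Slot : N/A Type : backplane
Active Alarms : N/A
Slot : FT1 Type : ft
Active Alarms : NONE
Slot : PM1 Type : pm
Active Alarms : Input Failure - Feed B
Slot : PM5 Type : pm
Active Alarms : Input Failure - Both Feeds
"""

for slot, alarm in slots_with_alarms(captured).items():
    print(slot, "->", alarm)
```

Filtering out NONE and N/A reduces a long listing to the few slots that need attention.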
1.7 Finding Hardware Alarms (2-2)

› Simplifying with macro

[local]Train-1(config)# macro exec checkhw
[local]Train-1(config-macro)# seq 10 show clock
[local]Train-1(config-macro)# seq 20 show hardware detail | grep option '-E' 'Alarms|Slot'

[local]Train-1# checkhw
[10] (checkhw)# show clock
Thu Oct 27 08:49:36 2011 GMT
[20] (checkhw)# show hardware detail | grep option '-E' 'Alarms|Slot'
Slot : N/A   Type : backplane
Active Alarms : N/A
Slot : FT1   Type : ft
Active Alarms : NONE
Slot : FT2   Type : ft
Active Alarms : NONE
Slot : PM1   Type : pm
Active Alarms : Input Failure - Feed B
Slot : PM2   Type : pm
Active Alarms : Input Failure - Feed B
Slot : PM3   Type : pm
--More--

Figure 2-10: Finding hardware alarms (2-2)

If we want to keep track of hardware alarms, a useful macro can be written using the grep -E option on the output of show hardware detail, looking for the strings Alarms and Slot. This lists all lines in the output of show hardware detail that contain the keyword ‘Slot’ or ‘Alarms’.

1.8 System Hardware Checks

A good place to start is viewing all system alarms on the chassis. The command used is ‘show system alarm’. As you can see, on this particular chassis there are quite a few alarms. It is important to be able to distinguish the different types of alarms, and to tell expected alarms from alarms that require urgent attention. In this case you can see we have a number of alarms related to the Power Modules.
The chassis we are using happens to have only four out of a possible eight power modules, so four power modules are reported as missing. In addition, each installed power module is using only one out of a possible two power source inputs, so the second feed of each active power module is reported as failed.

[local]Ericsson# show system alarm
Timestamp            Source  Severity  Description
-------------------------------------------------------------------------
Dec 15 17:33:06.509  PM1     Minor     Input Failure - Feed B
Dec 15 17:33:06.512  PM2     Minor     Input Failure - Feed B
Dec 15 17:33:06.514  PM3     Minor     Input Failure - Feed B
Dec 15 17:33:06.517  PM4     Minor     Input Failure - Feed B
Dec 15 17:33:09.875  PM5     Minor     Power Module Missing
Dec 15 17:33:09.875  PM6     Minor     Power Module Missing
Dec 15 17:33:09.885  PM7     Minor     Power Module Missing
Dec 15 17:33:09.911  PM8     Minor     Power Module Missing

Alarm levels: Minor, Major, Critical
(Overview of system alarms)

Figure 2-11: System Hardware Checks

It is recommended that all customer SSR deployments have all power modules loaded, with dual feeds for each, in which case these alarms would not be seen. We will look at this in more detail when we discuss troubleshooting of Power Modules in a later section.

For each system alarm there are three possible severities: Minor, Major and Critical. In this case the alarms are classed as Minor because the system is still sufficiently powered and operational.

1.9 System Alarms

The show system alarm command displays system, card, and port alarms. It displays alarms for the chassis, line cards, or Smart Services Cards (SSCs) and, optionally, specific ports, controller cards, alarm cards, power modules, fantrays, and switch fabric components.

› Please note the “all” option!
› To emulate the Card Missing alarm shown below: configure a card that is not inserted

[local]Train-1# show system alarm
Timestamp            Type  Source  Severity  Description
--------------------------------------------------------------------------------
Dec 15 14:44:08.197        2       Critical  Card Missing
Dec 15 09:51:25.466        PM1     Minor     Input Failure - Feed B
Dec 15 09:51:25.499        PM2     Minor     Input Failure - Feed B
Dec 15 09:26:09.550        5       Minor     Filesystem Full
[local]Train-1# show system alarm ?
  ALSW1..ALSW2               Display active alarms for specified ALSW slot
  FT1..FT2                   Display active alarms for specified FT slot
  PM1..PM8                   Display active alarms for specified PM slot
  RPSW1..RPSW2               Display active alarms for specified RPSW slot
  SW1..SW4                   Display active alarms for specified SW slot
  chassis                    Display active chassis alarms
  slot/port:ch:sub[:subsub]  Display active alarms for specified LC slot, port, and channel numbers
  |                          Output Modifiers
  <cr>
[local]Train-1#

Figure 2-12: System alarms

1.10 System Alarm with Options

[local]Train-1# show system alarm chassis
Timestamp            Source  Severity  Description
--------------------------------------------------------------------------------
(No alarms)
[local]Train-1# show system alarm ALSW1
Timestamp            Source  Severity  Description
--------------------------------------------------------------------------------
(No alarms)
[local]Train-1# show system alarm PM1
Timestamp            Source  Severity  Description
--------------------------------------------------------------------------------
Sep 5 18:18:19.773   PM1     Minor     Input Failure - Feed B
(Power Module alarm)
[local]Train-1# show system alarm 3/1
Timestamp            Source  Severity  Description
--------------------------------------------------------------------------------
Sep 7 15:50:25.421   3/1     Major     Link down
(Port alarm)

Figure 2-13: System Alarm with Options, Examples

1.11 Example: Initiating Major System Alarm

If you have access to a chassis it is easy to create Major and Critical alarms.
To create a Major alarm, configure a port that has no active cable connection and execute the ‘no shutdown’ command. This generates a Major alarm because, as far as the SSR is concerned, a link that should be up has no active connection. The alarm for Link Down is raised as shown.

› A system alarm can be generated by configuration mistakes made by the administrator

(Port with no cable connected)
[local]Ericsson(config)# port ethernet 1/19
[local]Ericsson(config-port)# no shutdown
[local]Ericsson(config-port)# end
[local]Ericsson# show system alarm
Timestamp            Source  Severity  Description
---------------------------------------------------------------------
Jun 13 00:19:50.940  1/19    Major     Link down

› Example solutions:
  – Port not activated (by default): enter “no shutdown”
  – Missing cable: connect cable
  – Other end missing config: configure the port on the other end
  – Wrong port configured: configure correct port

Figure 2-14: Example: Initiating Major System Alarm

1.12 Example: Initiating Critical System Alarm

It is also easy to create a Critical alarm. To do this, simply configure a card that is not physically inserted in the chassis. This generates a Critical alarm because, as far as the SSR is concerned, a card that should be present cannot be detected, and the card may be malfunctioning or may have been wrongly removed.

This particular alarm may be a nuisance if you configure a card before inserting it into the chassis but do not want a Critical alarm to be generated. In this case there is a useful card configuration command, ‘deactivate’, which allows you to maintain any configuration of a card without a Critical alarm being raised, because the SSR is aware that the card is not activated yet. As you can see this clears the alarm previously raised.
(No card present in slot 17)
[local]Ericsson(config)# card ge-40-port 17
[local]Ericsson(config-card)# end
[local] Ericsson# show system alarm
Timestamp            Source  Severity  Description
-------------------------------------------------------------------------
Jun 13 00:23:45.080  17      Critical  Card Missing

› Solution:

[local]SR1-1(config)#card ge-40-port 17
[local]SR1-1(config-card)#deactivate
[local]SR1-1(config-card)#end
[local]SR1-1#sh sys alarm
Timestamp            Source  Severity  Description
------------------------------------------------------------------------

Figure 2-15: Example: Initiating Critical System Alarm

1.13 System Hardware LED

Alarms that are generated are physically evident on the Alarm Card LEDs. The image shows the Alarm card for a chassis with both Minor and Critical alarms currently raised.

(Image labels: Critical alarm, Minor alarm)

Figure 2-16: System Hardware LED

The ALSW and ALSW-T alarm cards contain alarm management functionality, timing/synchronization support, an instance of the central switch fabric, and an internal management control plane. SSR 8000 Series must have at least one alarm card, but may be equipped with two cards of the same type for redundancy. The alarm cards are responsible for:

• Visual alarm management
• Synchronization timing distribution
• External T1/E1 BITS inputs through RJ-48 connectors with wiring 1,2 and 4,5
• Switch fabric (along with the other half-height cards)

1.14 Card Powered Down

Note that in the output shown some of the power module outputs have been omitted for simplicity. We can see in this case that the status of the card in slot 1 is OK.
The card in slot 12 is powered down. The reason for this is that card 12 is inserted in the chassis but has not been configured by the administrator yet, as shown by the output of “show configuration card 12”.

[local] Ericsson# show hardware
Slot   Type         Serial No     Rev   Mfg Date      Payload
-----  -----------  ------------  ----  ------------  -------
N/A    backplane    CF90000CM9    R2G   29-MAY-2012   N/A
...
1      ge-40-port   CF90000BGX    R2H   27-DEC-2011   OK
12     ge-40-port   CF90000BYT    R2H   02-JAN-2012   Power D
(Card 12 Powered Down)

[local]SR1-1# show configuration card 12
Building configuration...
Current configuration:
!
end

Figure 2-17: Card Powered Down

1.15 System Storage Verification

Figure 2-18: System storage Verification

1.16 System Storage

The RPSW and RPSW-V2 (Controller) cards contain both the control processor and an instance of the central switch fabric. The difference between these two controller cards is in the amount of internal storage. An SSR 8000 Series chassis must have at least one controller card.

› Each RPSW card has internal storage media to store the operating system, configuration files, and other system files.
› The SSR has two internal 16 GB disks on each RPSW: 16GB (flash) + 16GB (md) internal storage
  – Three partitions on the first disk:
    › p01, p02 (8GB): system boot partitions that store operating system image files (active partition, alternate partition)
    › /flash (8GB): primarily used for storing and managing configuration files
  – One partition on the second disk:
    › /md (16 GB): all kernel and application core files and log files
› Optional: external USB storage for transferring software images, logs, configuration files
› Each line card has one 2 GB internal storage disk that is partitioned in four parts: /p01, /p02, /flash, and /var (/var/md).

Figure 2-19: System storage

1.17 System Storage Verification

› Verification of errors and free space within storage media:

show disk [card {slot-id | all}] [internal | external] [detail]

  – Displays status for the internal storage partitions and optional USB mass-storage devices.
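The free-space check can be automated once the show disk output has been captured as text. The following is a minimal sketch in Python. It assumes that the partition table uses the six-column layout (Filesystem, 1k-blocks, Used, Available, Use%, Mounted on) of the show disk examples in this chapter. The 75% threshold and the function name partitions_over are hypothetical choices, not Ericsson recommendations.

```python
def partitions_over(text, threshold_pct=75):
    """Return (mount point, use %) for partitions at or above threshold_pct."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly six fields and a numeric Use% column;
        # the header row has seven fields and is skipped.
        if len(fields) == 6 and fields[4].endswith("%") and fields[4][:-1].isdigit():
            use_pct = int(fields[4][:-1])
            if use_pct >= threshold_pct:
                flagged.append((fields[5], use_pct))
    return flagged


# Rows copied from the show disk examples in this chapter.
captured = """
Filesystem 1k-blocks Used Available Use% Mounted on
rootfs 3872856 2894516 783152 79% /
/dev/sda2 3880920 2724900 960432 74% /p02
/dev/sdb1 15499740 646580 14072004 4% /var
/dev/sdc1 2038464 1652032 386432 82% /media/flash
"""

print(partitions_over(captured))
```

A check like this is useful because a full partition raises the Filesystem Full alarm seen earlier in show system alarm.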
Figure 2-20: System storage verification

1.18 System Storage Verification: Example

Use the show disk command to display status for the internal storage partitions and an external USB storage device. The command also displays the soft and hard error count for the system storage.

[local]Train-1# show disk internal detail
Manufacturer  : SMART (/dev/sda)
Model         : eUSB
Serial Number : 1E884210130309181111

Manufacturer  : SMART (/dev/sdb)
Model         : eUSB
Serial Number : 3E3B2F12132259181111

Filesystem   1k-blocks   Used      Available   Use%   Mounted on
rootfs       3969036     1726740   2042260     46%    /
/dev/sda2    3969068     916612    2852420     24%    /p02
/dev/sdb1    15604376    1307320   13510644    9%     /var
/dev/sda3    7665864     150644    7128880     2%     /flash
(Disk usage: not full, OK)

Figure 2-21: System storage verification: Example

show disk internal Example

The following example displays status for the internal storage partitions on the active controller card.

[local]Ericsson> sh disk internal detail
Manufacturer  : SMART (/dev/sda)
Model         : eUSB
Serial Number : SPG124600L3

Manufacturer  : SMART (/dev/sdb)
Model         : eUSB
Serial Number : SPG124600KU

Filesystem   1k-blocks   Used      Available   Use%   Mounted on
rootfs       3872856     2894516   783152      79%    /
/dev/sda2    3880920     2724900   960432      74%    /p02
/dev/sdb1    15499740    646580    14072004    4%     /var
/dev/sda3    7745836     179680    7175780     2%     /flash
/dev/md0     31047684    176260    29306704    1%     /opt/disk

show disk external Example

The following example displays status for the USB mass-storage device in the USB port of the active controller card.

[local]Ericsson>sh disk external
Filesystem   1k-blocks   Used      Available   Use%   Mounted on
/dev/sdc1    2038464     1652032   386432      82%    /media/flash

show disk external detail Example

The following example displays status for the USB device.

[local]Ericsson>sh disk external detail
Manufacturer : Generic
Model        : Mass Storage
Serial Num.  : BEE2D3C5

Filesystem   1k-blocks   Used      Available   Use%   Mounted on
/dev/sdc1    2038464     1652032   386432      82%    /media/flash

2 Chapter Summary

After this course the participant should be able to:
› Understand the operational health of the SSR system
› Describe the basic RPSW health checks
› Explain the details of the control plane interface
› Understand system storage

Figure 2-22: Chapter Summary

3 Fundamental Concept of Processes Architecture on the System

Chapter Objectives

After this course the participant will be able to:
› Describe the fundamental concept of the process architecture on the system
› Describe the SSR software architecture and system processes
› Understand the concept of a manual core dump
› Identify the different types of processes in the SSR
› Work with core dumps of faulty processes

Figure 3-1: Chapter Objectives

1 Process Architecture

The SSR operating system has a modular system design. All functions and protocols are split into separate processes, each running in its own protected memory space. As a result, failure of one protocol does not affect other protocols. Each process can be stopped and restarted individually, minimizing the impact on the overall system in case a failure occurs. For example, if the OSPF process fails, only updates to the OSPF routes are temporarily affected, while all other protocols continue to function.

PM checks heartbeat msgs to monitor health of the system
csm: Specifies the Controller State Manager (CSM) process
‘Hub’ processes
• ISM: Monitors and broadcasts the state of all interfaces, ports, and circuits in the system.
• RCM: Controls all system configurations using a transaction-oriented database.
‘Spoke’ processes

Figure 3-2: Process Architecture

An important process to note is the Process Manager, sometimes known as the “God Process”. The function of the Process Manager is to monitor all active processes in its respective hardware component. Active processes send a heartbeat to the PM. If the heartbeat from a process is lost, PM restarts that process. This is all done automatically, and the process is restored with minimal impact to the system. The huge benefit is that the whole system does not need to be reloaded when individual processes fail.

1.1 RPSW Processes

The Ericsson IP Operating System is a set of interacting software modules that are common to all Ericsson platforms running the operating system (OS). The operating system provides general interfaces for configuring and interacting with the system that are OS-independent, such as the Command Line Interface (CLI), Simple Network Management Protocol (SNMP), and console logs. Even OS-specific information, like lists of processes and counters, is displayed through the CLI in an OS-independent way.

Most features and protocols have their own separate process. Each process has its own protected memory space. Implementing the major software components as independent processes allows a particular process to be stopped, restarted, and upgraded without reloading the entire system or individual traffic cards. In addition, if one component fails or is disrupted, the system continues to operate.

› Most features and protocols have their own separate process
› Each process has its own protected memory space.
› There are also quite a few internal system processes:
  – ISM (Interface and Circuit State Manager) - Manages the configuration and state of all ports, circuits and interfaces in the system. Is responsible for distributing this info through the system.
  – CSM (Card Slot Module/Connection State Manager) - Manages all card and port config and state. Communicates with VxW and relays state information to ISM.
  – RCM (Router Config Module) - All config is managed by this process using a transaction-oriented database. Each process has an agent manager code running in this process.

Figure 3-3: RPSW Processes (1-3)

Card State Manager (CSM)

The Card State Manager (CSM) is a back-end process corresponding to card and port management. It relays card and port events to other back-end processes, such as ISM. CMA abstracts the details of the chassis so that the other software that is involved in chassis management (mostly CSM) can be generic and portable to other chassis architectures and types. This is not a separate process but rather a library that is linked with the process that needs the abstraction.

Interface and Circuit State Manager (ISM)

Interface and Circuit State Manager (ISM) monitors and disseminates the state of all interfaces, ports, and circuits in the system. ISM is the common hub for event messages within the system. When ISM receives an event, it marks the event as received and passes the event to interested clients. ISM tries to not send duplicate events to a client, but if it does, a client must handle the duplication.

ISM sends events in a specific order, starting first with circuit events and followed by interface events in circuit/interface order. All circuit delete events are sent before any other circuit events, and all interface delete events are sent before any other interface events. This order is to ensure that deleted nodes are removed from the system as quickly as possible, because they might interfere with other nodes trying to take their place.

Router Configuration Manager (RCM)

The Router Configuration Manager (RCM) controls all system configurations using the configuration database.
The RCM engine is responsible for initializing all component managers and for maintaining the list of all backend processes for communication. The set of managers and backend processes is set at compile time. The registration of manager to backend daemons occurs during RCM initialization, and each manager is responsible for notifying the RCM engine with which backend processes it communicates.

The RCM engine provides a session thread for processing any connection requests from the interface layer. When a new interface layer component (CLI, NetOpd, and so on) wants to communicate through the DCL to RCM, it starts a new session with the RCM engine. Each session has a separate thread in RCM for processing DCL messages. Because the RCM managers are stateless, the threads only have mutual exclusion sections within the configuration database. Each session modifies the database through a transaction. These transactions provide all thread consistency for the RCM component managers.

The RCM has many other threads. These threads are either dynamically spawned to perform a specific action or they live for the entire life of the RCM process.

Process Manager (PM)

The Process Manager (PM) monitors the health of every other process in the system. The PM is the first process started when the system boots. It starts all the other processes in the system. The list of processes to be started is described in a text file that is packaged with the software distribution. PM also monitors the processes and, if any process dies or appears to be stuck, it starts a new instance of the process.

Reliable Database (RDB)

The configuration database, also known as the Reliable Database (RDB), is a transactional database that maintains multiple transactions and avoids.
By using the combination of transactions and locks, the configuration database maintains the ACID transactional properties: atomicity, consistency, isolation, and durability. A database must maintain each property to ensure that data does not get corrupted. Because every operation within the database occurs as one atomic operation (atomicity), multiple users can interact with the system at the same time (isolation) and the database remains recoverable (durability). The database must also provide facilities that allow a user to easily ensure the accuracy of data within the database (consistency).

› Internal system processes continued:
– PM (Process Manager) – The God process. Monitors and maintains the health of all processes in the system. Checks for periodic ‘heartbeat’ messages from each process to monitor their health. That is, if PM does not receive a heartbeat from a process, it initiates a coredump and a restart of that process.
– RIB (Routing information dataBase) – Stores the main routing table
– RDB (Reliable dataBase) – Shared memory where all static configuration is stored

Figure 3-4: RPSW Processes (2-3)

› Process communication: All communication between processes is done using an Ericsson proprietary, highly optimized, reliable IPC
– Stands for ‘Inter-Process Communication’
– Built on top of TCP
– Monitors the health of the system
– PM uses IPC to restart processes that are not responding
› RPSW - Line Card: Communication between BSD processes & PPA is done with IPC
› Hub and Spoke architecture: Some processes like ISM, RIB, AAA, RPM are server processes that talk to several client processes.
Figure 3-5: RPSW Processes (process communication) (3-3)

1.2 Process Scheduling

› Processes are scheduled to allow efficient use of shared CPU resources
› Without scheduling, one or two busy processes could ‘starve’ all other processes, leading to harmful effects on the system
› The ‘Run Queue’ is a metric used to measure the busyness of the system
› Internally the Run Queue is the number of processes, at any given time, that are waiting to be serviced by the CPU

[Figure: processes P1, P2, P1 queued in front of the CPU; the number of processes in the queue is the Run Queue]

Figure 3-6: Process Scheduling

1.3 RPSW Processes Verification

[local]Train-1# show process
Load Average : 1.40 1.32 1.27
NAME       PID  SPAWN  MEMORY  TIME         %CPU   STATE   UP/DOWN
csm        26   1      6616K   00:00:22.18  0.00%  run     05:35:42
rcm        27   1      13924K  00:00:07.51  0.00%  run     05:35:35
ism        28   1      4748K   00:00:05.95  0.00%  run     05:35:34
ped_parse  29   1      3676K   00:00:03.27  0.00%  run     05:35:33
rpm        30   1      3276K   00:00:02.70  0.00%  run     05:35:33
rib        31   1      4020K   00:00:04.61  0.00%  run     05:35:32
ntp        0    0      0K      Not Avail    0.00%  demand  05:37:29
arp        32   1      3492K   00:00:03.31  0.00%  run     05:35:31
static     0    0      0K      Not Avail    0.00%  demand  05:37:29
isis       0    0      0K      Not Avail    0.00%  demand  05:37:29
rip        0    0      0K      Not Avail    0.00%  demand  05:37:29
bgp        0    0      0K      Not Avail    0.00%  demand  05:37:29
igmp       0    0      0K      Not Avail    0.00%  demand  05:37:29
pim        0    0      0K      Not Avail    0.00%  demand  05:37:29
ospf       0    0      0K      Not Avail    0.00%  demand  05:37:29
sysmon     33   1      3860K   00:00:03.32  0.00%  run     05:35:31
---more---

Callouts:
– Load Average: run queue averages over 5 seconds, 1 minute and 5 minutes
– UP/DOWN: time the process has been up or down, according to its state
– State = run = active
– State = stop = stopped
– State = demand = sleeping
– Spawn = 0 or 1 = good news
– Spawn > 1 = process has been restarted

Figure 3-7: RPSW processes verification

show process: displays the current status of one or all processes running on the system.

show process [proc-name] [{crash-info | detail}]

Syntax Description
proc-name   Optional. Process for which you want to display information.
crash-info  Optional. Specifies that process crash information is to be displayed.
detail      Optional. Specifies that detailed process information is to be displayed.

We can examine all processes in demand state. A process in demand state is waiting to be started by the configuration: the process has not been configured on the SSR. For example, if ISIS is not running on the SSR, the isis process is available to run but has not been configured yet.

Spawn: generate
ISM (Interface State Manager): configuration and state of all ports, circuits and interfaces
CSM (Card Slot Module/Connection State Manager)
RCM (Router Config Module)
RDB (Reliable dataBase)

monitor process: monitors the status of a process and provides continuous updates. Enter this command in exec mode.

The UP/DOWN field shows how long the process has been in its current state, whether that is up or down. It is important not to confuse the UP/DOWN field with the TIME field. The TIME field represents the total CPU time used by the process since it started.
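The State and Spawn rules from Figure 3-7 can also be applied to captured output with a short script. The following is a minimal sketch in Python, not an Ericsson tool. It assumes that the output of show process has been saved as text with the column order NAME PID SPAWN MEMORY TIME %CPU STATE UP/DOWN; the names ProcRow, parse_show_process and findings are hypothetical.

```python
from typing import NamedTuple


class ProcRow(NamedTuple):
    name: str
    pid: int
    spawn: int
    state: str


def parse_show_process(text: str) -> list[ProcRow]:
    """Parse saved 'show process' output (assumed column order, see lead-in)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have numeric PID and SPAWN columns; the header,
        # the Load Average line and prompts are skipped.
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        # TIME can be "Not Avail" (two tokens), so STATE is taken from the right.
        rows.append(ProcRow(fields[0], int(fields[1]), int(fields[2]), fields[-2]))
    return rows


def findings(rows: list[ProcRow]) -> list[str]:
    """Apply the rules: state stop = stopped, spawn > 1 = process was restarted."""
    notes = []
    for row in rows:
        if row.state == "stop":
            notes.append(f"{row.name}: state stop")
        if row.spawn > 1:
            notes.append(f"{row.name}: spawn count {row.spawn}, restarted {row.spawn - 1} time(s)")
    return notes


# Illustrative rows only, in the same layout as the training output.
SAMPLE = """\
NAME       PID  SPAWN  MEMORY  TIME         %CPU   STATE   UP/DOWN
csm        26   1      6616K   00:00:22.18  0.00%  run     05:35:42
ntp        0    0      0K      Not Avail    0.00%  demand  05:37:29
ospf       0    1      0K      Not Avail    0.00%  stop    00:00:03
ism        282  2      4356K   00:00:00.20  0.00%  run     00:02:04
"""

if __name__ == "__main__":
    for note in findings(parse_show_process(SAMPLE)):
        print(note)
```

Processes in demand state are deliberately not reported, because demand only means that the feature has not been configured.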
1.4 Finding CPU Intensive Processes

› Finding processes claiming 10% or more of CPU resources:

[local]Train-1# show proc | grep option '-E' '[1-9][0-9]{1,2}\...%'
rcm   254  2  13272K  00:00:00.38  17.53%  run  00:00:04
hr    255  1  3400K   00:00:00.09  14.61%  run  00:00:02

› Combined with a macro it is easy to use:

[local]Train-1(config)# macro exec highload
[local]Train-1(config-macro)# seq 10 show clock
[local]Train-1(config-macro)# seq 20 show proc | grep option '-E' '[1-9][0-9]{1,2}\...%|NAME'
[local]Train-1(config-macro)# end
[local]Train-1# highload
[10] (highload)# show clock
Tue Dec 15 13:06:50 2015 UTC
[20] (highload)# show proc | grep option '-E' '[1-9][0-9]{1,2}\...%|NAME'
NAME  PID  SPAWN  MEMORY  TIME         %CPU    STATE  UP/DOWN
aaad  288  3      6464K   00:00:00.12  11.00%  run    00:00:01
[local]Train-1#

Figure 3-8: Finding CPU intensive processes

1.5 Single Process Verification

show process: displays the current status of one or all processes running on the system.

show process [card {slot | RPSW1 | RPSW2 | standby | ALSW1 | ALSW2 | all}] [proc-name] [crash-info | detail]

slot        Optional. Displays process information for the traffic card installed in the specified slot.
RPSW1       Optional. Displays process information for the controller card in slot RPSW1.
RPSW2       Optional. Displays process information for the controller card in slot RPSW2.
standby     Optional. Displays process information for the controller card running in standby mode.
ALSW1       Optional. Displays process information for the ALSWT card in slot ALSW1.
ALSW2       Optional. Displays process information for the ALSWT card in slot ALSW2.
all         Optional. Displays process information for all traffic cards installed in the router.
proc-name   Optional. Name of the process for which to display information.
crash-info  Optional. Displays process crash information.
detail      Optional. Displays detailed process information.

› Retrieving specific process information

[local]Train-1# show process ism
NAME  PID  SPAWN  MEMORY  TIME         %CPU   STATE  UP/DOWN
ism   282  2      4356K   00:00:00.20  0.00%  run    00:02:04
[local]Train-1#

› For each process, details can be retrieved (see 1.6 Single Process in Detail)

Figure 3-9: Single process verification

1.6 Single Process in Detail

The keyword “crash-info” displays process crash information. The keyword “detail” displays detailed process information:

[local]Train-1# show process ism detail
Process (PID)       : ism (282)
Spawn count         : 2
Memory              : 4356K
Time                : 00:00:00.23
%CPU                : 0.00%
State               : run
Up time             : 00:02:58
Heart beat          : Enabled
Spawn time          : 2 seconds
Max crashes allowed : 5
Crash thresh time   : 86400 seconds
Total crashes       : 0
Fast restart        : DISABLED
Images: (Spawns, Max spawns, Version, Path)
(*) 2, 3, v1, /usr/siara/bin/ism2
Client IPC Endpoints:
EP 7f000206 f0bc0008 - L2TP-ISM-EP-NAME:00000000
EP 7f000206 f0bc000c - L2TP-ISM-EP-NAME:00000000
EP 7f000206 f0bc0008 - PPPOE-ISM-EP-NAME:00000000
EP 7f000206 f0bc0008 - LM-IPC-ISM-EP-NAME:00000000
---
Server IPC Endpoints:
EP 7f000206 f0bc000c - ISM2-MBE-EVIN-EP-NAME:00000000
Dependent process aaad (288) EP 7f000206 f53c000a
Dependent process ppp (222) EP 7f000206 d6ad0007
Dependent process EPPA IPC SLOT 3 (-2130509824) EP 7f000a43 00000013
Dependent process IPPA IPC SLOT 3 (-2147287040) EP 7f000a03 00000015
Dependent process EPPA IPC SLOT 1 (-2130640892) EP 7f000a41 00040013
---

Figure 3-10: Single process in detail

1.7 Single Process Verification - ISM

[local] Ericsson# show process ism detail
Process (PID)       : ism (3615)
Spawn count         : 1              <- Process has not had to be restarted
Memory              : 8688K
Time                : 00:00:30.13
%CPU                : 0.01%
State               : run
Up time             : 2d18h          <- When did it restart?
Heart beat          : Enabled        <- PM controls health of the process
Spawn time          : 2 seconds
Max crashes allowed : 5
Crash thresh time   : 86400 seconds
Total crashes       : 0              <- Process has not crashed
Fast restart        : DISABLED
                                     <- No “Last Exit Status” shown

Figure 3-11: Single Process Verification – ISM

For each process we can retrieve detailed information using the “detail” keyword.

A spawn count equal to 1 means that the process has never been restarted. From the “Up time” field we can derive when the process last restarted.

Heart beat enabled means that the Process Manager is controlling the health of the process by means of heartbeats.

We can see how many times the process has crashed. In this case the total number of crashes is zero. Since the process has never restarted, no “Last Exit Status” message is shown.

1.8 Single Process Verification - OSPF

[local] Ericsson# show process ospf detail
Process (PID)       : ospf (23251)
Spawn count         : 2              <- Process has had to be restarted
Memory              : 5364K
Time                : 00:00:00.93
%CPU                : 0.27%
State               : run
Up time             : 00:13:36
Heart beat          : Enabled
Spawn time          : 2 seconds
Max crashes allowed : 5
Crash thresh time   : 86400 seconds
Total crashes       : 0              <- Process has not crashed
Fast restart        : DISABLED
Last exit status    : Kill (9)       <- Process was killed manually

Figure 3-12: Single Process Verification – OSPF

As another example, let’s now look at the detailed information for the ospf process. A spawn count equal to 2 means that the process has restarted once. Even in this case the process has never crashed, which means that the restart was caused by something else, for example manual intervention. We can derive the reason why the process restarted by analyzing the “Last exit status” message.
In this case, since the process has restarted, the “Last exit status” is shown. Exit status “kill” means that the process was manually killed.

1.9 Maximum Crashes Allowed

If a process keeps crashing, it indicates that there is a problem that needs to be dealt with. For this reason there is a defined limit on the number of crashes allowed. This can be seen in the output of the show process detail command, which shows the “Max crashes allowed” within a specific “Crash thresh time” period.

[local] Ericsson# show process ism detail
Process (PID)       : ism (3615)
Spawn count         : 1
Memory              : 8688K
Time                : 00:00:30.13
%CPU                : 0.01%
State               : run
Up time             : 2d18h
Heart beat          : Enabled
Spawn time          : 2 seconds
Max crashes allowed : 5              <- Limit on number of crashes allowed
Crash thresh time   : 86400 seconds
Total crashes       : 0
Fast restart        : DISABLED

Callouts:
– The process is allowed to crash a maximum of five times in 86400 s, after which it will not be restarted
– Does not apply to manual restarts

Figure 3-13: Maximum Crashes Allowed

The default values are 5 crashes in 86400 seconds, which is 1 day. This means that once a process has crashed and been forced to restart 5 times within a day, it will not be restarted again. It is important to note that this limit only applies to actual crashes; manual restarts have no effect on it. In other words, a manual restart does not count as a crash.

1.10 Process Crash

› The following sequence of events occurs when a crash happens:
– Crash event
– Automatic core dump initiated
– Process restarted after core dump completed:
  – Spawn-count increments
  – Process restarts and initializes
› If the process keeps crashing, stop restarting after 5 crashes (by default) within 86400 seconds (24 hours).
› This number is changeable via:
- process set <process> max-crashes
› Rule doesn’t apply for manual restarts…process will keep coming up forever

Figure 3-14: Process crash (1-2)

So what happens when a process crashes? There is a predictable sequence of events. First of all, after a process crash a core dump is generated automatically. The core dump contains information about the system state at the time of the crash, particularly the memory state of the process in question. It is saved to disk for future analysis. After the core dump is completed, the Process Manager attempts to restart the process, and the spawn count we saw earlier increases.

[Figure: Process Crash -> Core Dump (/md) -> Process Restarted -> Spawn Count incremented +1]

Figure 3-15: What happens when a process crashes?

1.11 Software Process Failure Scenario

[Figure: three panels, each showing the software processes (Process Manager; CLI, SNMP, other; Config Process; Database; BGP; Multicast; PPP; OSPF; Static; Routing Information Base; OS Kernel) on the Active RPSW, the Standby RPSW and the Active ALSW. Panel labels: 1 “OSPF died”, 2 “OSPF restart”, 3.]

1) Problem occurs in software
• Only the individual process is affected
• All other processes continue to run
• All established connections remain up and forward traffic

2) Process is restarted
• Only the affected process is restarted
• Done completely automatically
• All established connections remain up and forward traffic

3) Process comes back up
• Process starts running again
• NO RPSW switch over has to occur
• All established connections remain up and forwarding traffic

Figure 3-16: Software Process Failure Scenario

Let's take a closer look at what happens during a software process failure.

First, a problem occurs in a software process. For example, the OSPF process dies. Only the individual process is affected. All other processes continue to run, and all established connections remain up and forward traffic.

Next, the system detects that the OSPF process is down and automatically restarts it.

Finally, the process comes back up and starts running again. No hardware failure or switchover occurs. Data forwarding continued non-stop on the system, and the only impact was the short period when OSPF was unavailable to make changes to OSPF-specific routes.

› A coredump is created when a process crashes. In this example the ppp process crashed:

[local]Train-1# show crashfiles
344146 Oct 28 06:03 /md/pppd_859.core
342952 Oct 28 06:00 /md/20111028_060011_pppd_859.core
343897 Oct 28 06:02 /md/20111028_060212_pppd_859.core
[local]Train-1#

› Coredumps are saved on /md:

[local]Train-1# dir /md
Contents of /md
-rw------- 1 root 0 343712 Oct 28 06:05 pppd_859.core
-rw------- 1 root 0 342952 Oct 28 06:00 20111028_060011_pppd_859.core
-rw------- 1 root 0 343897 Oct 28 06:02 20111028_060212_pppd_859.core
-rw------- 1 root 0 344146 Oct 28 06:03 20111028_060329_pppd_859.core
[local]Train-1#

Figure 3-17: Process crash (2-2)

process coredump: Initiate a core dump of a process and save it in a crash file. Enter this command in exec mode.
service upload-coredump: Crash files can be automatically uploaded to a remote server.

1.12 System Stopped Processes

A process in stop state is stopped and will not be restarted by the Process Manager unless this is requested.

[local] Ericsson# process stop ospf
[local] Ericsson# show process ospf
NAME  PID    SPAWN  MEMORY  TIME         %CPU   STATE  UP/DOWN
ospf  0      1      0K      Not Avail    0.00%  stop   00:00:03
[local] Ericsson# process start ospf
[local] Ericsson# show process ospf
NAME  PID    SPAWN  MEMORY  TIME         %CPU   STATE  UP/DOWN
ospf  23251  2      5188K   00:00:00.01  0.00%  run    00:00:04

Figure 3-18: System Stopped Processes

To illustrate, we use the “process stop” command to stop the ospf process. We can then use “show process ospf” to see the ospf process state. As expected, the process is in stop state. Note that the spawn count is one, meaning this process has started once and is currently stopped.

If we start the ospf process again and then run “show process ospf”, we can see that it is in run state again. Note how the spawn count has increased to two, in this case due to a manual restart.

Core dumps are stored on /md

› Check for core dumps on the system

[local]Train-1# show crashfiles
630974 Jun 6 04:19 /md/20120606_041954_netopd.3936.1338956394.SSR8020.core.gz
386124 May 9 16:37 /md/20120509_163729_ism2.3602.1336581449.Ericsson.core.gz
293762 Aug 1 01:40 /md/20120801_014055_pppd.22065.1343785255.SSR8020.core.gz
392714 Aug 1 01:41 /md/20120801_014104_pppd.22185.1343785264.SSR8020.core.gz
[local]Train-1#

› You can run a quick check over the core dump

[local]Train-1# show process ppp crash-info
NAME  TIME                      STATUS
ppp   Wed Jul 4 16:02:46 2007   Trap (133)
ppp   Wed Jul 4 16:03:32 2007   Trap (133)
ppp   Wed Jul 4 16:04:35 2007   Software termination (15)
[local]Train-1#

Figure 3-19: Did a process crash? (1-2)

› Manually create a coredump:

[local]Train-1#process coredump [process name]
Please turn on heart-beat once coredump is complete.
[local]Train-1#process set aaad heart-beat on

› After sharing core dumps with Ericsson TAC, please clean up the /md directory to create sufficient space

[local]Train-1# dir /md/*core
Contents of /md/*.core
-rw-r--r-- 1 root root 624241 Mar 27 13:13 20120327_131307_clsd.6154.1332853987.Ericsson.core.gz
-rw-r--r-- 1 root root 386124 May 9 16:37 20120509_163729_ism2.3602.1336581449.Ericsson.core.gz
[local]Train-1#
[local]Train-1# del crashfile /md/20120327_131307_*.core.gz
Are you sure you want to delete /md/20120327_131307_clsd.6154.1332853987.Ericsson.core.gz ?y
[local]Train-1#

› There is an even more important reason for deleting old core dump files than disk space concerns

Figure 3-20: Did a process crash? (2-2)

The show crashfiles command is used to display the size, location, and name of any crash files located in the system. Files are placed in the /md partition in internal storage. Crash files are used by technical support to determine the cause of a system failure.

1.13 Old Core File on RP

› Imagine you log in to your RP in the morning and type:

[local]Train-1 # show crashfiles
11228124 May 5 11:40 /md/ribd_41.core
7628860 May 5 11:48 /md/arpd_28220.core
5514348 May 5 11:48 /md/arpd_11987.core
7410476 May 5 11:48 /md/loggd_2979.core
6157548 May 5 11:48 /md/arpd_12047.core
23848268 May 5 11:49 /md/rcm_39_sb.core
16987068 May 5 11:53 /md/snmpd_65.core
12412108 May 5 11:49 /md/bgpd_20293.core
16434108 May 5 11:50 /md/snmpd_15145.core
13014124 May 5 11:50 /md/bgpd_29315.core
16442300 May 5 11:50 /md/snmpd_15183.core
7583276 May 5 11:50 /md/arpd_24146.core
7062796 May 5 11:51 /md/arpd_20287.core

› So you panic, call TAC, ask for chassis replacement ;-) and so on….
Figure 3-21: Old core files on RP – BAD IDEA

1.14 Core Files – Copied between RP

› But if you check the system log you will find something very interesting:

May 5 11:48:44: %SYSLOG-6-INFO: ftpd[11019]: connection from 127.3.252.1
May 5 11:48:44: %SYSLOG-6-INFO: ftpd[11019]: FTP LOGIN FROM 127.3.252.1 as nobody
May 5 11:48:44: %SYSLOG-6-INFO: ftpd[11019]: put /md/arpd_28220.core = 7628860 bytes
May 5 11:48:44: %SYSLOG-6-INFO: ftpd[11022]: connection from 127.3.252.1
May 5 11:48:44: %SYSLOG-6-INFO: ftpd[11022]: FTP LOGIN FROM 127.3.252.1 as nobody
May 5 11:48:44: %SYSLOG-6-INFO: ftpd[11022]: put /md/arpd_11987.core = 5514348 bytes
May 5 11:48:44: %SYSLOG-6-INFO: ftpd[11025]: connection from 127.3.252.1
May 5 11:48:44: %SYSLOG-6-INFO: ftpd[11025]: FTP LOGIN FROM 127.3.252.1 as nobody
May 5 11:49:04: %SYSLOG-6-INFO: ftpd[11025]: put /md/loggd_2979.core = 7410476 bytes

› All core files have been copied from the standby RP after the switch over occurred
› ftp does not preserve the original time stamp
› We can check the original file dates on the standby RP

Figure 3-22: Core files are copied between RPs

1.15 Core Dump Files on Standby RP

[local]Train-1# dir mate /md
Opening connection to mate...
total 4128
-rw-r--r-- 1 root 0 11228124 Sep 16 2010 ribd_41.core
-rw-r--r-- 1 root 0 7628860 May 30 2010 arpd_28220.core
-rw-r--r-- 1 root 0 5514348 Aug 6 2011 arpd_11987.core
-rw-r--r-- 1 root 0 7410476 Aug 7 2011 loggd_2979.core
--- cut ---

› Core files are very old and most likely from a different OS release
› Leaving core files on the RP introduced problems:
– Increased operator’s adrenaline level and
– Diverted operator’s attention from the real issue: instead of investigating the real problem (the reason for the switchover), the operator started to investigate multiple process crashes
› After sharing core dumps with TAC, please clean up the /md directory on both RPs!
[local]Train-1# del mate /md/<filename>
Opening connection to mate...

Figure 3-23: Core dump files on standby RP

2 Exercise 3: Introduction

› Exercise to learn to troubleshoot the system processes:
– Search for processes indicating problems
› Your system is not loaded, so you will probably not find any problems with the processes.
› You will emulate processes running on your system by looking at a saved file:

[local]Train-1# show configuration sh_proc | grep ...

(“show configuration sh_proc” emulates the “show process” command.)

Figure 3-24: Exercise 3: Introduction

2.1 Exercise 3: System Processes

› Please move to the exercises book.

Figure 3-25: Exercise 3: System Processes

2.1.1 Exercise 3: Review

“show conf sh_proc” emulates the output of the “show process” command.

› Exercise 3.1: Create a manual coredump

[local]Train-1# process coredump ppp no-restart

› Wait for the coredump to complete:

[local]Train-1# show process ppp
NAME  PID  SPAWN  MEMORY  TIME         %CPU   STATE  UP/DOWN
ppp   166  3      3436K   00:00:00.59  0.00%  run    00:02:20
[local]Train-1# process set ppp heart-beat on

› See the core dump file in /md:

[local]Train-1# dir /md/*ppp*.core
Contents of /md/*ppp*
-rw-r--r-- 1 root root 399221 Aug 01 01:40 /md/20120801_014041_pppd.3908.1343785241.Train-1.core.gz
-rw-r--r-- 1 root root 293762 Aug 01 01:40 /md/20120801_014055_pppd.22065.1343785255.Train-1.core.gz

› Exercise 3.2: Processes claiming 20% or more of CPU resources

[local]Train-1# show conf sh_proc | grep option '-E' '[2-9][0-9]{1,2}\...%'
ism   454  1  18280K  00:00:11.96  32.00%  run  13:48:30
ppp   473  1  4364K   00:00:08.36  23.00%  run  13:48:22

Figure 3-26: Exercise 3, review (1-2)

› Processes claiming any CPU resources

[local]Train-1# show conf sh_proc | grep option '-E' '[1-9][0-9]{0,2}\...%'
OR
[local]Train-1# show conf sh_proc | grep option '-E -v' ' {1,3}0\...%'
ism   454  1  18280K  00:00:11.96  32.00%  run  13:48:30
rib   458  1  4540K   00:00:10.35  10.00%  run  13:48:28
lm    466  1  4592K   00:00:07.74  12.00%  run  13:48:25
ppp   473  1  4364K   00:00:08.36  23.00%  run  13:48:22
aaad  480  1  6932K   00:00:15.83  7.00%   run  13:48:18

› Processes which have restarted at least once (spawn count of 2 or more)

[local]Train-1# show conf sh_proc | grep option '-E' '^.{26}[2-9]'
rcm   453  3  14060K  00:00:14.27  0.00%  run  13:48:31
dlm   488  2  6828K   00:00:07.17  0.00%  run  13:42:18
l2tp  482  3  4656K   00:00:33.39  0.00%  run  13:48:16
[local]Train-1#

Figure 3-27: Exercise 3, review (2-2) (Optional parts)

3 Chapter Summary

After this course the participant should be able to:
› Describe the Fundamental Concept of Processes Architecture on the System
› Describe the SSR software architecture and system processes
› Understand the concept of manual core dump
› Identify the different types of processes in SSR
› Work with Core Dumps of Faulty Processes

Figure 3-28: Chapter Summary

4 Understand the SSR System Redundancy Issues

Chapter Objectives

After this course the participant will be able to:
› Identify the SSR System Redundancy Issues
› Explain the redundancy on active RP
› Analyze problems of standby RP
› Understand RP Failover Management

Figure 4-1: Chapter Objectives

1 RP Redundancy

When checking the health of an SSR, an important thing to verify is the RP redundancy state. The SSR contains two RPs: one RP is in active state while the other one is in standby state. The standby RP is ready to take over should the active RP fail. For this reason it is very important to verify the state of this redundancy.

› Verifying the current redundancy state of the system

[local]Train-1# show redundancy
---------------------------------
This RPSW is active
---------------------------------
STANDBY RPSW READY?       : YES
PAd in sync?              : YES
Database in sync?         : YES
Software Release in sync? : YES
Firmware in sync?         : YES
Mate-to-Mate link up?     : YES
--- cut ---
[local]Train-1#

› All fields showing YES is the healthy state (“thumbs up”). These are software elements which are synchronized during an upgrade. During the boot sequence of the standby RP some fields might indicate “NO”. This is part of the boot sequence, where each element is compared with the active RP.

Figure 4-2: RP redundancy

This is done by using the command “show redundancy”. In this case we can see that the standby RP is ready and in sync with the active RP. We can also see a list of processes that have been successfully synched and also the details of any RP reload switchover that has occurred since system reload.

1.1 RP Redundancy Details

Part of the information available from the output of “show system redundancy” is also available by running the command “show redundancy detail”. Here we can again see a side-by-side comparison of software releases on both RPs.
[local]Train-1# show redundancy detail
Server (sync version3.0) is up
Client (sync version3.0) is connected
Client Mode:
Service    | Active's Version             | Standby's Version
___________|______________________________|______________________________
Firmware   | OpenFirmware 3.0.2.29        | OpenFirmware 3.0.2.29
           | PRODUCTION RELEASE           | PRODUCTION RELEASE
___________|______________________________|______________________________
Software   | /p02: 15.2.129.3.13          | /p02: 15.2.129.3.13
___________|______________________________|______________________________
Diagnostic | /p02: 15.2.129.3.13          | /p02: 15.2.129.3.13
___________|______________________________|______________________________
VPF        | /p02:                        | /p02:
           | SLES11SP3_ssc_xen-5.1.21     | SLES11SP3_ssc_xen-5.1.21
___________|______________________________|______________________________
Minikernel | v3.0.38-876-g113fd53-2532251 | v3.0.38-876-g113fd53-2532251
           |                              |
___________|______________________________|______________________________
Software Sync Log:
------------------
Release Sync Type: release sync unnecessary
--more
Dec 16 2015 05:58:17: SUCCESS

Figure 4-3: RP redundancy details

Additional output from the command can be seen here. This shows the logs of synchronisation between the RPs. As we can see, the configuration files also have to be synched for successful redundancy.
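The side-by-side version table printed by show redundancy detail lends itself to an automated comparison. The following is a minimal sketch in Python, not an Ericsson tool. It assumes that the table has been captured as text with one row per line and three columns separated by "|"; the function name version_mismatches and the sample rows are hypothetical.

```python
def version_mismatches(table: str) -> list[tuple[str, str, str]]:
    """Return (service, active, standby) for every row where the two RPs differ."""
    mismatches = []
    service = ""
    for line in table.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # not a table row (prompt, log line, and so on)
        if all(set(part) <= {"_"} for part in parts):
            continue  # separator row or empty continuation row
        name, active, standby = parts
        if name == "Service":
            continue  # header row
        if name:
            service = name  # rows with an empty first column continue the previous service
        if active != standby:
            mismatches.append((service, active, standby))
    return mismatches


# Illustrative capture in the same layout as Figure 4-3.
SAMPLE = """\
Service    | Active's Version      | Standby's Version
___________|_______________________|______________________
Firmware   | OpenFirmware 3.0.2.29 | OpenFirmware 3.0.2.29
           | PRODUCTION RELEASE    | PRODUCTION RELEASE
___________|_______________________|______________________
Software   | /p02: 15.2.129.3.13   | /p02: 15.2.129.3.13
"""

if __name__ == "__main__":
    for service, active, standby in version_mismatches(SAMPLE):
        print(f"{service}: active={active!r} standby={standby!r}")
```

An empty result means that every listed component reports the same version on both RPs. It says nothing about the sync logs, which still have to be read separately.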
LZT1381712 R1A © Ericsson AB 2015 - 95 - Ericsson SSR 8000 R15 System Troubleshooting 1.2 Investigating Redundancy Issues › “show system redundancy” command provides set of very useful information › It contains the following sections: – – – – – – – – – – – The output is large for this command Active controller alarms Hardware detail rp A & B Controller switch history Controller release sync history Firmware sync log Software sync log Configuration sync log Minikernel sync log Controller protection internal log Controller error log Figure 4-4: Investigating redundancy issues 1.3 Show system redundancy [local]Train-1# show system redundancy Controller alarms for slot RPSW2: ----------------------------- Alarms summary for RPSW Controller alarms for slot RPSW1: ----------------------------Hardware detail for slot RPSW2: --------------------------Slot : RPSW2 Serial No : CF90000AY0 Mfg Date : 06-DEC-2011 Activated Time : 44 h Phalanx : 2.0.9 --cut-Payload Status : OK POD Status : Passed Failed LED : Off Standby LED : Off Ejector Switch : 1 (Locked) Last Payld Reset : Admin Active Alarms : NONE Hardware detail for slot RPSW1: --------------------------Slot : RPSW1 Serial No : CF90000BSH Mfg Date : 22-DEC-2011 Activated Time : 28 min Phalanx : 2.0.9 Spanky : 02.02 --cut-Last Payld Reset : Admin Active Alarms : NONE ---(more)--- Type Hardware Rev : rpsw : R2H OSD Status : Not Run IS LED Swap LED : On : Off Type Hardware Rev : rpsw : R2H Same output as show hardware card RPSW2 detail Same output as show hardware card RPSW1 detail There is more next slide Figure 4-5: show system redundancy (1-3) - 96 - © Ericsson AB 2015 LZT1381712 R1A Understand the SSR System Redundancy Issues Information about root cause of switch over Controller switch history: -------------------------[Sat Jun 16 04:01:07 2012] Card Failed : (RPSW1)->(RPSW2) Controller release sync status: ------------------------------Server (sync version3.0) is up Client (sync version3.0) is connected Client Mode: 
Service    | Active's Version            | Standby's Version
___________|_____________________________|_________________________________
Firmware   | Mips,rev2.0.2.66            | Mips,rev2.0.2.66
___________|_____________________________|_________________________________
Software   | /p02: 11.1.2.1              | /p02: 11.1.2.1
___________|_____________________________|_________________________________
Minikernel | 11.7                        | 11.7
___________|_____________________________|_________________________________

Software Sync Log:
------------------
Release Sync Type: release sync unnecessary
Sep 7 2012 14:41:47: UNNECESSARY
Sep 7 2012 14:41:47: SUCCESS

Configuration Files Sync Log:
-----------------------------
Sep 7 2012 14:42:48: SUCCESS
---more---

› The release sync status is the same output as show redundancy detail
› There is more on the next slide

Figure 4-6: show system redundancy (2-3)

› The controller protection internal log shows the synch steps and state changes of the RPs, including mate-to-mate link state changes and SW state changes

Controller protection internal log:
-----------------------------------
Sep 7 14:41:43: Controller::RPSW2 - Mate Link UP on ACTIVE card.
Sep 7 14:41:43: Controller::RPSW2 - Sw State: [Running]. Mate Sw State [Startup].
Sep 7 14:41:52: Controller::RPSW2 - processMateInsertionStartup, received lock request from peer
Sep 7 14:41:52: Controller::RPSW2 - Locking card for state synch
Sep 7 14:41:53: Controller::RPSW2 - Sw State: [Running] -> [WaitForPeer]. Mate Sw State [WaitForPeer].
Sep 7 14:42:20: Controller::RPSW2 - Exiting waitForPeerInit() Sw State: [WaitForPeer]. Mate Sw State [ReadyToRun].
Sep 7 14:42:20: Controller::RPSW2 - Sw State: [WaitForPeer] -> [ReadyToRun]. Mate Sw State [ReadyToRun].
Sep 7 14:42:52: Controller::RPSW2 - Sw State: [ReadyToRun] -> [Running]. Mate Sw State [ReadyToRun].
Sep 7 14:42:55: Controller::RPSW2 - Sw State: [Running]. Mate Sw State [ReadyToRun] -> [Running].
Sep 7 14:42:55: Controller::RPSW2 - Unlocking card after state synch
Sep 7 14:42:55: Controller::RPSW2 - Fault Severity change on Primary. Card Fail -> No Faults.
Controller error log:
---------------------

› The controller error log shows the root causes of RP errors. No errors is good.

Figure 4-7: show system redundancy (3-3)

2 Analyzing Problems of Standby RP

Figure 4-8: Analyzing Problems of Standby RP

2.1 Active or Standby RP

› Imagine the active RP encountered a serious problem and restarted
› We analyzed possible restart problems of the currently active RP
› This is very useful for single-RP systems; however, in a redundant configuration:
› The currently active RP was in standby mode before the problem occurred
› So we investigated a perfectly healthy RP!

Important to remember: in case of RP problems on redundant systems, you most often need to investigate the standby RP.

Figure 4-9: Which RP should you check, Active or Standby?

2.2 Connecting to Standby RP without Console

› You can telnet from the active RP to the standby RP using the internal loopback

[local]Train-1# telnet mate
Trying 127.2.252.1...
Connected to 127.2.252.1.
Escape character is '^]'.
login: ericsson
Password: ericsson
[local]standby#

› Once you are in the standby RP you can issue the same commands as discussed for the active RP

Figure 4-10: Connecting to standby RP without console

2.3 Searching for Restart Reason

› Both RPs maintain independent logs
› You should investigate the output of the following commands when searching for the restart reason:

[local]standby# show system redundancy
[local]standby# show log file <filename> | grep ...

Note!
[local]standby# show redundancy
This RPSW is standby
[local]standby#

Figure 4-11: Searching for restart reason

2.4 Repeating Commands on Standby RP

› As discussed before on the active RP

[local]standby# show crashfiles
[local]standby# show disk internal
Filesystem  1k-blocks  Used     Available  Use%  Mounted on
rootfs      3969036    1653376  2115624    44%   /
/dev/sda2   3969068    1659540  2109492    44%   /p02
/dev/sdb1   15604376   418288   14399676   3%    /var
/dev/sda3   7665864    150652   7128872    2%    /flash
[local]standby#
[local]standby# show history global
Jul 4 17:29:00 show crashfiles
Jul 4 17:29:21 show disk internal
[local]standby#

Figure 4-12: Repeating commands on standby RP

› System statistics confirm that processes are in pending init mode because of the standby function

[local]standby# show system status
System Status: OK
[local]standby#

Figure 4-13: Repeating commands on standby RP

2.5 Verify Processes on Standby RP

› Verify the processes running on the standby RP

[local]standby# show process
(5 sec, 1 and 5 minute ) Load Average : 1.35 1.31 1.26
NAME      PID   SPAWN  MEMORY  TIME         %CPU   STATE  UP/DOWN
ns        3185  1      4932K   00:00:00.50  0.00%  run    00:13:03
u2l       3206  1      3800K   00:00:00.02  0.00%  run    00:13:03
metad     3207  1      30712K  00:00:01.51  0.00%  run    00:13:03
evtmd     3238  1      4108K   00:00:00.03  0.00%  run    00:12:59
cmsp_sw0  3253  1      4328K   00:00:00.05  0.00%  run    00:12:59
cmsp_sw1  3254  1      4324K   00:00:00.05  0.00%  run    00:12:59
---more---

• Spawn = 1 means good news
• Spawn > 1 means investigate the reason
• Spawn = 0 means the process is not popular
• State = run / demand means good news IF up/down > days/weeks/months
• Time is just an indicator of the total activity of the process. If it is high, it just means the process is popular
• Load average gives an indication of total utilization over different sample times (averaging periods)

Figure 4-14: Verify processes on standby RP

2.6 Copy Files from Standby RP

› Why would you need to copy files from the standby?
› Because only the active RP has an active IP management connection toward the NOC

The active RP confirms connectivity:

[local]Train-1# ping 10.1.1.3
PING 10.1.1.3 (10.1.1.3): source 10.1.1.101, 36 data bytes, timeout is 1 second
!!!!!
----10.1.1.3 PING Statistics----
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.205/0.347/0.790/0.250 ms
[local]Train-1#

No response on the standby:

[local]standby# ping 10.1.1.3
PING 10.1.1.3 (10.1.1.3): source 127.0.2.6, 36 data bytes, timeout is 1 second
.....
----10.1.1.3 PING Statistics----
5 packets transmitted, 0 packets received, 100.0% packet loss

Figure 4-15: Copy files from standby RP

› Copy core dumps from the standby RP to the active RP:

[local]standby# exit
[local]Train-1# copy mate /flash/delta.cfg /flash/deltafrommate.cfg
copying from mate /flash/delta.cfg to local:/flash/deltafrommate.cfg...
Opening connection...
Copying file...
/flash/deltafrommate.cfg: 50.00 B 336.96 B/s
[local]Train-1#
[local]Train-1# copy /flash/deltafrommate.cfg ftp://admin1@10.1.1.3/
copying from mate /flash/deltafrommate.cfg to ftp://deltafrommate.cfg...
Opening connection...
Copying file...
//deltafrommate.cfg: 50.00 B 336.96 B/s
[local]Train-1#
[local]Train-1# delete /flash/deltafrommate.cfg
[local]Train-1# delete mate /flash/delta.cfg

Figure 4-16: Copy files from standby RP

3 RP Failover Management

Figure 4-17: RP Failover Management

3.1 Managing Reloads and RP Switch-over

Two types of reload switchover may occur:

The first type is a manual switchover, which is triggered by running the command “reload switch-over”.

The second type is an automatic failover upon failure of the active RP. For example, the abnormal termination of a critical process running on the active RP causes the RP to reload and, as a consequence, triggers a switchover. Examples of critical processes are PM, PAD, NS and CMS_SERVER.

[local]Train-1# reload ?
  asp          Reload asp(s)
  card         Reload card(s)
  standby      Reload the standby card
  switch-over  Reload active RPSW or ALSW and cause standby to active switch over
  <cr>

› reload
– Instructs the system to reload
– Accounting off records will be sent prior to reload
› reload standby
– Reload the standby RP
› reload card
– Reload a specific traffic card
› reload switch-over
– Alternate Active to Standby RP
– Reloads the current active RP

Figure 4-18: Managing Reloads and RP Switch-over

3.2 Manual RP Switchover

› BEFORE a switchover, make sure the RPs are in synch. They might not be, and then you are in trouble.
› Lack of patience is the number one cause of out-of-synch RPs

NOT GOOD, the standby is not ready:

[local]Train-1# show redundancy
---------------------------------
This RPSW is active
---------------------------------
STANDBY RPSW READY?      : NO
PAd in sync?             : NO
Database in sync?        : NO
Software Release in sync? : NO
Firmware in sync?        : NO
Mate-to-Mate link up?    : NO
[local]Train-1#

Good (thumbs up), the standby is ready:

[local]Train-1# show redundancy
---------------------------------
This RPSW is active
---------------------------------
STANDBY RPSW READY?      : YES
PAd in sync?             : YES
Database in sync?        : YES
Software Release in sync? : YES
Firmware in sync?        : YES
Mate-to-Mate link up?    : YES
[local]Train-1#

Figure 4-19: Manual RP Switchover (1-2)

[local]Train-1# reload switch-over
The "reload switch-over" command on this system will cause standby to active switch over, some cards may be rebooted.
Do you really want to reload? (y/n)

[local]Train-1# show chassis
Current platform is SSR 8020
(Flags: A-Active Card B-Standby Card)
Slot  : Configured Type  Installed Type  Operational State  Flags
--------------------------------------------------------------------------
RPSW1 : n/a              rpsw            OOS-Booting        B
RPSW2 : n/a              rpsw            IS                 A
ALSW1 : n/a              alsw            IS                 A
ALSW2 : n/a              alsw            IS                 B
--- cut …

[local]Train-1# show redund | grep NO
STANDBY RPSW READY?      : NO
VxWorks in sync?         : NO
Database in sync?        : NO
Software Release in sync? : NO
Firmware in sync?        : NO
[local]Train-1#

Figure 4-20: Manual RP switchover (2-2)

4 Chapter Summary

After this course the participant should be able to:
› Identify the SSR System Redundancy Issues
› Explain the redundancy on active RP
› Analyze problems of standby RP
› Understand RP Failover Management

Figure 4-21: Chapter Summary

5 Issues related with Boot Problem

Chapter Objectives

After this course the participant will be able to:
› Discuss the issues related to boot problems
› Understand the booting process in SSR
› Identify issues related to booting in SSR

Figure 5-1: Chapter Objectives

1 Boot Problems

› In some cases there can be issues with the controller cards
– The reason could be a corrupt OS image or a hardware problem
› This could result in the route processor continuously rebooting
› You can stop this by entering the boot ROM interface on the RPSW
– Note! You can access the boot ROM interface only through the “CONSOLE port”
– Caution! Do not change any system boot parameters unless instructed! It may cause a non-responsive system.

Figure 5-2: Boot Problems

1.1 Entering Boot ROM Interface

1. Access the system through the CONSOLE port on the RPSW.
– Usually this will be the standby.
2. Type “ssr” when you see the text: Auto-boot in 5 sec, type 'ssr' to abort, [CR] to boot:
– You have 5 seconds, so you need to be quick
› You will be in the “Boot ROM Interface”. Special commands apply here.
– Note! You cannot change the configuration

Figure 5-3: Entering Boot ROM Interface

1.2 Example: Entering Boot ROM Interface

Start to reload system ...
Sep 07 15:43:05 Initiated shutdown procedure.
--- cut ---
Welcome to CodeGenInc SmartFirmware(tm) version 3.0 for x86_64
SmartFirmware(tm) Copyright 1996-2008 by CodeGen, Inc. All Rights Reserved.
--- cut ---
Board voltage reads 54.0V
Board current reads 0.96A
Board power reads 51.9W
Executing POST
PASSED Loop 1 of 1, POST PASSED
SHORT DRAM Test : PASSED
PCI Devices Test : PASSED
RTC Test : PASSED
RTC battery check Test : PASSED
Disk1 presence Test : PASSED
Check EPROM Test : PASSED
2012/09/07 15:43:58
Auto-boot in 5 sec, type 'ssr' to abort, [CR] to boot:
Ok

› We entered “ssr” at the Auto-boot prompt
› “ok” is the prompt for the Boot ROM Interface
› The booting has been stopped!

Figure 5-4: Example: Entering Boot ROM Interface

1.3 Diagnostics Command

Diagnostics are software tests that examine the hardware and operating system environment and detect malfunctions. The system supports the following automatic and user-initiated diagnostic tests for detecting problems related to hardware and software.

› It is possible to run tests of the required level from the Boot ROM Interface. The diag command and its options:

ok diag
usage: diag <POST# | testname> <loopcount>:
POST# Specify the level of Diags to run (1 to 2)
* OR *
Specify the testname to run, with is one of:
set-led     : X86RP DEV Port 80 led  Level 0xF: ENABLED
Run on boot : X86 BOOT L2 CACHE      Level 0x9: ENABLED
Run on boot : X86 OFW L2 CACHE       Level 0x9: ENABLED
Run on boot : DRAM ECC               Level 0x9: ENABLED
short-dram  : SHORT DRAM             Level 0x1: ENABLED
long-dram   : LONG DRAM              Level 0x2: ENABLED
post-pci    : PCI Devices            Level 0x1: ENABLED
rtc-check   : RTC                    Level 0x1: ENABLED
--- cut ---

Figure 5-5: Diagnostics Command

1.4 Running Diagnostics

› Run diag level 1:

ok diag POST1
SHORT DRAM Test : PASSED
PCI Devices Test : PASSED
RTC Test : PASSED
RTC battery check Test : PASSED
CPLD Register Test : PASSED
Disk0 presence Test : PASSED
Disk1 presence Test : PASSED
Check EPROM Test : PASSED
PASSED Loop 1 of 1, POST level 1 PASSED
2012/09/14 15:46:03

› Report faults to the Ericsson tech group

Figure 5-6: Running Diagnostics

2 Troubleshooting Scenarios

Figure 5-7: Troubleshooting Scenarios

In case you need to resume booting:

ok bootsys
Launching OS kernel: load flash:0 /boot/bzImage
Linux version 2.6.32.53-798-g5652359 (sysbuild@asglx-1-300) (gcc version 4.3.2 (Wind River Linux Sourcery G++ 4.3-85) ) #2 SMP PREEMPT Wed Apr 25 22:26:19 PDT 2012
Command line: console=ttyS0,9600n8 crashkernel=128m pci=hpmemsize=0,hpiosize=0
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
Centaur CentaurHauls
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000c0000 - 00000000bf3db000 (usable)
BIOS-e820: 00000000bf3db000 - 00000000bf427000 (ACPI NVS)
BIOS-e820: 00000000bf427000 - 00000000bf42e000 (ACPI data)
BIOS-e820: 00000000bf42e000 - 00000000bf45f000 (unusable)

Figure 5-8: Resume Boot

2.1 Troubleshooting Scenarios

› We will present some common examples of troubleshooting the SSR system in the coming sections:
1. Analyzing problems of active RP
2. Investigating redundancy issues
3. Analyzing problems on standby RP
4. RP boot problems

Figure 5-9: Troubleshooting Scenarios

2.2 System Uptime

› The first question is how long this RP has been up

[local]Train-1# show version
Ericsson IPOS Version IPOS-15.2.129.3.13-Release
Built by sysbuild@eussjlx7061.sj.us.am.ericsson.se Tue Oct 20 08:33:21 PDT 2015
Copyright (C) 1998-2015, Ericsson AB. All rights reserved.
Operating System version is Linux 3.0.75-1281-gd853cba
System Bootstrap version is OpenFirmware 3.0.2.29 PRODUCTION RELEASE
Installed minikernel version is v3.0.38-876-g113fd53-2532251
ippmd / mloam-service-layer component version: 0.2-194-gf6ac77e
Built by sysbuild@eussjlx7046.sj.us.am.ericsson.se Tue Jan 13 00:36:02 PST 2015
Copyright (C) 1998-2015, Ericsson AB. All rights reserved.
Router Up Time - 26 minutes 19 seconds
[local]Train-1#

Figure 5-10: System uptime

› What did somebody type, and when?

[local]Train-1# show history global
Dec 16 18:47:54 sho cont all
Dec 16 18:47:56 sho ver
Dec 16 18:48:17 sho chassis
Dec 16 18:48:20 sh hard
Dec 16 18:48:27 sho ver
Dec 16 15:49:51 en
Dec 16 15:49:52 ericsson
Dec 16 15:49:59 show config
Dec 16 15:50:01 config
Dec 16 15:50:04 context local
Dec 16 15:50:06 int 1
Dec 16 15:50:11 ip add 10.1.1.106/24
Dec 16 15:50:13 exit
Dec 16 15:50:21 administrator ericsson password ericsson
Dec 16 15:50:25 privilege start 15
Dec 16 15:50:29 port eth 7/1
---more---

› The history contains anything you typed that was not rejected by the CLI parser. It covers both execute commands (show) and configuration commands
› Although we trust everybody, it is good to check for possible human errors

[local]Train-1# show history global | grep reload
Dec 16 15:58:18 reload card 5
[local]Train-1#

Figure 5-11: Check for human errors

2.3 System Storage Verification

› Verification of errors and free space within storage media

[local]Train-1# show disk internal detail
Manufacturer  : SMART (/dev/sda)
Model         : eUSB
Serial Number : 1E884210130309181111
Manufacturer  : SMART (/dev/sdb)
Model         : eUSB
Serial Number : 3E3B2F12132259181111
Filesystem  1k-blocks  Used     Available  Use%  Mounted on
rootfs      3969036    1726740  2042260    46%   /
/dev/sda2   3969068    916612   2852420    24%   /p02
/dev/sdb1   15604376   1307320  13510644   9%    /var
/dev/sda3   7665864    150644   7128880    2%    /flash
[local]Train-1#

› No disk errors!
› Disk usage OK!

Figure 5-12: System storage verification

2.4 Exercise 4: Investigate Boot Problems

› Please move to the exercises book.

Figure 5-13: Exercise 4: Investigate Boot Problems

3 Chapter Summary

After this course the participant should be able to:
› Discuss the issues related to boot problems
› Understand the booting process in SSR
› Identify issues related to booting in SSR

Figure 5-14: Chapter Summary

6 Active and History Logs in SSR

Chapter Objectives

After this course the participant will be able to:
› Describe the log in SSR
› Understand the Active and History Logs
› Discuss the different types of logs in SSR
› Understand the communication with a Syslog server
› Discuss the concept of communication with a Syslog server
› Configure communication to a Syslog server

Figure 6-1: Chapter Objective

1 System Logging Introduction

In many cases system troubleshooting takes place after a problem has occurred, and we need to figure out what the problem was and how it occurred. For this we typically need access to historical data, and this is where system logging is useful.

In SSR there is a system logger that collects information from any process that has something to log and writes it to a log buffer for future reference. The buffer is a circular 1 MB buffer, meaning that when it fills up, the oldest entries are overwritten.
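The overwrite behaviour of a circular buffer can be illustrated with a small Python sketch. This is only a model of the idea, not the loggd implementation: the class name, the byte accounting and the tiny capacity used in the example are assumptions made for illustration.

```python
from collections import deque


class CircularLog:
    """Size-bounded log buffer: when full, the oldest entries are dropped."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.entries = deque()
        self.used = 0

    def append(self, message):
        size = len(message.encode("utf-8"))
        if size > self.capacity:
            raise ValueError("message is larger than the whole buffer")
        self.entries.append(message)
        self.used += size
        # Evict from the oldest end until the buffer fits again.
        while self.used > self.capacity:
            oldest = self.entries.popleft()
            self.used -= len(oldest.encode("utf-8"))


buf = CircularLog(capacity_bytes=10)
for msg in ("aaaa", "bbbb", "cccc"):
    buf.append(msg)
# "aaaa" no longer fits and has been overwritten by newer entries.
```

The practical consequence for troubleshooting is the same as in the sketch: on a busy system, the entries around an old event may already be gone from the buffer, which is why the log files described later matter.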
› Troubleshooting:
– Often after a problem has occurred
› Logs: historical information
› System logger: collects information from multiple sources
› Storage of log messages
– /md/loggd_dlog.bin

Figure 6-2: System logging introduction

1.1 Loggd Process

System logging on SSR is done by the logger daemon (Loggd) process. Loggd handles three different log types: main system logs, system debugs and malicious packet logs. Each of these has its own memory buffer, which is limited to approximately 1 MB in size.

The Loggd process runs on all RPSW cards as well as on line cards. Remote Loggd processes (for example those running on a line card or on the standby RP) send their log messages to the Loggd running on the active RP, so the active RP log buffer contains all the necessary logged messages.

Figure 6-3: Loggd Process

The main system logs are the ones used for historical troubleshooting, because data from this buffer can be saved for future analysis by writing it to files. Certain debug messages may also be saved, as we will see later.

1.2 System log Commands

The simplest way to check the current logs is to use the “show log” command. This displays the contents of the Loggd buffer on the active RP. As mentioned, logs from other cards in the chassis are sent to this log process.

Log messages always take the same general format. The first part contains the timestamp of the event. This is followed by the application that wrote the log event. The next part contains a numeric value that indicates the severity of the event, followed by a description of the condition that caused the message to be logged. Finally we can read the content of the log message.
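The message format just described (timestamp, application, numeric severity, condition, message text) lends itself to simple offline parsing of saved log output. The sketch below is an assumption-based illustration: the regular expression was written against entries such as `Dec 16 12:44:41: %IPC-3-ERR: loggd: ...` and `Dec 16 12:44:41: {6/LP}: %SRVFWD-ESSIF-6-CTLD_INFO: ...`, and the field names (`source`, `facility`, `mnemonic`) are labels chosen here, not official terms.

```python
import re

# Severity numbers 0..7 and their names.
SEVERITY_NAMES = {
    0: "Emergencies", 1: "Alerts", 2: "Critical", 3: "Errors",
    4: "Warnings", 5: "Notifications", 6: "Informational", 7: "Debugging",
}

LOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}(?:\.\d+)?):\s*"
    r"(?:\[(?P<context>\d+)\]:\s*)?"        # optional "[0002]:" field
    r"(?:\{(?P<source>[^}]+)\}:\s*)?"       # optional "{6/LP}:" card field
    r"%(?P<facility>[A-Z0-9_-]+?)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*"
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Split one log entry into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    fields["severity_name"] = SEVERITY_NAMES[fields["severity"]]
    return fields


def at_least_as_severe(lines, level):
    """Keep entries whose severity number is <= level (lower is more severe)."""
    parsed = (parse_log_line(line) for line in lines)
    return [p for p in parsed if p is not None and p["severity"] <= level]
```

For example, `at_least_as_severe(saved_lines, 3)` keeps only errors and worse, which is a quick first pass before reading the informational entries around them.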
Many log messages are normal and do not indicate a system problem, while others may be critical.

Looking at current log events:

[local]Ericsson# show log
Dec 16 12:44:41: %IPC-3-ERR: loggd: ipc_sendto sendto errno 2: No such file or directory
Dec 16 12:44:41: {6/LP}: %SRVFWD-ESSIF-6-CTLD_INFO: essifcd_sc_notify_appvm_start():463: AppVM is alive
Dec 16 12:44:41: %IPC-3-ERR: loggd: ipcSendCommon: sendto rc=-3
Dec 16 12:44:41: %IPC-3-ERR: loggd: ipcContactPM: ipcSend(NS) err=-16
Dec 16 12:44:41: %ISP-6-INFO: [isp_heartbeat_register] is called on ACTIVE
-more--

› Each entry consists of the timestamp, the application, the severity level and the log message

Severity  Severity level
0         Emergencies
1         Alerts
2         Critical
3         Errors
4         Warnings
5         Notifications
6         Informational
7         Debugging

[local]Ericsson# show log card ?
  1..20          slot number
  RPSW1..RPSW2
  all            all slots

Figure 6-4: System log commands

1.3 Event Severity Levels in log Messages

Figure 6-5: Event Severity Levels in Log Messages

There are 8 different severity levels. Severity 7 logs are used for debugging and must be enabled manually by the user. This is done through system debugging, which we will see in the next section. For this reason the logger daemon process only logs messages of severity 0 to 6 by default.

As mentioned earlier, there is a logger daemon process on each card. If we want to display only the logs for a specific card, we can use the “show log card” command. As you can see, there are options to display logs for either RP as well as for the card in each slot of the SSR.

1.4 Logs from Cards

As mentioned, each RP and line card has a log process, and these communicate with the log process on the active RP. Line cards send their log messages to the log process on the active RP by default, so the active RP log buffer contains all the logs of the line cards. The only exception is some logs written during a card reload, before communication with the RP is established.
We can identify a log message coming from another card because, after the timestamp, it carries a field indicating the component that generated it. In the example below, the {3/LP} messages are sent from the line processor on the card in slot 3.

The standby RP can also send its log messages to the active RP, and vice versa. However, this has to be configured from global config mode using the commands “logging standby” and “logging active”.

› Line cards send logs to the active RP by default

[local]Ericsson# show log startup
Ericsson Log
Ericsson IPOS Context ID 0x40080001
Dec 16 05:59:26:{3/LP}: %FABRICD-6-INFO: Enable WLCFAP IRQ.
Dec 16 05:59:26:{3/LP}: %FABRICD-6-INFO: Enable FAP.0 IRQ.
Dec 16 05:59:26:{3/LP}: %FABRICD-6-INFO: IPC Event: FAPFMA_EVENT_IPC_FMM_BIRTH
Dec 16 05:59:26: %PAD-6-INFO: SVC - proc_asgSl_card_boot_events():885: slot 3, ASG_SL_CARD_INIT_PASSED received
Dec 16 05:59:26: %PAD-6-INFO: SVC - slMakeEvent_asg_cb():504: slot 3, Card_Boot_Event :6, image 0, source:1
Dec 16 05:59:26: %PAD-6-INFO: Card activation completed on slot 3
Dec 16 05:59:26: {3/LP}: %FABRICD-6-INFO: Sync Type: FAPFMA_FMR_SYNC
Dec 16 05:59:26: {3/LP}: %CAD-6-INFO: caCdlNpuUpdateInitPhase: All drivers have completed initialization. Making transition to Ready.

[local]Ericsson(config)# logging ?
  active     Configure to log active event to standby controller
  cct-valid  Configure to log only event with valid cct
  cmd-audit  Configure to log commands
  debug      Configure to log debug events
  standby    Configure to log standby event to active controller
  timestamp  Configure the timestamp information of log
[local]Ericsson(config)#

Figure 6-6: Logs from cards

1.5 Show log and time

[local]Ericsson# show log active all since 2015:06:23:21:54:17
Jun 23 21:54:35.086: %PAD-6-INFO: virtual bool PktBaseEPortMgr::setPortOperation(EnableDisable): Port 1/14, enableDisable=ENABLE
Jun 23 21:54:35.086: %PAD-6-INFO: caPktPortEnable(1/14)
Jun 23 21:54:35.195: %APP-6-INFO: submitting alarm, major: 193, minor: 1, dn: ManagedElement=1,Equipment=1,Slot=0,Port=13, severity: 3, text: Link down , time: 1340468651 (in applibcm_svr_cfg_event_callback)
Jun 23 21:54:35.807: %CSM-6-PORT: ethernet 1/14 link state UP service state UP, overall admin is UP
Jun 23 21:54:35.811: [0002]: %VRRP-5-STATE_CHANGE: VRRP router SS7_vrrp_1/151 state change from Init to Backup due to event Interface Up
Jun 23 21:54:35.811: [0003]: %VRRP-5-STATE_CHANGE: VRRP router sr_om_1_sw01/150 state change from Init to Backup due to event Interface Up
Jun 23 21:54:35.811: [0004]: %VRRP-5-STATE_CHANGE: VRRP router SR_GB_1_Sr1/10 state change from Init to Backup due to event Interface Up
Jun 23 21:54:38.201: %APP-6-INFO: submitting alarm, major: 193, minor: 1, dn: ManagedElement=1,Equipment=1,Slot=0,Port=13, severity: 0, text: Link down , time: 1340468677 (in applibcm_svr_cfg_event_callback)

Figure 6-7: Show log and time

A very useful facility when using log files is to search based on time. In many cases we know when a certain event occurred and wish to analyze the logs around that time.
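The same kind of time-based search can be applied offline to log lines that were saved to a file. The Python sketch below assumes the `YYYY:MM:DD:HH:MM[:SS]` time format used on the command line in Figure 6-7 and the `Mon DD HH:MM:SS` prefix of each entry. Because entries carry no year, the year is taken from the start time unless given explicitly; sub-second digits are ignored. All function names are illustrative.

```python
import re
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})")


def parse_cli_time(value):
    """Parse 'YYYY:MM:DD:HH:MM' or 'YYYY:MM:DD:HH:MM:SS' into a datetime."""
    parts = [int(p) for p in value.split(":")]
    if len(parts) not in (5, 6):
        raise ValueError("expected YYYY:MM:DD:HH:MM[:SS], got %r" % value)
    return datetime(*parts)


def entry_time(line, year):
    """Return the datetime of a log entry, or None if it has no timestamp."""
    match = STAMP.match(line)
    if match is None or match.group(1) not in MONTHS:
        return None
    mon, day, hh, mm, ss = match.groups()
    return datetime(year, MONTHS.index(mon) + 1, int(day), int(hh), int(mm), int(ss))


def filter_window(lines, since, until=None, year=None):
    """Yield entries logged at or after `since` and, if given, not after `until`."""
    start = parse_cli_time(since)
    end = parse_cli_time(until) if until else None
    year = start.year if year is None else year
    for line in lines:
        stamp = entry_time(line, year)
        if stamp is None or stamp < start:
            continue
        if end is not None and stamp > end:
            continue
        yield line
```

A call such as `list(filter_window(saved_lines, "2015:06:23:21:54:17"))` keeps only the entries from that moment onward, which narrows a large saved log to the period around a known event.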
In this case, by using the option “since”, we show the events logged since the specified time.

[local]Ericsson# show log active all since 2015:06:23:21:54:17 until 2015:06:23:21:55
Jun 23 21:54:35.086: %PAD-6-INFO: virtual bool PktBaseEPortMgr::setPortOperation(EnableDisable): Port 1/14, enableDisable=ENABLE
Jun 23 21:54:35.086: %PAD-6-INFO: caPktPortEnable(1/14)
Jun 23 21:54:35.195: %APP-6-INFO: submitting alarm, major: 193, minor: 1, dn: ManagedElement=1,Equipment=1,Slot=0,Port=13, severity: 3, text: Link down , time: 1340468651 (in applibcm_svr_cfg_event_callback)
Jun 23 21:54:35.807: %CSM-6-PORT: ethernet 1/14 link state UP service state UP, overall admin is UP
Jun 23 21:54:35.811: [0002]: %VRRP-5-STATE_CHANGE: VRRP router SS7_vrrp_1/151 state change from Init to Backup due to event Interface Up
Jun 23 21:54:35.811: [0003]: %VRRP-5-STATE_CHANGE: VRRP router sr_om_1_sw01/150 state change from Init to Backup due to event Interface Up
Jun 23 21:54:35.811: [0004]: %VRRP-5-STATE_CHANGE: VRRP router SR_GB_1_Sr1/10 state change from Init to Backup due to event Interface Up
Jun 23 21:54:38.201: %APP-6-INFO: submitting alarm, major: 193, minor: 1, dn: ManagedElement=1,Equipment=1,Slot=0,Port=13, severity: 0, text: Link down , time: 1340468677 (in applibcm_svr_cfg_event_callback)

Figure 6-8: Show log and time

You can also specify a particular time window while looking through the logs. Here we see an example of show log between two specific times: we simply add the option “until” to the previous command. These commands save us from going through unnecessary log entries.

1.6 Log Files

The operating system contains two log buffers: main and debug. By default, messages are stored in the main log. If the system restarts, for example as a result of a logging daemon or system error, and the logger daemon shuts down and restarts cleanly, the main log buffer is saved.
The main log buffer is a circular buffer, so after a while all logs are overwritten. For this reason some log information is also stored in files for future access.

Logs stored in files
› Logger daemon restarts
– /md/loggd_dlog.bin
› Log messages, severity 0,1,2,3,4,5,6
– /md/loggd_startup.log
– /md/loggd_startup.log.1
› Log messages, severity 0,1,2,3
– /md/loggd_persistent.log
– /md/loggd_persistent.log.1
– /md/loggd_persistent.log.2
– /md/loggd_persistent.log.3

Figure 6-9: Log Files

First of all, if the logger daemon shuts down or restarts cleanly, the contents of the main log buffer are saved in the loggd_dlog.bin file in the /md directory. This is useful to preserve logs across a system reload.

In addition, two predefined log files are created. The first is the startup log. A startup log is created for every reboot of the SSR. It contains all logs of severity 0 to 6 since the last startup. The file is written to constantly and does not wrap around, so the startup log always contains the logs since the last system startup. The file does, however, have a limited size of around 10 MB, and once this is reached it is no longer written to. Two startup log files are stored: the current startup log, loggd_startup.log, and the startup log from before the last system reload, loggd_startup.log.1.

The second predefined log file is the persistent log file. Persistent logs are logs that are not lost on system reload. Persistent log files are not rotated on reload and are written to continuously, but only error and more severe log messages are written. Up to 4 persistent log files are stored, and each fills up to a maximum of around 10 MB. When the persistent log reaches its maximum size it is moved to loggd_persistent.log.1, a new loggd_persistent.log file is created, and so on. Persistent logs contain logs of severity 0 to 3.
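The size-based handling of the persistent log can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not of loggd itself: the in-memory dictionary stands in for the /md directory, the tiny size limit exists only to make the example short, and dropping the oldest file once the retention count is reached is an assumption.

```python
def rotate(files, base, keep):
    """Shift base -> base.1 -> base.2 ..., keeping at most `keep` files."""
    files.pop(f"{base}.{keep - 1}", None)        # assumed: the oldest file is dropped
    for index in range(keep - 2, 0, -1):
        name = f"{base}.{index}"
        if name in files:
            files[f"{base}.{index + 1}"] = files.pop(name)
    if base in files:
        files[f"{base}.1"] = files.pop(base)
    files[base] = []                              # start a new current file


def write_persistent(files, line, severity, base="loggd_persistent.log",
                     max_severity=3, max_bytes=10 * 1024 * 1024, keep=4):
    """Store `line` if it is severe enough; rotate first if the file is full."""
    if severity > max_severity:
        return False                              # severity 4..7 is not persisted
    current = files.setdefault(base, [])
    if sum(len(entry) for entry in current) + len(line) > max_bytes:
        rotate(files, base, keep)
    files[base].append(line)
    return True
```

With `max_bytes=10` and `keep=3`, writing four 6-byte error messages leaves the newest in `loggd_persistent.log`, the two before it in `.1` and `.2`, and the first one discarded.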
Unlike startup logs, these logs are not rotated on reload and are written to until the maximum size is reached. This is true for all RPs and line cards. We can find these files in the /md directory of each of these cards.

1.6.1 Custom Log Files and Filters

Users may also generate their own custom log files using the “logging file” command from context-config mode. This file is stored by default in the /md directory. The file can then be customized to contain logs up to a particular severity using the filter option. The logging display filter can also be applied to the console, the monitoring terminal and the syslog server.

[local] Ericsson(config-ctx)# logging file MYLOG.log
[local] Ericsson(config-ctx)# logging filter file ?
  alert          Log alert and more severe events (priority 1)
  critical       Log critical and more severe events (priority 2)
  debug          Log all events, including debug (priority 7)
  emergency      Log only emergency events (priority 0)
  error          Log error and more severe events (priority 3)
  informational  Log informational and more severe events (priority 6)
  notice         Log notice and more severe events (priority 5)
  warning        Log warning and more severe events (priority 4)
[local] Ericsson(config-ctx)# logging filter ?
  console  Configure logging display filter for the console
  file     Configure logging display filter for file
  monitor  Configure logging display filter for monitoring terminal
  syslog   Configure logging display filter for syslog server

Figure 6-10: Custom Log Files and filters

1.6.2 Log Files Location

Both default and custom files are saved in the /md folder. In this example we can see the default files and the custom file we created earlier.
[local] Ericsson# cd /md
Current directory is now /md
[local] Ericsson# dir
Contents of /md/
total 184484
-rw-r--r-- 1 root root      16 Jun 23 02:23 loggd_ddbg.bin
-rw-r--r-- 1 root root  777848 Jun 23 02:23 loggd_dlog.bin
-rw-r--r-- 1 root root 5081846 Jun 23 06:21 loggd_persistent.log
-rw-r--r-- 1 root root 9751679 Jun 22 22:59 loggd_persistent.log.1
-rw-r--r-- 1 root root 9751727 Jun 18 01:16 loggd_persistent.log.2
-rw-r--r-- 1 root root 9751661 Dec 15 22:00 loggd_persistent.log.3
-rw-r--r-- 1 root root 9751660 May 24 21:02 loggd_persistent.log.4
-rw-r--r-- 1 root root 9751711 May 17 02:31 loggd_persistent.log.5
-rw-r--r-- 1 root root  289774 Jun 23 06:20 loggd_startup.log
-rw-r--r-- 1 root root  262601 Jun 23 05:01 loggd_startup.log.1
-rw-rw-r-- 1 root root     948 Dec 16 12:56 MYLOG.log

Figure 6-11: Log Files location

1.6.3 Display Log Files

[local] Ericsson# show log file loggd_persistent.log.5
Ericsson Log Ericsson IPOS Context ID 0x40080001
Sep 23 16:08:18: %LOG-6-SEC_ACTIVE: Sep 23 16:08:18: {5/LP}: %FABL-ALDSUPPORT-3-INTERNAL_ERR: fwd_al_adj_create_raw, Error Code: FWD_AL_ERROR_CIRCUIT_NOT_FOUND, Adj_id: 0x98e342;Adj_cookie: 0;Circuit handle: 5/1:511:63:31/1/2/304107;Port: 1;MTU: 1500;Encap length: 0;Encap s
Sep 23 16:08:18: %LOG-6-SEC_ACTIVE: Sep 23 16:08:18: {5/LP}: %FABL-ALDSUPPORT-3-INTERNAL_ERR: fwd_al_circuit_control_pkts, Error Code: FWD_AL_ERROR_CIRCUIT_NOT_FOUND, Circuit handle: 5/1:511:63:31/1/2/304107;FABL_API_MODULE_ID:IFACE;
Sep 23 16:08:18: %LOG-6-SEC_ACTIVE: Sep 23 16:08:18: {5/LP}: %FABL-ALDSUPPORT-3-INTERNAL_ERR: fwd_al_circuit_mac_config, Error Code: FWD_AL_ERROR_CIRCUIT_NOT_FOUND, Circuit handle: 5/1:511:63:31/1/2/304107; Mac: 00:02:3b:04:57:68;
Sep 23 16:08:18: %LOG-6-SEC_ACTIVE: Sep 23 16:08:18: {5/LP}: %FABL-ALDSUPPORT-3-INTERNAL_ERR: fwd_al_circuit_down, Error Code: FWD_AL_ERROR_CIRCUIT_NOT_FOUND,
Circuit count: 1; Circuit List 5/1:511:63:31/1/2/304107;
Sep 23 16:08:18: %LOG-6-SEC_ACTIVE: Sep 23 16:08:18: {5/LP}: %FABL-ALDSUPPORT-3-INTERNAL_ERR: fwd_al_circuit_mac_config, Error Code: FWD_AL_ERROR_CIRCUIT_NOT_FOUND, Circuit handle: 5/1:511:63:31/1/2/304107; Mac: 00:02:3b:04:57:68;
Sep 23 16:08:18: %LOG-6-SEC_ACTIVE: Sep 23 16:08:18: {5/LP}: %FABL-ALDSUPPORT-3-INTERNAL_ERR: fwd_al_circuit_create, Error Code: FWD_AL_ERROR_INSUFFICIENT_MEMORY, Circuit handle: 5/1:511:63:31/1/2/304108;MTU: 1500;IPv6 MTU: 1500;Parent circuit: 5/1:511:63:31/1/2/303308;
Sep 23 16:08:18: %LOG-6-SEC_ACTIVE: Sep 23 16:08:18: {5/LP}: %FABL-ALDSUP
---more

Figure 6-12: Display Log Files

1.7 Filter Based on Facility

When debugging, it is very useful to look only at the logs from a particular facility. This is done with the “show log fac” command. The option can be applied to any type of log, for example a log file.

[local] Ericsson# show log fac ?
  aaa      AAA facility
  amcm     AMC Manager facility
  aos      AOS facility
  app      Application facility
  arp      ARP facility
  asesdk   ASESDK facility
  asm      Remote mini-CSM facility
  aspha    ASP HA Manager facility
  atm      ATM facility
  bgp      BGP facility
  bot      SSC File Manager facility
--more--
[local] Ericsson# show log file loggd_startup.log fac ?
  aaa      AAA facility
  amcm     AMC Manager facility
  aos      AOS facility
  app      Application facility
  arp      ARP facility
  asesdk   ASESDK facility
  asm      Remote mini-CSM facility
  aspha    ASP HA Manager facility
  …

Figure 6-13: Filter Based on Facility

1.7.1 Filter Based on Facility Example

In this example we want only the logs related to authentication, authorization and accounting from the current event log. As we can see, only logs generated by the “aaa” application are displayed.

[local] Ericsson# show log active fac aaa
Jun 23 05:06:24.312: %AAA-6-INFO: Perform non hitless switchover.
Jun 23 05:06:33.612: %AAA-5-NOTICE: [local] administrator: (test) logged in via tty: /dev/pts/, host: 155.53.235.45
Jun 23 05:06:59.042: %AAA-5-NOTICE: [local] administrator: (test) logged in via tty: /dev/pts/, host: 155.53.235.45
Jun 23 06:20:59.931: %AAA-5-NOTICE: [local] administrator: (test) logged in via tty: /dev/pts/, host: 155.53.234.42
Jun 23 06:26:15.024: %AAA-5-NOTICE: [local] administrator: (test) logged in via tty: /dev/pts/, host: 155.53.235.45
Jun 23 07:07:51.776: %AAA-5-NOTICE: [local] administrator: (test) found on /dev/pts/4 from 155.53.235.45 - record as logged out.
Jun 23 07:48:54.559: %AAA-5-NOTICE: [local] administrator: (test) found on /dev/pts/1 from 155.53.235.45 - record as logged out.
Jun 23 07:48:56.397: %AAA-5-NOTICE: [local] administrator: (test) found on /dev/pts/2 from 155.53.235.45 - record as logged out.

Figure 6-14: Filter Based on Facility example

1.8 PM Process Logs

One process whose logs are of particular interest is the Process Manager (PM). This process monitors and controls the operation of all other processes and has an IPC connection to each of them. If a process has a problem, PM logs it. In the first case below we have simulated a crash of the RCM process: the PM logs show the rcm process dying and being restarted. PM also logs information about any RPSW switchover event.

RCM process crash (PM: process manager):
[local] Ericsson# show log active fac pm
Jun 23 21:43:51.474: %PM-6-PROCDIE: rcm is dying, pid 3982
Jun 23 21:43:54.474: %PM-5-GEN: restarting <rcm> now

RPSW switchover:
[local] Ericsson# show log active fac pm
Jun 23 05:06:22.206: %PM-5-GEN: PM received ACTIVE event
Jun 23 05:06:22.206: %PM-5-GEN: Set PM to run in primary mode.
Jun 23 05:06:22.206: %PM-5-GEN: This RP is going Active.
Jun 23 05:06:22.206: %PM-5-GEN: Setting PM as primary.
Jun 23 05:06:22.217: %PM-5-GEN: Reason for controller switch: Card Failed
Jun 23 05:06:22.218: %PM-6-INFO: pm_send_status: Notifying ns
Jun 23 05:06:22.218: %PM-6-INFO: pm_send_status: Notifying rpsw_dtp

Figure 6-15: PM Process Logs

1.9 CSM Process Logs

Another useful process is the Card State Manager (CSM). It processes events for cards and ports, so its logs can help when troubleshooting ports that do not come up.

[local] Ericsson# show log active fac csm
Jun 23 05:05:15.504: %CSM-6-CARD: card ge-40-port INSERTED in slot 1 READY
Jun 23 05:05:15.505: %CSM-6-CARD: card ge-40-port INSERTED in slot 17 READY
Jun 23 05:05:15.506: %CSM-6-CARD: card alsw INSERTED in slot ALSW1
Jun 23 05:05:15.506: %CSM-6-CARD: card alsw INSERTED in slot ALSW2
Jun 23 05:05:15.506: %CSM-6-CARD: card sw INSERTED in slot SW1
Jun 23 05:05:15.507: %CSM-6-CARD: card sw INSERTED in slot SW2
Jun 23 05:05:15.507: %CSM-6-CARD: card sw INSERTED in slot SW3
Jun 23 05:05:15.507: %CSM-6-CARD: card sw INSERTED in slot SW4
Jun 23 05:05:24.765: %CSM-6-PORT: ethernet 1/11 link state UP service state UP, overall admin is UP
Jun 23 05:05:24.765: %CSM-6-PORT: ethernet 1/12 link state UP service state UP, overall admin is UP
Jun 23 05:05:24.765: %CSM-6-PORT: ethernet 1/14 link state UP service state UP, overall admin is UP
Jun 23 05:05:39.570: %CSM-6-CARD: slot PM5, ALARM_CLEARED: Input Failure - Both Feeds
--More--

Figure 6-16: CSM Process Logs

1.10 ISM Process

The logs of the Interface and Circuit State Manager (ISM) process can also be useful; they report links going up and down and reload switchovers.

[local] Ericsson# show log active fac ism
Jun 23 05:05:04.586: %ISM-6-STATE_TOGGLE: This ISM going standby.
Jun 23 05:05:04.607: %ISM-6-CHKPT_OK: Marked ISM checkpoint as OK
Jun 23 05:05:04.796: %ISM-6-PPA_REG1: Switchover is complete and can process PPA registration now.
Jun 23 05:06:22.224: %ISM-6-STATE_TOGGLE: This ISM going active.
Jun 23 05:06:22.226: %ISM-6-SWOVR_TYPE: Performing *** NON HITLESS *** switchover. All dynamic and subcribers circuits will be deleted.
Jun 23 05:06:22.309: %ISM-6-SENT_IPC: Sent RESYNC ipc to component: CSM.
Jun 23 05:06:24.311: %ISM-6-SENT_EVENT: Sent event: XC RESYNC, to MBE: dot1q
Jun 23 05:06:24.311: %ISM-6-SENT_EVENT: Sent event: XC RESYNC, to MBE: aaa
Jun 23 05:06:24.484: %ISM-6-SENT_EVENT: Sent event: XC DONE, to MBE: aaa
Jun 23 05:06:24.484: %ISM-6-SENT_IPC: Sent XC DONE ipc to component: ifmgr.
Jun 23 05:06:24.484: %ISM-6-SENT_EVENT: Sent event: XC DONE, to client: snmp
Jun 23 05:06:24.485: %ISM-6-PPA_REG1: Switchover is complete and can process PPA registration now.
Jun 23 05:09:23.292: %ISM-6-SB_RDY_SWOVR: Standby ISM is ready for switchover

Figure 6-17: ISM Process

1.11 Filter Based on Facility on Card

As each card has its own logging facility, the commands mentioned can also be applied to individual cards.
[local]SSR8020# show log card 3 fac pm
--------------------------------------------------------------
Slot number : 3/LP
Card Type : ge-40-port
Aug 19 16:46:59: {3/LP}: %PM-6-INFO: All run processes initialized
Aug 19 16:47:59: {3/LP}: %PM-6-INFO: Declaring system healthy
[local]SSR8020# show log card 3 fac ns
--------------------------------------------------------------
Slot number : 3/LP
Card Type : ge-40-port
Aug 19 16:46:47: {3/LP}: %NS-6-INFO: New namespace 'RP.ACTIVE' from ep [127.2.253.1:6001|000|003]
Aug 19 16:46:48: {3/LP}: %NS-6-INFO: New namespace 'RP.STANDBY' from ep [127.2.252.1:6001|000|003]
Aug 20 00:39:41: {3/LP}: %NS-6-INFO: New namespace 'LC.05' from ep [127.2.4.1:6001|000|003]

Figure 6-18: Filter based on facility on card

1.12 Logger Verification

The “show logging” command displays state information and statistics for the main log buffer (Log) and the debug log buffer (Dbg). The 'Logger Buffer Locked' line indicates whether a log buffer is currently locked. While a buffer is locked, all messages are dropped. This lock should be very transient. If a buffer is locked, repeat the 'show logging' command several times, waiting a few seconds between repeats. If a buffer is stuck in the locked state, this is very likely a bug. The '# Logged msg' line is a count of the messages that have been inserted into the log buffers. The 'Logger Drop Counter' section lists counts of dropped messages, by component.
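Once the output of “show logging” has been captured as text, the checks described above can be sketched as a small Python function. This is a minimal illustration, not an Ericsson tool: the sample text is copied from the output shown in Figure 6-19, and two points are assumptions, namely that any lock value other than N means “locked” and that a non-zero drop counter replaces the “All drop counters are all ZERO” line.

```python
import re

# Sample text taken from the "show logging" output in this chapter.
SAMPLE = """\
% Logger Uptime : 09:37:56 Wed Dec 16 2015
% Logger Buffer (KB) : Log: 981, Dbg: 1023
% Logger Buffer Locked : Log: N, Dbg: N
% # Logged msg : Log: 343, Dbg: 0
% Logger Drop Counter : All drop counters are all ZERO
"""

LOCKED = re.compile(r"Logger Buffer Locked\s*:\s*Log:\s*(\w+),\s*Dbg:\s*(\w+)")


def check_logger(output):
    """Return a list of warnings found in captured 'show logging' text."""
    warnings = []
    locked = LOCKED.search(output)
    if locked is None:
        warnings.append("buffer lock state not found")
    else:
        for name, state in zip(("Log", "Dbg"), locked.groups()):
            # Assumption: anything other than "N" is treated as locked.
            if state.upper() != "N":
                warnings.append(
                    f"{name} buffer reported locked; repeat the command after a few seconds"
                )
    # Assumption: this exact line disappears when any drop counter is non-zero.
    if "All drop counters are all ZERO" not in output:
        warnings.append("drop counters are not all zero; messages may have been dropped")
    return warnings


if __name__ == "__main__":
    print(check_logger(SAMPLE) or "logger looks healthy")
```

Because a lock should be transient, a single warning is not conclusive; run the check on several captures taken a few seconds apart before suspecting a stuck buffer.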
› Logging is a process, and the state of the logger system can be verified:

[local]Train-1# show logging
% Logging Information
% ===================
% Logger Uptime : 09:37:56 Wed Dec 16 2015
% Logger Buffer (KB) : Log: 981, Dbg: 1023
% Logger Buffer Locked : Log: N, Dbg: N
% # Logged msg : Log: 343, Dbg: 0
% # Logged Filtered : Log: 0, Dbg: 0
% # Logged Rate Limited : Log: 0, Dbg: 0
% ==================
% Logger Drop Counter : All drop counters are all ZERO
[local]Train-1#

› Log: main log buffer
› Dbg: debug log buffer
› Buffer not locked: no performance problems
› Good news: no dropped log messages

Figure 6-19: Logger verification

1.13 Show Logging Card Information

This command is also applicable to individual cards.

[local] Ericsson# show logging card 1
--------------------------------------------------------------
Slot number : 1/LP
Card Type : ge-40-port
% Logging Information
% ===================
% Logger Uptime : 02:10:50 Wed Dec 16 2015
% Logger Buffer (KB) : Log: 979, Dbg: 1023
% Logger Buffer Locked : Log: N, Dbg: N
% # Logged msg : Log: 431, Dbg: 0
% # Logged Filtered : Log: 0, Dbg: 0
% # Logged Rate Limited : Log: 0, Dbg: 0
% ==================
% Logger Drop Counter : All drop counters are all ZERO

Figure 6-20: Show Logging Card information

To reduce the number of informational messages displayed on the console, changes were introduced in Release 12.1 to suppress the default display of INFO messages on the console and terminal connections. By default, these messages are no longer displayed, but they are still stored in the system log buffer.
1.14 Logging Display Info

[local]Train-1# logging display-info
[local]Train-1# terminal monitor
[local]Train-1# conf
Enter configuration commands, one per line, 'end' to exit
[local]Train-1(config)# port eth 2/8
[local]Train-1(config-port)# no shut
[local]Train-1(config-port)# commit
Transaction committed.
Feb 6 15:37:34: %CSM-6-PORT: ethernet 2/8 link state UP service state UP, overall admin is UP
[local]Train-1(config-port)#shut
[local]Train-1(config-port)#commit
Feb 6 15:38:03: %CSM-6-PORT: ethernet 2/8 link state DOWN service state DOWN, overall admin is DOWN
Feb 6 15:38:03: %CSM-6-PORT: ethernet 2/8 link state down, trigger source: Configuration changed
[local]Train-1(config-port)# end
[local]Train-1# no logging display-info
[local]Train-1# conf
[local]Train-1(config)# port eth 2/8
[local]Train-1(config-port)# no shut
[local]Train-1(config-port)# commit
[local]Train-1(config-port)#

› From Release 12.1, level 6 INFO messages are not displayed by default
› Use of this command is discouraged
› For a line card: logging card slot display-info

Figure 6-21: Logging display info

However, if you want to display INFO messages (for example, for scripting purposes), you can enable them for the RP CLI logs by entering the hidden command logging display-info. In the example above we enter “logging display-info” and then enable a port to generate a level 6 log message. As we can see, the level 6 message is displayed.

We check again by disabling the port, and level 6 messages are displayed again. It is important to note, however, that use of this command is discouraged because it can result in a large number of undocumented messages being displayed on the console. To disable the display of INFO messages on the console, use the no form of the command. If we now enable the port again, we do not see any level 6 message.
Finally, to display INFO messages for a line card, use the command logging card slot display-info, where slot is the number of the slot that hosts the card.

1.15 Logging Debug

As already mentioned, there are two separate buffers for log and debug messages. If the logger daemon shuts down and restarts cleanly, the contents of the log buffer are saved in the file /md/loggd_dlog.bin, while the debug buffer is saved in the file /md/loggd_ddbg.bin. Because of this separation, by default you cannot use the “show log” command to display debug messages. However, we can use the “logging debug” command in global configuration mode to send debug messages to the log buffer.

› LOG buffer: /md/loggd_dlog.bin
› Debug buffer: /md/loggd_ddbg.bin
› “show log” displays only the content of log messages by default
› “logging debug” sends debug messages to the log buffer

Figure 6-22: Logging debug

The “logging” command in global configuration mode has several options. We have already talked about “logging active” and “logging standby”. With “logging active”, the active RP sends its log messages to the standby RP, while with “logging standby” the standby RP sends its log messages to the active RP. In a similar way, with “logging debug”, debug logs are sent to the log buffer and from there to the log files. It is important to note, however, that “logging debug” sends to the log buffer only those events that are displayed on either the console or a terminal screen.

[local]Train-1(config)#logging ?
  active      Configure to log active event to standby controller
  cct-valid   Configure to log only event with valid cct
  debug       Configure to log debug events
  standby     Configure to log standby event to active controller
  timestamp   Configure the timestamp information of log
[local]Train-1(config)#
[Diagram: log file, log engine and debug engine on the active and standby RPSW; debug events reach the terminal or console through “terminal monitor” or “logging console”, and “logging debug” feeds them to the log engine]
› “logging debug” ONLY sends events which are actually displayed on either the console or a terminal screen

Figure 6-23: Logging debug (global config logging)

In the following example we enable debug for static rib. Initially “logging debug” is not configured. We add a static route to generate a debug message; this message cannot be seen in the output of “show log”. We then enable “logging debug” and add another static route. This time the message can be seen in the output of “show log” because it has been sent from the debug buffer to the log buffer. Finally, note that we have enabled terminal monitor because, as stated before, only debug logs which are displayed are sent to the log buffer.

1.16 Logging Debug

[local]Train-1# debug static rib
[local]Train-1# terminal monitor
[local]Train-1# conf
[local]Train-1(config)# context local
[local]Train-1(config-ctx)# ip route 11.12.12.0/24 8.8.8.8
[local]Train-1(config-ctx)# commit
Feb 7 15:20:23: %STATIC-7-RIB: register nexthop: 8.8.8.8, context 0x40080001, nexthop_afi 0, metric 4294967295, ifgrid 0x0, default 0, magic 0, bfd-disabled
[local]Train-1(config-ctx)# end
[local]Train-1# show log | grep "STATIC-7"
[local]Train-1# conf
[local]Train-1(config)# logging debug
[local]Train-1(config)# context local
[local]Train-1(config-ctx)# ip route 11.12.13.0/24 8.8.8.9
[local]Train-1(config-ctx)# commit
Feb 7 15:32:29: %STATIC-7-RIB: register nexthop: 8.8.8.9, context 0x40080001, nexthop_afi 0, metric 4294967295, ifgrid 0x0, default 0, magic 0, bfd-disabled
[local]Train-1(config-ctx)# end
[local]Train-1# show log | grep "STATIC-7"
Feb 7 15:32:29: %STATIC-7-RIB: register nexthop: 8.8.8.9, context 0x40080001, nexthop_afi 0, metric 4294967295, ifgrid 0x0, default 0, magic 0, bfd-disabled

Figure 6-24: Logging debug

1.17 Log File Collection

› In the R14A release, logs can be collected from both the active and standby controller cards and from the line cards.
› The “save tech-support log” command is used.
  – Collects logs for the active controller card, for specific cards, for the SSC1-A host, or for all controller cards and line cards.
  – Stores them in the /md directory on the active controller card.
› Command syntax:
  – save tech-support log [card {slot | all | standby | host}]

Figure 6-25: Log File Collection

2 Syslog Configuration

Figure 6-26: Syslog Configuration

2.1 Syslog Server

› The SSR OS contains two log buffers: main and debug
› By default, generic messages are stored in the main log
› At restart, the log buffers are saved to /md/loggd_dlog.bin for the main log buffer and to /md/loggd_ddbg.bin for the debug log buffer
› Log messages can be sent to a remote Syslog server
  – Convenient for large installations where multiple systems can log to a remote server for centralized management
› Log messages can be sent to multiple Syslog servers per context
› Messages sent can be filtered to a certain severity level and higher

Figure 6-27: Syslog server

2.2 Exercise 5: Logging & Syslog

[Diagram: lab reference topology showing the SSR with its management, subscriber and backbone sides, contexts local and xyz, the Syslog server, and subscriber circuits (Ethernet port, ATM PVC) carrying PPPoE sessions]

Figure 6-28: Reference for Syslog lab

› Please move to the exercises book.
Figure 6-29: Exercise 5: Logging & Syslog

2.2.1 Exercise Review: Configure Syslog & Debug

Tasks:
1) Logging syslog: in context GX_xyz (Ethernet 3/5), configure “logging syslog IP# facility localX”
2) Logging debug: configure “logging debug” (context local, management)
X = group number; IP# = assigned by the instructor

As a result, each context will log a different facility label to its Syslog server:
› Messages from context GX_xyz have facility label localX

Figure 6-30: Exercise review: Configure Syslog & Debug

2.2.2 Exercise Review: Syslog Server Environment

Note: To view SYSLOG events on the lab server, telnet to the SSH server public IP address, log in with the student credentials, and view the log file. For example:

tail -f /var/log/HOSTS/10.1.1.10X/messages

Lab Note: Instructors, you may need to change the permissions of the hostname folder (for example 10.1.1.101), because when the SSR OS first creates this folder for syslog messages it is only available to root. Change the folder permissions to give access to all users. Refer to the instructor’s guide for this course for details.

› Start debug for aaa:
[G1_xyz]Train-1# debug aaa all
› For each SSR source IP address, the syslog server stores the logging output in /var/log/HOSTS/<hostname>
› In our example:
[student@ssh1-gothenburg-1]# tail -f /var/log/HOSTS/10.1.1.101/messages
Nov 7 13:21:38 [local6.notice] Nov 7 13:36:06: %AAA-5-NOTICE: [local] administrator: (ericsson) logged in via tty: /dev/ttyp0, host: 10.1.1.3
Nov 7 13:21:41 [local6.notice] Nov 7 13:36:09: %AAA-5-NOTICE: [local] administrator: (ericsson) found on /dev/ttyp0 from 10.1.1.3 - record as logged out.
--- cut ---

Figure 6-31: Exercise review: Syslog server environment

2.2.3 Exercise Review: Save and Display the Logs

› Save the (active) logs to the file “ericsson.log”:
[local]Train-1# save log text ericsson.log
› Display the content of this file:
[local]Train-1# show log file ericsson.log
--- cut ---
Dec 5 11:12:07.227: %DLM-6-INFO: Standby RPSW's /flash may not be in sync
Dec 5 11:12:07.573: %DLM-6-INFO: Standby RPSW synced /flash successfully
Dec 5 11:12:10.953: {1/LP}: %CAD-5-NOTICE: caLcSmSetState: Old State: Booting (1)
Dec 5 11:12:10.954: {1/LP}: %CAD-5-NOTICE: caLcSmSetState: New State: Running (2)
Dec 5 11:12:11.338: %CSM-6-PORT: ethernet 1/1 link state UP service state UP, overall admin is UP
› Display the active logs for aaa only:
[local]Train-1# show log active fac aaa
Nov 7 13:11:26: %AAA-5-NOTICE: [local] administrator: (ericsson) found on /dev/ttyp0 from 10.1.1.3 - record as logged out.
Nov 7 13:15:24: %AAA-5-NOTICE: [local] administrator: (ericsson) logged in via tty: /dev/ttyp0, host: 10.1.1.3
Nov 7 13:32:24: %AAA-5-NOTICE: [local] administrator: (ericsson) found on /dev/ttyp0 from 10.1.1.3 - record as logged out.
--- cut ---

Figure 6-32: Exercise review: Save and display the logs

3 Chapter Summary

After this course the participant should be able to:
› Describe the log in SSR
› Understand the Active and History Logs
› Discuss the different types of logs in SSR
› Understand the communication with a Syslog server
› Discuss the concept of communication with a Syslog server
› Configure communication to a Syslog server

Figure 6-33: Chapter Summary

7 Use and Impact of Debugging on the SSR System

Chapter Objectives

After this course the participant will be able to:
› Describe the Use and Impact of Debugging on the SSR System
› Understand the SSR system debug structure
› Identify the SSR debug process

Figure 7-1: Chapter Objectives

1 Debug Introduction

When working on the SSR, it is useful to debug the system while events take place in order to analyze exactly what is occurring. Debug is one of the most powerful troubleshooting tools in the system; it helps to zoom in on and identify failures.

There are some important facts to know before starting to debug. Debug is resource intensive: it uses system memory and CPU, so it should be used with caution. When debug is enabled, the debug messages of a process may compete with other messages that are critical for that process to run. So while other processes are not affected, debugging can impact the debugged process to the point where the Process Manager fails to receive its heartbeat and restarts the process.

› Debug: a troubleshooting tool (resource intensive!)
Important facts:
› Debug: last resort!
› Structured searching
› What to debug?
  – port, routing ...
  – System wide debug, for example: debug aaa authen
  – Context specific debug, for example: debug ospf lsdb
› Where to start debug?
  – Contexts are autonomous
› Display debug output to screen
› Administrator privacy

Figure 7-2: Debug introduction

Thus, use debug as a last resort while troubleshooting. Use the various show commands, logs, alarms and similar tools before using debug.

Debug can generate massive output. A structured way of searching with debug helps you minimize system downtime and operational costs, and we recommend following the basic steps below. A failure can occur in different system components, for example a port or a context, so focus first on what function to debug.

It is important to remember that the SSR uses contexts, which is like having multiple routers in the same system. Because of this, some debug functions are system wide, for example aaa authentication, while other debug functions are context specific, for example debugging an ospf link-state database, which applies to the specific context in which the ospf process runs. As there are different instances of ospf in different contexts, running the debug command in different contexts results in different outputs. For this reason it is important to know in which context a debug should be run.

Contexts are autonomous routing environments. This means you can have many different routers within the system. Once you have multiple contexts, you need to decide where you are going to start your debug functions. IPOS in the SSR gives the option to view debug events per context, and also the option to view debug events for all contexts.
› Contexts are autonomous
› This means you can have many different routers within the SSR (for example context local, context abc, context 123)
› Once you have multiple contexts, you need to decide where you are going to start your debug functions
› The OS gives the option to view debug events
  – Per context, and
  – For all contexts

Figure 7-3: The challenge

1.1 Debug Coverage

The SSR software supports multiple contexts. Each context is an instance of a virtual router that runs on the same physical device. A context operates as a separate routing and administrative domain with separate routing protocol instances, addressing, authentication, authorization, and accounting. A context does not share this information with other contexts.

› Debug functions on the SSR can be divided into 2 categories:
› Context specific: they display debug information specific to a given context only
  – Example: debug ospf lsdb (routing) is considered context specific, since you could have multiple contexts, each running its own ospf instance
› System wide: they display the same information regardless of the context they were started in
  – Example: debug aaa authen (negotiation room) is considered system wide, since the negotiation room is actually located at port or circuit level and is not associated with a context

Figure 7-4: Debug coverage (what)

There are two types of contexts: local (a system-wide context) and administrator-defined (a nonlocal context). The active context (the context that you are in) affects your debug output. To debug all contexts on your router, use the system-wide local context. You see debug output related to this context and to all contexts running on the router. For example, to see all Open Shortest Path First (OSPF) instances on the router, issue the debug ospf lsdb command in the local context.
[local] Ericsson# debug ospf lsdb

When you debug in the local context, the software displays debug output for all contexts. When a debug function is context specific, the debug output generated by the local context includes a context ID that you can use to determine the source of the event (the context in which the event has its origin). Context-specific debugging means navigating to a specific context and running the debug commands from it, which filters out all debug output that is not related to that context. Context-specific output consists of lines identified by a context ID in brackets, and it can be displayed using either context-specific debugging or system-wide debugging.

1.2 How to Recognize a Debug Function

[NiceService]Train-2#
Dec 16 05:59:26: [0002]: [13/1:1:63/1/2/11]: %AAA-7-AUTHOR: aaa_idx 1000001e:
Dec 16 05:59:26: [0002]: [13/1:1:63/1/2/11]: %AAA-7-AUTHOR: aaa_idx 1000001e:
Dec 16 05:59:26: [0002]: [13/1:1:63/1/2/11]: %AAA-7-AUTHOR: aaa_idx 1000001e:
› Fields: context identifier [0002], internal circuit handle [13/1:1:63/1/2/11], debug function %AAA-7-AUTHOR

[local]Train-2#
Dec 16 05:59:26: [13/1:1:63/1/2/11]: %AAA-7-AUTHEN: aaa_idx 1000001f:
Dec 16 05:59:26: [13/1:1:63/1/2/11]: %AAA-7-AUTHEN: aaa_idx 0:
› Missing context identifier (means this type of debug is system wide)

[local]Train-1#
Dec 16 05:59:26: [0002]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update Router LSA
Dec 16 05:59:26: [0003]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.2 Update Router LSA
Dec 16 05:59:26: [0004]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.3 Update Sum-Net
› Context identifier included (means this type of debug is context specific)

Figure 7-5: How to recognize a debug function is context specific? Context ID

After some seconds we get the output shown here. The output includes debug messages from different contexts. How do we know this? There is a context identification number on each output row.
If we type “show context all”, we see a list of all contexts in the system and their IDs. From this we can match the debug output to a context. The context ID confirms that the debug function we started is context specific. In summary: what debug function did we start? “OSPF LSDB”, which is context specific. And where did we start it from? From context local. Context local is “capture all”, and as expected we see output from all contexts running ospf.

› Depending on the debug function, the location where debugging is started makes a difference
› When debugging, consider the debug function and its relationship to a context
› To allow a “system wide capture”, context local is enabled as the “capture all” context for context-specific debug
  – If the debug function is indeed context specific, the debug output generated by context local includes a context identifier that lets the operator understand the “source” of the event
  – Debug ospf lsdb (routing) within context local includes all ospf instances within the SSR

Figure 7-6: Debug coverage (where)

1.3 Debugging Within Context Local

(Example messages only)
[local]Train-1# show debug
OSPF: lsdb debugging is turned on
[local]Train-1#
Dec 16 05:59:26: %LOG-6-SEC_STANDBY: Dec 16 05:59:26:%CSM-6-PORT: ethernet 3/7 link state UP, admin is UP
Dec 16 05:59:26: %LOG-6-SEC_STANDBY: Dec 16 05:59:26:%CSM-6-PORT: ethernet 3/8 link state UP, admin is UP
Dec 16 05:59:26: %CSM-6-PORT: ethernet 3/7 link state UP, admin is UP
Dec 16 05:59:26: %CSM-6-PORT: ethernet 3/8 link state UP, admin is UP
Dec 16 05:59:26: [0002]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update Router LSA 200.1.1.1/200.1.1.1/80000013 cksum 26f1 len 72
Dec 16 05:59:26: [0003]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.2 Update Router LSA 200.1.2.1/200.1.2.1/80000009 cksum ce79 len 36
Dec 16 05:59:26: [0004]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.3 Update Sum-Net LSA
0.0.0.0/200.1.3.1/80000001 cksum bb74 len 28 Dec 16 05:59:26: [0004]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.3 Update Router LSA 200.1.3.1/200.1.3.1/8000000a cksum 142 len 36 Dec 16 05:59:26: [0004]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update Router LSA 200.1.1.1/200.1.1.1/80000013 cksum 26f1 len 72 Dec 16 05:59:26: [0003]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update Router LSA 200.1.1.1/200.1.1.1/80000013 cksum 26f1 len 72 Dec 16 05:59:26: [0005]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update Router LSA 2.2.2.2/2.2.2.2/8000000a cksum 983b len 36 Dec 16 05:59:26: [0006]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.2 Update Router LSA 2.2.2.6/2.2.2.6/80000009 cksum 7c4e len 36 Dec 16 05:59:26: [0007]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.3 Update Router LSA 2.2.2.10/2.2.2.10/8000000a cksum 803f len 36 Dec 16 05:59:26: [0005]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update AS-Ext LSA 30.1.1.4/2.2.2.2/80000001 cksum 2821 len 36 Dec 16 05:59:26: [0005]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update AS-Ext LSA 2.2.2.0/2.2.2.2/80000001 cksum a6c0 len 36 Dec 16 05:59:26: [0005]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update AS-Ext LSA 30.1.1.0/2.2.2.2/80000001 cksum 50fc len 36 ---more--- [local]Train-1# show context all Context Name Context ID VPN-RD Description -----------------------------------------------------------------------------local 0x40080001 Rb-1 0x40080002 Rb-2 0x40080003 Rb-3 0x40080004 Re-1 0x40080005 Re-2 0x40080006 Re-3 0x40080007 [local]Train-1# To generate this debug output we performed port down/up Figure 7-7: Debugging within context local 1.4 Debugging in Different Contexts Let us look at another example with debug from different contexts. First we start “debug aaa authorization” and verify that the debug has started by typing show debugging. We start the debug from context NiceService. If we look at the output we notice that there is context id number “0002” which indicates that aaa authorization is “context specific” debugging function. 
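Matching context IDs by eye becomes tedious when many contexts are logging at once. The following is a minimal Python sketch, not part of the SSR software, that groups captured debug lines by context. It assumes that the four-digit ID in brackets corresponds to the low 16 bits of the Context ID column in “show context all” (for example 0x40080002 and [0002]); that relationship is inferred from the sample output above and should be verified on your system. The names DEBUG_LINE, short_id and count_by_context are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   Dec 16 05:59:26: [0002]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update Router LSA
# The bracketed 4-digit context ID and the circuit handle are both optional.
DEBUG_LINE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}:+\s*"
    r"(?:\[(?P<ctx>[0-9A-Fa-f]{4})\]:\s*)?"
    r"(?:\[(?P<cct>[^\]]+)\]:\s*)?"
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<function>[A-Z0-9_]+):"
)


def short_id(context_id: int) -> str:
    """Assumption: the debug prefix shows the low 16 bits of the context ID."""
    return f"{context_id & 0xFFFF:04x}"


def count_by_context(lines, contexts):
    """Count debug lines per context name.

    contexts maps a context name to its numeric ID from 'show context all'.
    Lines without a context ID are counted under '(no context ID)'.
    """
    names = {short_id(cid): name for name, cid in contexts.items()}
    counts = Counter()
    for line in lines:
        match = DEBUG_LINE.match(line.strip())
        if match is None:
            continue  # prompt, '---more---' or other non-debug text
        ctx = match.group("ctx")
        if ctx is None:
            counts["(no context ID)"] += 1
        else:
            counts[names.get(ctx.lower(), f"unknown [{ctx}]")] += 1
    return counts


if __name__ == "__main__":
    contexts = {"local": 0x40080001, "Rb-1": 0x40080002, "Rb-2": 0x40080003}
    sample = [
        "Dec 16 05:59:26: [0002]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.0 Update Router LSA",
        "Dec 16 05:59:26: [0003]: %OSPF-7-LSDB: OSPF-1: Area 0.0.0.2 Update Router LSA",
        "Dec 16 05:59:26: %CSM-6-PORT: ethernet 3/7 link state UP, admin is UP",
    ]
    for name, count in count_by_context(sample, contexts).items():
        print(name, count)
```

A line that lands in “(no context ID)” comes from a system-wide function or from a regular log message, which is the same visual check described in section 1.2.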
In summary: What debug function did we start? AAA authorization, which is context specific. And where did we start it from? From context NiceService, which means that we only see authorization debugging from context NiceService. We also see exception debugging, which is started automatically with aaa authorization.

When looking from within context NiceService, only authorization and exception debugging output is shown:

    [NiceService]Train-2# show debugging
    AAA:
      authorization debugging is turned on
      exception debugging is turned on
    [NiceService]Train-2#
    Dec 16 05:59:26:[0002]: [13/1:1:63/1/2/11]: %AAA-7-AUTHOR: aaa_idx 1000001e: unprovision attr 13
    Dec 16 05:59:26: [0002]: [13/1:1:63/1/2/11]: %AAA-7-AUTHOR: aaa_idx 1000001e: aaa_ip_addr_prov: rem pool entry 0x64010117
    Dec 16 05:59:26: [0002]: [13/1:1:63/1/2/11]: %AAA-7-AUTHOR: aaa_idx 1000001e: unprovision attr 3

When looking from within context local, only authentication and exception debugging output is shown:

    [local]Train-2# show debugging
    AAA:
      authentication debugging is turned on
      exception debugging is turned on
    [local]Train-2#
    Dec 16 05:59:26:: [13/1:1:63/1/2/11]: %AAA-7-AUTHEN: aaa_idx 1000001f: Received SESSION_DOWN msg extern_handle 0
    Dec 16 05:59:26:: [13/1:1:63/1/2/11]: %AAA-7-AUTHEN: aaa_idx 0: Received AUTHEN_REQUEST msg from PPPd for username user2@NiceService with external handle = 0

The examples above are based on system wide debug functions. Hence, if one enabled the same type of debugging in both contexts, the output would be the same.

Figure 7-8: Debugging in different contexts

In the second exercise we start “debug aaa authentication” from context local and verify which debug is turned on. We do not see any context ID in the output, which indicates that aaa authentication is a system-wide debug function.
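The rule that decides whose events show up in a debug session can be written down compactly. The sketch below is only an illustration of the behaviour described in this chapter, not SSR code; the function name visible_contexts and the scope labels are hypothetical.

```python
def visible_contexts(function_scope, started_in, all_contexts):
    """Return the set of contexts whose events appear in the debug output.

    function_scope: "context-specific" or "system-wide"
    started_in:     name of the context the administrator started debug from
    all_contexts:   iterable with the names of all contexts on the system
    """
    everything = set(all_contexts)
    if started_in not in everything:
        raise ValueError(f"unknown context: {started_in}")
    if function_scope == "system-wide":
        # Same output no matter where the debug was started.
        return everything
    if function_scope == "context-specific":
        # Context local acts as the "capture all" context.
        return everything if started_in == "local" else {started_in}
    raise ValueError(f"unknown scope: {function_scope}")


if __name__ == "__main__":
    contexts = ["local", "NiceService", "Rb-1"]
    print(visible_contexts("context-specific", "NiceService", contexts))
    print(visible_contexts("context-specific", "local", contexts))
```

Writing the rule this way makes the one asymmetric case easy to see: only a context-specific function started outside context local narrows the output.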
When looking from within context local, only authentication and exception debugging output is shown. In the examples above, each context has a different debug function enabled. Depending on which context the administrator is monitoring from, the debug output will be different.

1.5 Debug Relationship with Contexts

Debug on SSR

Context-specific debug functions can be looked at from two levels:
  – Debug within a context (other than local): you only see debug output related to that context
  – Debug within context local: you see debug output related to all contexts

System-wide debug functions can be looked at from two levels:
  – Debug within a context (other than local): you see all output
  – Debug within context local: you see all output
  – No difference between the two levels

Figure 7-9: Debug relationship with contexts

Let us look at the relationship between the debug function and the context in which we start it. We want to select the debug function, and we also want to choose which context to start the debug from. As mentioned, debug functions in SSR can be divided into context-specific debug functions, which display debug information specific to a given context only, and system-wide debug functions, which display the same information regardless of the context they were started in.

1. Context specific debug functions

The context-specific debug function can be looked at from two levels.

• If you enable the debug function within a specific context other than context local, you will see the output related only to that context.
• However, if you enable a context-specific debug function from context local, you will see the output related to all contexts. Debug in context local has a “capture all” effect.

2. System wide debug functions

• If you enable the system-wide debug function from either context local or a non-local context, you will see the debug output for the whole system.
• There is no difference between the debug outputs for the two levels in this case.

1.6 Send Debug Output to Screen

Another important thing to remember is that when we start a debug we must also redirect the output to the terminal we are using to log in to the system. By default, when you start a debug you will not see anything. Let us see how to display debug output on your terminal screen.

How you display output in your session depends on how you are logged in to the system: through the console port or through a Telnet/SSH session. When connected to the console port you need to enable logging to the console.
› When connected to the craft (console) port:
  – You need to enable logging to the console

        [local]Train-1# config
        [local]Train-1(config)# context local
        [local]Train-1(config-ctx)# logging console
        [local]Train-1(config-ctx)#

  – Repeat for each context where debug output needs to be generated
› When connected via Telnet or SSH:
  – You need to redirect debugging output to your terminal:

        [local]Train-1# terminal monitor

  – To pause debug output press CTRL-S; press any key to continue

        [local]Train-1# CTRL-S

Figure 7-10: Send debug output to screen

Enter the context configuration mode for the context from which you want to start the debug. Type “logging console”. Remember to commit. Repeat for each context where debug output needs to be generated.

When connected via Telnet or SSH you need to redirect debugging output to your terminal. From administrator monitoring mode type “terminal monitor”. Repeat for each context where debug output needs to be generated. To pause debug output press Ctrl-S. Press any key to continue showing the debug output.

Finally, it is important to know that each administrator who is logged in to the system can start their own debug functions and has a unique destination for debug output. Administrators do not influence each other.

Displaying Debug Output through the Console Port

Use the logging console command in context configuration mode to view event log messages on the console. By default, this is enabled in the local context.

    [local]Ericsson#config
    Enter configuration commands, one per line, 'end' to exit
    [local]Ericsson(config)#context local
    [local]Ericsson(config-ctx)#logging console

Displaying Debug Output through Telnet or SSH

Use the terminal monitor command in exec mode to view event log messages on your terminal when you are connected through Telnet or SSH. To pause debug output at your terminal, type Ctrl+S. To continue, type Ctrl+C.
    [local]Ericsson# terminal monitor

1.7 Administrator Privacy

As we have already seen in the previous examples, we can display the status of system debugging using the command “show debugging”. This shows which debugs are currently active in the context from which we run the command. To turn off all debugs we use the command “no debug all”. As we can verify, this turns off all debugging in the current context.

› Each administrator is treated within the SSR as a unique destination for debugging output
› Each administrator can start their own debugging without influencing other administrators
› Enabling debugging is context specific and requires:

        [local]Train-1# debug [function]

› Disabling debugging is context specific and requires:

        [local]Train-1# no debug [function]

› The following disables all debug functions in one step:

        [local]Train-1# no debug all

› Disconnecting the Telnet/SSH session is handled as an implicit “no debug all” for the associated administrator

Figure 7-11: Administrator privacy

1.8 Debugging and Impact

• A keep-alive timer is configured within each process
• The Process Manager (PM) learns this value to check process status

› Debugging is started within a process

        [local]Train-1# debug PPP

› Output is sent to the “logger” process
› Debugging shares the time slice with its own primary process
› Worst case: the primary process slows down and perhaps no longer responds to PM keep-alives, causing the BSD kernel to restart the process
› But most important: NO impact on the traffic card state table

[Diagram: the Process Manager sends KeepAlive messages to the PPP and AAA processes; PPP debug output goes to the Logger; on a missed keep-alive the Process Manager asks the BSD kernel to restart the PPP process.]

Figure 7-12: Debugging and “impact”

1.9 Exercise 6: Debugging on SSR

› Please move to the exercises book.
Figure 7-13: Exercise 6: Debugging on SSR

2 Chapter Summary

After this course the participant should be able to:
› Describe the Use and Impact of Debugging on the SSR System
› Understand the SSR system debug structure
› Identify the SSR debug process

Figure 7-14: Chapter Summary

8 Troubleshooting for Traffic Flow through Ports, Circuits and Interfaces

Chapter Objectives

After this course the participant will be able to:
› Perform Troubleshooting for Traffic Flow through Ports, Circuits and Interfaces
› Explain the traffic flow in SSR System
› Identify the Connectivity Issue and Troubleshooting

Figure 8-1: Chapter Objectives

1 Troubleshooting Basic Checks

Figure 8-2: Troubleshooting Basic Checks

1.1 Interface & Port States

In SSR an interface and a port are two separate, distinct entities, and an interface needs to be bound to a physical port in order to pass traffic. For this reason, different states are defined for an interface and for a port. An interface can be in one of three states: Unbound, Bound and Up.
A port can be in any of the following three states: Unconfigured, Down and Up.

› Interfaces and ports are different entities on SSR
› They have distinct states

[Diagram: an interface in context local, bound (binding) to VLAN ABC on port 1/1]

Figure 8-3: Interface & Port States

› Three states for an interface: Unbound, Bound, Up
› Three states for a port/circuit: Unconfigured, Down, Up

Figure 8-4: Interface & Port States

Configured port states:

    Port state    Line    Admin
    Down          Down    Down
    Down          Down    Up
    Up            Up      Up

Figure 8-5: Interface & Port States

Finally, let’s look at the interaction between a port state and an interface state. A port state is never affected by the state of an interface bound to it. However, an interface state may be affected by a port state: if an interface is bound to a port, the state of the port determines the state of the interface.

› Bound-interface state is determined by port state

Figure 8-6: Interface & Port States

In particular, if the port to which the interface is bound is in Down state, the interface is in Bound state. If the port to which the interface is bound is in Up state, the interface is in Up state as well.
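The dependency between the two state machines can be summarised in a few lines. This is a hedged Python sketch of the rule stated above, not SSR code; the enum and function names are hypothetical, and treating any non-Up port as producing the Bound state is an assumption that goes slightly beyond the text (which only names the Down case).

```python
from enum import Enum
from typing import Optional


class PortState(Enum):
    UNCONFIGURED = "Unconfigured"
    DOWN = "Down"
    UP = "Up"


class InterfaceState(Enum):
    UNBOUND = "Unbound"
    BOUND = "Bound"
    UP = "Up"


def interface_state(bound_port: Optional[PortState]) -> InterfaceState:
    """Derive the interface state from the port it is bound to.

    bound_port is None when the interface has no binding at all.
    """
    if bound_port is None:
        return InterfaceState.UNBOUND
    if bound_port is PortState.UP:
        return InterfaceState.UP
    # Assumption: every bound port that is not Up leaves the interface Bound.
    return InterfaceState.BOUND


if __name__ == "__main__":
    for port in (None, PortState.DOWN, PortState.UP):
        print(port, "->", interface_state(port).value)
```

Note that the function takes the port state as input and never the other way round, mirroring the statement that a port state is never affected by the interface bound to it.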
1.2 Verifying Interface Status

› The first basic step is verifying the IP interfaces of your router

    [local]Train-1# show ip interface brief
    Wed Jul 4 22:33:08 2013
    Name    Address       MTU     State      Bindings
    e0      1.1.1.1/30    1500    Up         dot1q 5/1 vlan-id 10
    e1      1.1.1.5/30    0       UnBound
    e2      1.1.1.9/30    1500    Bound      ethernet 3/4
    [local]Train-1#

› Two things to check are:
  – State: shows if the interface is operational
  – Binding: shows which physical circuit this interface uses to forward traffic

Figure 8-7: Verifying interface status

    show ip interface [if-name | all-context | brief [all-context] | rp]

Displays information about interfaces, including the interface bound to the Ethernet management port on the controller card. Use this command without optional syntax to display detailed information on all configured interfaces.

An interface can be in any of the following states:

• Unbound: The interface is not currently bound to any port or circuit.
• Bound: The interface is bound to at least one port or circuit. However, none of the bound circuits are up; therefore, the interface is not up.
• Up: At least one of the bound circuits is in the up state; therefore, the interface is also up and traffic can be sent over the interface.

1.3 Identifying Interface Problems: Unbound State

Let’s start by troubleshooting an interface in Unbound state. Interface “e1” has been created within context “local” but it is not connected to any port, so it is in Unbound state.
    [local]Train-1# show ip interface brief
    Wed Jul 4 22:33:08 2013
    Name    Address       MTU     State      Bindings
    e0      1.1.1.1/30    1500    Up         dot1q 5/1 vlan-id 10
    e1      1.1.1.5/30    0       UnBound
    e2      1.1.1.9/30    1500    Bound      ethernet 3/4
    [local]Train-1#

› Interface e1 is in “UnBound” state
  – There is no physical circuit attached to the interface

[Diagram: context local with interface e1 and port eth 5/2; no binding present]

› This is a configuration error
› A binding has to be configured between some physical circuit (like a port or VLAN) and interface e1

    port ethernet 5/2
     no shutdown
     bind interface e1 local

Figure 8-8: Identifying interface problems: Unbound state (1-3)

As a further confirmation, we can check the configuration of the ports on the system to see if any circuit or port binds to e1, using the command “show configuration port” and looking for interface e1. There are no bindings listed for interface e1. Another way to reach the same conclusion is to use the command “show binding bound”, which displays information for bound circuits, and look for interface e1.

To fix this problem we need to bind interface “e1” to a physical port or circuit. Then, based on the state of the port we bind the interface to, the interface moves to either Bound or Up state. In this case we bind interface e1 in context local directly to port 5/2 and not to a particular circuit within that port.

Now, if we run “show ip interface brief”, we can see that the state of the interface has changed from Unbound to Up. We can also see which port this interface is connected to.

Since the interface moved to Up state, the port we bound the interface to must be in Up state as well. Let’s verify this by running the command “show port 5/2”. As expected, the port is in Up state.
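When many interfaces are configured, the State column can be checked with a small script. The sketch below is an illustration in Python, not an Ericsson tool. It assumes the column order shown in Figure 8-7 (Name, Address, MTU, State, Bindings) and that every row has an address containing a prefix length; the names ADVICE and triage are hypothetical.

```python
# Suggested next step per interface state, following sections 1.2 and 1.3.
ADVICE = {
    "up": "operational, no action",
    "unbound": "configuration error: bind the interface to a port or circuit",
    "bound": "binding is configured: continue on port level (L1/L2)",
}


def triage(brief_output: str):
    """Return (name, state, binding, advice) for each interface row.

    Assumes rows look like: <name> <address/len> <mtu> <state> [bindings...]
    Header, timestamp and prompt lines are skipped.
    """
    findings = []
    for line in brief_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or "/" not in fields[1] or not fields[2].isdigit():
            continue
        name, state = fields[0], fields[3]
        binding = " ".join(fields[4:])
        advice = ADVICE.get(state.lower(), "unknown state")
        findings.append((name, state, binding, advice))
    return findings


if __name__ == "__main__":
    sample = """\
[local]Train-1# show ip interface brief
Wed Jul 4 22:33:08 2013
Name    Address       MTU     State      Bindings
e0      1.1.1.1/30    1500    Up         dot1q 5/1 vlan-id 10
e1      1.1.1.5/30    0       UnBound
e2      1.1.1.9/30    1500    Bound      ethernet 3/4
[local]Train-1#
"""
    for name, state, binding, advice in triage(sample):
        print(f"{name}: {state} ({binding or 'no binding'}) -> {advice}")
```

The advice strings simply restate the troubleshooting direction for each state; the actual fix for an Unbound interface is the bind command shown in Figure 8-8.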
    [local]Train-1# show ip interface brief
    Wed Jul 4 22:33:08 2007
    Name    Address       MTU     State      Bindings
    e0      1.1.1.1/30    1500    Up         dot1q 5/1 vlan-id 10
    e1      1.1.1.5/30    0       UnBound
    e2      1.1.1.9/30    1500    Bound      ethernet 3/4
    [local]Train-1#

› Interface e2 is in “Bound” state
  – This confirms that the configuration has been done properly
› The problem exists on the L1/L2 level
› Investigation needs to be continued on the port level

Figure 8-9: Identifying interface problems: Bound state (2-3)

Let’s now see how to troubleshoot an interface in Bound state. First we bind interface “e2” to a circuit on port 3/4. By checking the Bindings field in the output of the command “show ip int brief” we can confirm that this interface is now bound to the right circuit on port 3/4. We can also see that the interface is in Bound state. This means that the port we bound the interface to is in Down state. Let’s verify this by running the command “show port 3/4”. As expected, the port is in Down state.

At this point we have to troubleshoot why port 3/4 is in Down state. As we have seen earlier, the state of a port, either Up or Down, is determined by a combination of two underlying states: Admin state and Line state. Admin state reflects the configuration (whether the port has been administratively shut down) and Line state reflects the physical layer.

    [local]Train-1# show ip interface brief
    Wed Jul 4 22:33:08 2013
    Name    Address       MTU     State      Bindings
    e0      1.1.1.1/30    1500    Up         dot1q 5/1 vlan-id 10
    e1      1.1.1.5/30    0       UnBound
    e2      1.1.1.9/30    1500    Bound      ethernet 3/4
    [local]Train-1#
    [local]Train-1# show port 3/4
    Slot/Port:Ch:SubCh    Type        State
    3/4                   ethernet    Down
    [local]Train-1#

› Port 3/4 is in “Down” state
› There are 2 possible reasons for that:
  – Physical layer is down
  – Port has been administratively shut down

Figure 8-10: Identifying interface problems: Bound state (cont.) (3-3)

1.4 Port Status: Admin State and Line State

• Unconfigured: the port is not configured
• Down: the port is configured but in Down state
• Up: the port is configured and in Up state

To check the state of all ports in the system we can use the command “show port all”. In this example we show only three ports from the output, and each of them is in a different state.

The first port is unconfigured. To unconfigure a port, type “no port ethernet” followed by the port number from configuration mode. To configure a port, run the command “port ethernet” followed by the port number from configuration mode.

The second port is in Down state. This state is determined by a combination of two underlying states, Line and Admin. If the port is in Down state, the Line and Admin states are in one of these two Line-Admin combinations: Down-Down or Down-Up.

The third port is configured and in Up state. If the port is in Up state, both the Line and Admin states are Up.
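The way the two underlying states combine into the port state can be expressed as a small truth function. The Python sketch below follows the Admin State / Line State table in Figure 8-11 and is only an illustration; the function name port_state and its boolean parameters are hypothetical.

```python
def port_state(configured: bool, admin_up: bool, line_up: bool) -> str:
    """Combine Admin state (configuration) and Line state (physical).

    The port is Up only when both underlying states are Up; any other
    combination of a configured port results in Down.
    """
    if not configured:
        return "Unconfigured"
    return "Up" if admin_up and line_up else "Down"


def likely_cause(admin_up: bool, line_up: bool) -> str:
    """Point at the first thing to check for a configured port."""
    if not admin_up:
        return "port has been administratively shut down"
    if not line_up:
        return "physical layer is down"
    return "none"


if __name__ == "__main__":
    for admin in (False, True):
        for line in (False, True):
            print(f"admin_up={admin} line_up={line} -> "
                  f"{port_state(True, admin, line)} ({likely_cause(admin, line)})")
```

Checking Admin state first is a design choice for the sketch: a shut down port is the cheaper of the two causes to confirm and to fix.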
› Most important is to realize the difference between Admin State and Line State

    Admin State        Line State    Result
    (configuration)    (physical)
    Down               Down          Down
    Down               Up            Down
    Up                 Down          Down
    Up                 Up            Up

    [local]Train-1# show port 3/4 detail
    ethernet 3/4 state is Down
    Description                :
    Port circuit               : 3/4:511:63:31/1/0/30
    Link state                 : Up
    Last link state change     : Nov 12 07:41:14.117
    Line state                 : Down
    Admin state                : Up
    Link Dampening             : disabled
    Undampened line state      : Up
    Dampening Count            : 0
    Encapsulation              : ethernet
    MTU size                   : 1500 Bytes
    NAS-Port-Type              : none
    NAS-Port-Id                : none
    MAC address                : 00:30:88:19:71:84
    Media type                 : 100Base-T
    Flow control               : on
    Speed                      : 100 Mbps
    Duplex mode                : full
    Loopback                   : off
    Mini-RJ21 Connector        : Ports 1-12
    Support Lossless-Large-MTU : Not Configurable
    Active Alarms              : Link down

Figure 8-11: Port status: Admin state and Line State

1.5 Circuit Status

Now that we have seen port and interface states, let us talk about circuit state. Usually a circuit’s state follows the state of its parent port. However, some circuits can be down even when the parent port is up. For example, a PPP circuit may be brought down on failure to receive keep-alives while its parent port is up.

It is also possible to administratively shut down a circuit while keeping the port up by using the “shutdown” command under the PVC configuration. For example, individual dot1q circuits may be administratively shut down on a port. This allows us to shut down one VLAN without shutting down the whole port. Using the “show circuit” command we can look at the dot1q circuits and see their individual states.

› Usually a circuit’s state follows the state of its parent port
› Some circuit types have their own keepalive mechanism and can be brought down even when the port state is up
› It is also possible to administratively shut down most circuit types

    [local]Train-1(config-atm-pvc)#?
    -- cut --
      shutdown    Shutdown the PVC
    [local]Train-1(config-dot1q-pvc)#?
    -- cut --
      shutdown    Shutdown the PVC

Figure 8-12: Circuit status

2 Troubleshooting Traffic

Figure 8-13: Troubleshooting Traffic

2.1 Troubleshooting Traffic Problems

› Basic problems like a port being down or a binding misconfiguration are quite easy to spot
› Much more often, problems are more selective and cannot be solved using basic checks
› SSR offers a broad range of counters which are very helpful when troubleshooting
› Statistics are collected:
  – For ports, and
  – For separate circuits within a port
  – Traffic cards collect statistics for Layers 1, 2 and 3

Figure 8-14: Troubleshooting traffic problems (counters)

2.2 Port Counters – Overview
    [local]Train-1# show port count 3/1
    Port    Type
    3/1     ethernet
    packets sent          : 16        bytes sent      : 680
    packets recvd         : 9         bytes recvd     : 40
    send packet rate      : 0.13      send bit rate   : 5.33
    recv packet rate      : 0.63      recv bit rate   : 3.95
    rate refresh interval : 60 seconds

The left column displays data in packets and the right column displays data in bytes. The send and receive rates are always refreshed every 60 seconds.

Counters refresh interval:
› Port counters are refreshed every few minutes (except rate)
› How to get more up-to-date counters?

    [local]Train-1# show port count 3/1 ?
      ces       show ces information
      detail    show detailed information
      live      show live information
      queue     show per-queue information
      |         Output Modifiers
      <cr>
    [local]Train-1#

Figure 8-15: Port counters – overview

2.3 Live Port Counters

For more up-to-date counters we can add the “live” keyword at the end of the “show port counters” command. “show port counters live” forces real-time collection of counters. The output looks the same but is more up to date. However, the updates are displayed only when the command is executed.

It is also important to note that even if we use the “show port counters live” command, the send and receive packet rates are still only calculated every 60 seconds.
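Because the rate fields are only recalculated every 60 seconds, a shorter-term rate has to be derived by hand from two live counter readings. The Python sketch below shows the arithmetic only; it is not an SSR feature. The Sample type and the rates function are hypothetical names, and the sketch assumes the counters were not cleared and did not wrap between the two readings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One reading of 'packets sent'/'bytes sent' (or the recvd pair)."""
    time_s: float   # when the reading was taken, in seconds
    packets: int
    octets: int     # the 'bytes' counter


def rates(first: Sample, second: Sample):
    """Return (packets per second, bits per second) between two readings."""
    elapsed = second.time_s - first.time_s
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    d_packets = second.packets - first.packets
    d_octets = second.octets - first.octets
    if d_packets < 0 or d_octets < 0:
        # Counters went backwards: cleared or wrapped between readings.
        raise ValueError("counters decreased between samples")
    return d_packets / elapsed, d_octets * 8 / elapsed


if __name__ == "__main__":
    before = Sample(time_s=0.0, packets=1616, octets=68680)
    after = Sample(time_s=10.0, packets=1716, octets=78680)
    pps, bps = rates(before, after)
    print(f"{pps:.2f} packets/s, {bps:.2f} bit/s")
```

In the usage example the two readings are 10 seconds apart with 100 more packets and 10000 more bytes, giving 10 packets per second and 8000 bits per second; the second reading's values are made up for illustration.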
› The keyword “live” forces real-time collection of counters from the traffic card
  – Except rate counters
› It does not change the output format

    [local]Train-1# show port count 3/1 live
    Port    Type
    3/1     ethernet
    packets sent          : 1616      bytes sent      : 68680
    packets recvd         : 8909      bytes recvd     : 534540
    send packet rate      : 0.13      send bit rate   : 45.33
    recv packet rate      : 0.63      recv bit rate   : 303.95
    rate refresh interval : 60 seconds
    [local]Train-1#

The send and receive rates are always refreshed every 60 seconds.

Figure 8-16: Live port counters

2.4 Port Counters

    [local]Train-1# show port count 3/1 detail
    Counters for port ethernet 3/1 - Interval: 04:03:41
    NPU Port Counters
    -- cut --
    NPU Input Error Counters
    -- cut --
    NPU Output Error Counters
    -- cut --
    Packet Drop Counters
    -- cut --
    General Counters
    -- cut --
    Transmit Counters
    -- cut --
    Receive Counters
    -- cut --

› NPU Port Counters: regular port counters
› NPU Input Error Counters and NPU Output Error Counters: PPA input and output errors
› Packet Drop Counters: IP packet errors
› The groups above are PPA counters, independent of Layers 2 and 1
› General, Transmit and Receive Counters: L2 port statistics (L2/L1 counters, depending on traffic card type)

Figure 8-17: Port counters – details (1-4)

    [local]Train-1# show port count 3/1 detail
    Counters for port ethernet 3/1 - Interval: 04:03:41
    NPU Port Counters
    packets sent          : 0         bytes sent         : 0
    packets recvd         : 0         bytes recvd        : 0
    send packet rate      : 0.00      send bit rate      : 0.00
    recv packet rate      : 0.00      recv bit rate      : 0.00
    IP mcast pkts rcv     : 0         IP mcast bytes rcv : 0
    IP mcast pkts sent    : 0         IP mcast bytes snt : 0
    rate refresh interval : 60 seconds

    NPU Input Error Counters
    idc other errors      : 0         crc port errors    : 0
    idc overrun errors    : 0         idc abort errors   : 0
    no cct packets        : 0         no cct bytes       : 0
    cct down pkts         : 0         cct down bytes     : 0
    unknown encap pkts    : 0         unknown encap byte : 0
    unreach pkts          : 0         unreach bytes      : 0
    media filter pkts     : 0         media filter bytes : 0

    NPU Output Error Counters
    WRED drop pkts        : 0         tail drop pkts     : 0
    adj drop pkts         : 0         adj drop bytes     : 0

    Packet Drop Counters
    not IPv4 drop pkts    : 0         bad IP checksum    : 0
    unhandled IP optns    : 0         link layer bcast   : 0
    bad IP length         : 0
    ---(more)---

Abbreviations:
  idc     = input descriptor cache (between FPGA and PPA)
  cct     = circuit
  encap   = encapsulation
  unreach = unreachable (unknown destination)
  mcast   = multicast

Examples of packets that increase a given counter:
› Received VLAN tag not configured for the port
› VLAN/PPPoE received on a plain Ethernet port
› No route for the packet’s destination address
› Destination MAC address doesn’t match the system’s MAC
› QoS contract exceeded
› 2000B IP packet with the DF flag set needs to be forwarded over an FE port (MTU = 1500B)

Figure 8-18: Port counters – details (2-4)

    General Counters
    packets sent          : 0         packets recvd      : 0
    bytes sent            : 0         bytes recvd        : 0
    mcast pkts sent       : 0         mcast pkts recvd   : 0
    bcast pkts sent       : 0         bcast pkts recvd   : 0
    dropped pkts out      : 0         dropped pkts in    : 0
    pending pkts out      : 0         pending pkts in    : 0
    port drops out        : 0         port drops in      : 0

    Transmit Counters
    underflow             : 0         eth 64 octets      : 0
    late collision        : 0         eth 65-127 octs    : 0
    regular collision     : 0         eth 128-255 octs   : 0
    single collision      : 0         eth 256-511 octs   : 0
    multiple colls        : 0         eth 512-1023 octs  : 0
    excessive colls       : 0         eth 1024-1518 octs : 0
    deferred              : 0         eth > 1518 octs    : 0
    error pkts sent       : 0         flow control       : 0
    error bytes sent      : 0

    Receive Counters
    jabber                : 0         eth 64 octets      : 0
    false carrier         : 0         eth 65-127 octs    : 0
    runt frames           : 0         eth 128-255 octs   : 0
    undersized frames     : 0         eth 256-511 octs   : 0
    oversized frames      : 0         eth 512-1023 octs  : 0
    crc errors            : 0         eth 1024-1518 octs : 0
    alignment errors      : 0         eth > 1518 octs    : 0
    symbol errors         : 0         flow control       : 0
    error pkts rcvd       : 0         overflows          : 0
    error bytes rcvd      : 0         overflow bytes     : 0

› General Counters: L2 (Ethernet in this case) counters. These packets were dropped before reaching the IPPA
› Transmit and Receive Counters: Ethernet transmit and receive counters in detail. Very useful for troubleshooting L2 problems

Figure 8-19: Port counters (3-4)

2.5 Troubleshooting Circuits

› It does not happen very often that a whole port is used for one stream of traffic
› Usually ports are divided into sub-channels (ATM PVC, Ethernet VLAN, PPPoE sessions)
  – These channels are referred to as circuits in SE architecture
› Port-level counters give an indication of overall port problems, but they are not helpful when you need to troubleshoot one of multiple circuits carrying traffic over the same port
› Luckily, SSR provides similar counters for circuits as it does for ports

Figure 8-20: Troubleshooting circuits

2.6 Circuit Counters

Packet counter values can also be obtained for circuits. In this example we will look at dot1q Ethernet circuits. We can also retrieve counter values for a specific circuit only.

› Circuit counters can be retrieved at various levels

    [local]Train-1# show circuit counters ?
      agent-circuit-id       Search for circuit based on agent-circuit-id attribute
      agent-remote-id        Search for circuit based on agent-remote-id attribute
      circuit-group          Display info for circuit-group circuits
      clips                  Display info for CLIPS circuits
      detail                 Display detailed counters
      dot1q                  Display info for dot1q circuits
      ether                  Display info for ethernet circuits
      gre                    Display info for GRE tunnel circuits
      ipip                   Display info for IP-in-IP tunnel circuits
      ipsec                  Display info for IPSec tunnel circuits
      ipv6-auto              Display info for IPv6-over-v4 auto tunnel circuits
      ipv6-man               Display info for IPv6-over-v4 manual tunnel circuits
      l2tp                   Display info for L2TP LNS circuits
      l2vpn-cross-connect    Display info for l2vpn cross connect circuits
      lg                     Link-group of circuit(s)
      live                   Display live counters
      mp                     Display subscriber MP pseudo circuit information
      mpls                   Display info for MPLS LSP circuits
      persistent             Persistent counters - values do not reflect clear operations
      port-pseudowire        Display info for port pseudowire circuits
      ppp                    Display info for ppp circuits
      pppoe                  Display info for pppoe circuits
      queue                  Display per-queue counters
    -- cut --

Figure 8-21: Circuit counters

As for a port, we can use the keyword “detail” to get more detailed information about circuit counters.

2.7 VLAN Circuit Statistics

    [local]Train-1# show circuit counters 3/7 vlan-id 10 detail
    please wait...
Circuit: 3/7 vlan-id 10, Internal id: 1/2/6, Encap: ether-dot1q
Packets                         Bytes
-------------------------------------------------------------------------------
Receive         : 2550          Receive         : 140022
Receive/Second  : 0.50          Receive/Second  : 27.00
Transmit        : 45            Transmit        : 5309
Xmits/Queue                     Xmits/Queue
  0             : 45              0             : 5309
  1             : 0               1             : 0
  2             : 0               2             : 0
  3             : 0               3             : 0
  4             : 0               4             : 0
  5             : 0               5             : 0
  6             : 0               6             : 0
  7             : 0               7             : 0
  8             : 0               8             : 0
Xmit Q Deleted  : 0             Xmit Q Deleted  : 0
Transmit/Second : 0.03          Transmit/Second : 1.54
IP Multicast Rcv: 0             IP Multicast Rcv: 0
IP Multicast Tx : 0             IP Multicast Tx : 0
Unknown Encaps  : 0             Unknown Encaps  : 0
Down Drops      : 0             Down Drops      : 0
Unreach Drops   : 0             Unreach Drops   : 0
Adj Drops       : 0             Adj Drops       : 0
WRED Drops Total: 0             WRED Drops Total: 0
WRED Drops/Queue                WRED Drops/Queue
  0             : 0               0             : 0
  0             : 0               0             : 0
  1             : 0               1             : 0
  2             : 0               2             : 0
  6             : 0               6             : 0
  7             : 0               7             : 0
---more---

› Xmits/Queue: transmitted packets per QoS-defined queue
› The same meaning as for port
› WRED Drops/Queue: QoS WRED drops per queue

Figure 8-22: VLAN circuit statistics (1-2)

Tail Drops Total: 0             Tail Drops Total: 0
Tail Drops/Queue                Tail Drops/Queue
  0             : 0               0             : 0
  1             : 0               1             : 0
  2             : 0               2             : 0
  3             : 0               3             : 0
  4             : 0               4             : 0
  5             : 0               5             : 0
  6             : 0               6             : 0
  7             : 0               7             : 0

IP Counters
Soft GRE MPLS   : 0             Soft GRE MPLS   : 0
Not IPv4 drops  : 0             Not IPv4 drops  : 0
Unhandled IP Opt: 0
Bad IP Length   : 0
Bad IP Checksum : 0
Not IPv6 drops  : 0             Not IPv6 drops  : 0
Broadcast Drops : 0

MPLS Counters
MPLS Drops      : 0             MPLS Drops      : 0

ARP Counters
Drops           : 0             Drops           : 0
Unreachable     : 0             Unreachable     : 0

Rate Refresh Interval : 60 seconds
[local]Train-1#

› Tail Drops/Queue: QoS tail drops per queue
› The same meaning as for port
› ARP Counters: ARP statistics

Figure 8-23: VLAN circuit statistics (2-2)

2.8 Clearing Counters

› There are multiple ways of clearing counters
› The global command clears all counters across the chassis

[local]Train-1# clear port counters

› A more specific command clears the counters of one port only

[local]Train-1# clear port counters 3/1

› The most specific command clears the counters of a single circuit

[local]Train-1# clear circuit counters 3/7 vlan-id 20
[local]Train-1# clear circuit counters 3/7 vlan-id 20 pppoe 15

› You can also clear the counters for all circuits of a certain type

[local]Train-1# clear circuit counters pppoe
[local]Train-1# clear circuit counters dot1q

Figure 8-24: Clearing counters

Port and circuit counters can be cleared at any time with the commands “clear circuit counters” and “clear port counters”. Clearing them at the start of an event that you need to troubleshoot makes it easier to see which counters change during the event. These commands can also be applied to a specific port or circuit only. For example, to clear the counters only for VLAN 100 on port 3/1, run:

[local]Train-1# clear circuit counters 3/1 vlan-id 100

Likewise, to clear the counters only for port 3/7, run:

[local]Train-1# clear port counters 3/7

2.9 IP Troubleshooting Tool

› Probably the most commonly used tool in IP connectivity troubleshooting is ICMP ping
› It has many very useful options

[local]Train-1# ping 1.1.1.1 ?
  1..2147483647   Enter number of PING to transmit
  df              Set the Don't Fragment bit in the IP header
  flood           Flood ping
  maxs            Sweep max size
  mins            Sweep min size
  numeric         Numeric output only
  pattern         Specify a pattern to fill in ICMP packet
  preload         Ping sends that many packets as fast as possible
  quiet           Do not display ICMP error messages
  record          Includes the RECORD_ROUTE option in the ECHO_REQUEST packet
  silent          Display only summary lines at startup and finish
  size            Size of the ICMP datagram to send
  source          Source IP address
  timeout         Specify PING timeout
  tos             Specify type of service
  ttl             Time-to-live
  verbose         Verbose output

Figure 8-25: Ping - key IP troubleshooting tool

2.10 Traffic Troubleshooting Exercise: Introduction

› There are multiple interconnected contexts on your SSR, and all of them have connectivity problems
› Your job is to find the problems and fix them (if possible)
› There are several ways to find the root cause, but it is best to use your knowledge of port and circuit counters
  – Use only show port / circuit / ip commands and ping
› Often you have access to only one side of the link; in that case counters are your only option

Figure 8-26: Traffic troubleshooting exercise: Introduction

2.11 Traffic Troubleshooting Exercise: Preparation

› You need to load a new configuration on your SSR before we can start
› Please do not look at the content of the file, and do not check your SSR configuration after it has been loaded
› Looking at the configuration takes all the fun out of the exercises

[local]Train-1# configure scp://student@10.1.1.3/troubleshooting_1.cfg
student@10.1.1.3's password:
troubleshooting_1.cf 100% |*****************************| 1896 00:00
Configuration complete
% Configuration file processing took: 2 seconds
[local]Train-1#

Figure 8-27: Traffic troubleshooting exercise: Preparation

2.12 Exercise 7: Traffic Troubleshooting

› Please move to the
exercises book.

Figure 8-28: Exercise 7: Traffic troubleshooting

2.12.1 Context Topology for Traffic Troubleshooting Exercise

[Topology diagram]
› Test command: [x1]Train-1# ping 2.2.2.2 df size 1400 silent
› Contexts a1, b1 and d1 use port 3/7; contexts a2, b2 and d2 use port 3/8
› Source-side contexts: interface e0, ip address 1.1.1.1/30
› Peer contexts: interface e0, ip address 1.1.1.2/30 and interface lo, ip address 2.2.2.2/32
› Also shown: contexts c1, c2 and e1, ports 3/9, 3/10 and 5/4, and the Internet

Figure 8-29: Context topology for traffic troubleshooting exercise

2.12.2 Traffic Troubleshooting Exercise Review

› Contexts a1 & a2

[a1]Train-1# show port count 3/7 detail
NPU Port Counters
no cct packets     : 0
cct down pkts      : 0
unknown encap pkts : 3
  – Unknown encapsulation received from context a2

[a1]Train-1# show port count 3/8 detail
NPU Input Error Counters
idc other errors   : 0
idc overrun errors : 0
no cct packets     : 3
  – No circuit for packets received from context a1

[a1]Train-1# show ip interface brief
Mon Dec 8 22:22:07 2008
Name   Address      MTU    State   Bindings
e0     1.1.1.1/30   1500   Up      dot1q 3/7 vlan-id 10

[a2]Train-1# show ip interface brief
Mon Dec 8 22:22:16 2008
Name   Address      MTU    State   Bindings
e0     1.1.1.2/30   1500   Up      ethernet 3/8
  – Encapsulation mismatch

Figure 8-30: Traffic troubleshooting exercises review (1-7)

› Contexts b1 & b2

[b1]Train-1# show port count 3/8 detail
NPU Input Error Counters
idc other errors   : 0
unknown encap pkts : 0
unreach pkts       : 5
media filter pkts  : 0
  – Context b2 doesn’t know where to send packets from b1

[b2]Train-1# show ip route
Type   Network      Next Hop   Dist   Metric   UpTime     Interface
> C    1.1.1.0/30              0      0        00:42:29   e0
[b2]Train-1#
  – No route for 2.2.2.2

Figure 8-31: Traffic
troubleshooting exercises review (2-7)

› Contexts c1 & c2

[c1]Train-1# show port counters 3/9 detail
NPU Output Error Counters
WRED drop pkts : 0
adj drop pkts  : 5
  – Packets violate outgoing circuit characteristics

[c1]Train-1# show port 3/9 detail
ethernet 3/2 state is Up
Description            :
Port circuit           : 3/2:511:63:31/1/0/9
Link state             : Up
Last link state change : Apr 22 01:13:38.381
Line state             : Up
Admin state            : Up
Link Dampening         : disabled
Undampened line state  : Up
Dampening Count        : 0
Encapsulation          : dot1q
MTU size               : 1000 Bytes
NAS-Port-Type          : none
NAS-Port-Id            : none
MAC address            : 00:02:3b:04:65:67
Media type             : 1000Base-T
--- cut ---
  – Port 3/9 can transmit packets with a maximum size of 1000 B

Figure 8-32: Traffic troubleshooting exercises review (optional) (3-7)

› Contexts d1 & d2

[d1]Train-1# show port count 3/8 detail
NPU Input Error Counters
unreach pkts      : 0
media filter pkts : 5
  – PPA received packets with wrong (not its own) destination MAC address

[d1]Train-1# show arp-cache
Host      Hardware address    Ttl    Type   Circuit
1.1.1.1   00:30:88:00:61:55   -      ARPA   3/7 vlan-id 30
1.1.1.2   00:55:88:00:33:77          ARPA   3/7 vlan-id 30
[d1]Train-1#
  – d1 has wrong MAC address for 1.1.1.2

[d2]Train-1# show arp-cache
Host      Hardware address    Ttl    Type   Circuit
1.1.1.1   00:30:88:00:61:55   3042   ARPA   3/8 vlan-id 30
1.1.1.2   00:30:88:00:33:78   -      ARPA   3/8 vlan-id 30
[d2]Train-1#

Figure 8-33: Traffic troubleshooting exercises review (optional) (4-7)

› Context e1

[e1]Train-1# ping 2.2.2.2
PING 2.2.2.2 (2.2.2.2): source 1.1.1.1, 36 data bytes, timeout is 1 second
.....
----2.2.2.2 PING Statistics----
5 packets transmitted, 0 packets received, 100.0% packet loss

[e1]Train-1# show port counters 5/4 detail
Counters for port ethernet 5/4 - Interval: 13:01:48
NPU Port Counters
packets sent       : 3         bytes sent         : 138
packets recvd      : 2         bytes recvd        : 120
send packet rate   : 0.00      send bit rate      : 0.00
recv packet rate   : 0.00      recv bit rate      : 0.00
IP mcast pkts rcv  : 0         IP mcast bytes rcv : 0
IP mcast pkts sent : 0         IP mcast bytes snt : 0
rate refresh interval : 60 seconds
NPU Input Error Counters
idc other errors   : 0         crc port errors    : 0
idc overrun errors : 0         idc abort errors   : 0
no cct packets     : 0         no cct bytes       : 0
--cut
  – Counters look good, no errors observed

Figure 8-34: Traffic troubleshooting exercises review (optional) (5-7)

› Context e1 continued

[e1]Train-1# show ip route
Type   Network      Next Hop   Dist   Metric   UpTime
> S    0.0.0.0/0    1.1.1.2    1      0        00:43:01
> C    1.1.1.0/30              0      0        00:43:02
[e1]Train-1#

[e1]Train-1# ping 1.1.1.2
PING 1.1.1.2 (1.1.1.2): source 1.1.1.1, 36 data bytes, timeout is 1 second
.....
----1.1.1.2 PING Statistics----
5 packets transmitted, 0 packets received, 100.0% packet loss
[e1]Train-1#

[e1]Train-1# show arp-cache
Total number of arp entries in cache: 2
  Resolved entry   : 1
  Incomplete entry : 1
Host      Hardware address    Ttl   Type   Interface   Circuit
1.1.1.1   00:30:88:23:26:c2         ARPA   e0          5/4 vlan-id 10
1.1.1.2   incomplete          10    ARPA   e0          5/4 vlan-id 10
[e1]Train-1#
  – ARP failed, we need to debug it

Figure 8-35: Traffic troubleshooting exercises review (optional) (6-7)

› Context e1 continued

[e1]Train-1# term mon
[e1]Train-1# debug arp pktio
[e1]Train-1# ping 1.1.1.2 1
Dec 9 00:08:59: [0015]: %ARP-7-PKTIO: Build ether pkt with dot1q encap for 1.1.1.2, vlan id 10, eh 0x41b03024
Dec 9 00:08:59: [0015]: %ARP-7-PKTIO: Dump outgoing packet, pkt length 46
 0  ff ff ff ff ff ff 00 30 88 23 26 c2 81 00 00 0a
16  08 06 00 01 08 00 06 04 00 01 00 30 88 23 26 c2
32  01 01 01 01 ff ff ff ff ff ff 01 01 01 02
Dec 9 00:08:59: [0015]: %ARP-7-PKTIO: Received ARP pkt: context_id 0x4008000f, cct_handle 5/4:1023:63/1/2/22
Dec 9 00:08:59: [0015]: %ARP-7-PKTIO: Dump incoming ARP packet, pkt length 60
 0  ff ff ff ff ff ff 00 30 88 23 26 c2 81 00 00 0a
16  08 06 00 01 08 00 06 04 00 01 00 30 88 23 26 c2
32  01 01 01 01 ff ff ff ff ff ff 01 01 01 02 00 00
48  00 00 00 00 00 00 00 00 00 00 00 00
  – The sent and the received ARP request have the same source MAC address
  – The SE received its own ARP request: there is a loop at the physical level

Figure 8-36: Traffic troubleshooting exercises review (optional) (7-7)

3 Chapter Summary

After this course the participant should be able to:
› Perform troubleshooting for traffic flow through ports, circuits and interfaces
› Explain the traffic flow in the SSR system
› Identify and troubleshoot connectivity issues

Figure 8-37: Chapter Summary
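The exercise review in this chapter ties each non-zero error counter to a likely cause. That reasoning can be sketched as a small offline helper that compares two saved counter snapshots, for example one taken right after clearing the counters and one taken after a test ping. This is only a sketch: the counter names and hints come from the review, while the function names and the assumption that counters appear as plain "name : value" pairs in saved text are hypothetical. It does not talk to the router.

```python
import re

# Counter names and hints are taken from the exercise review in this chapter.
# Treating the saved CLI text as "name : integer" pairs is an assumption.
HINTS = {
    "unknown encap pkts": "encapsulation mismatch on the link",
    "no cct packets": "no circuit matches the received packets",
    "unreach pkts": "no route to the destination",
    "adj drop pkts": "packets violate outgoing circuit characteristics (for example MTU)",
    "media filter pkts": "wrong destination MAC address (check the ARP cache)",
}

# A name made of letters and spaces, a colon, then an integer.
# Decimal values such as rates ("0.00") are skipped on purpose.
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?)[ \t]*:[ \t]*(\d+)(?![\d.])")


def parse_counters(text):
    """Return {counter name: integer value} for every 'name : value' pair."""
    return {name.strip().lower(): int(value) for name, value in PAIR.findall(text)}


def diagnose(before, after):
    """List (counter, increase, hint) for known error counters that grew."""
    old, new = parse_counters(before), parse_counters(after)
    findings = []
    for name, hint in HINTS.items():
        delta = new.get(name, 0) - old.get(name, 0)
        if delta > 0:
            findings.append((name, delta, hint))
    return findings


if __name__ == "__main__":
    before = "no cct packets : 0   unknown encap pkts : 0\nunreach pkts : 0"
    after = "no cct packets : 0   unknown encap pkts : 3\nunreach pkts : 5"
    for name, delta, hint in diagnose(before, after):
        print(f"{name}: +{delta} -> {hint}")
```

Working on the difference between two snapshots, rather than on absolute values, mirrors the advice to clear counters at the start of the event being investigated.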
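In the e1 review, the loop is found by noticing that the received ARP request carries the router's own source MAC address. The sketch below decodes the hex dump from that debug output and performs the same comparison. The field offsets follow the standard Ethernet, 802.1Q and ARP layouts; the function names are hypothetical, and the code assumes exactly one VLAN tag, as in the dump.

```python
def _mac(raw):
    """Format six bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def _ip(raw):
    """Format four bytes as a dotted IPv4 address."""
    return ".".join(str(b) for b in raw)


def parse_dot1q_arp(hex_dump):
    """Decode an Ethernet frame with one 802.1Q tag that carries ARP."""
    data = bytes.fromhex(hex_dump)  # fromhex ignores the spaces between bytes
    if data[12:14] != b"\x81\x00" or data[16:18] != b"\x08\x06":
        raise ValueError("not a dot1q-tagged ARP frame")
    return {
        "eth_src": _mac(data[6:12]),
        "vlan_id": int.from_bytes(data[14:16], "big") & 0x0FFF,
        "opcode": int.from_bytes(data[24:26], "big"),  # 1 = request, 2 = reply
        "sender_mac": _mac(data[26:32]),
        "sender_ip": _ip(data[32:36]),
        "target_ip": _ip(data[42:46]),
    }


def is_own_request(frame, own_mac):
    """True when a received ARP request was sent by this router itself."""
    return frame["opcode"] == 1 and frame["sender_mac"] == own_mac.lower()


# Incoming packet bytes from the debug output in the e1 review
# (trailing padding zeros omitted).
INCOMING = (
    "ff ff ff ff ff ff 00 30 88 23 26 c2 81 00 00 0a "
    "08 06 00 01 08 00 06 04 00 01 00 30 88 23 26 c2 "
    "01 01 01 01 ff ff ff ff ff ff 01 01 01 02"
)

if __name__ == "__main__":
    frame = parse_dot1q_arp(INCOMING)
    # 00:30:88:23:26:c2 is the hardware address shown for 1.1.1.1 in the e1 ARP cache.
    print(frame)
    print("own request received:", is_own_request(frame, "00:30:88:23:26:c2"))
```

A received request whose sender MAC equals the local MAC is what the review interprets as a physical loop; the sketch only automates that comparison and does not prove where the loop is.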
9 Acronyms and Abbreviations

AAA     Authentication, Authorization, and Accounting
ALSW    Alarm Switch boards
AS      Autonomous system
ATM     Asynchronous Transfer Mode
BGP     Border Gateway Protocol
BNG     Broadband Network Gateway
CDN     Content Delivery Network
CLI     Command Line Interface
CPU     Central Processing Unit
CSM     Card Slot Module/Connection State Manager
DES     Data Encryption Standard
DRAM    Dynamic Random Access Memory
FABL    Forwarding Abstraction Layer
FIB     Forwarding Information Base
FTP     File Transfer Protocol
GB      GigaByte
GE      Gigabit Ethernet
GREP    Global Regular Expression Parser
HW      Hardware
IP      Internet Protocol
IPOS    IP Operating System
IPOS    Internet Protocol Operating System
IPsec   Internet Protocol Security
IS      In Service
ISIS    Intermediate System - Intermediate System
ISM     Interface and Circuit State Manager
LED     Light Emitting Diode
MB      MegaByte
MSE     Multi-Service Edge
NAT     Network Address Translation
NPU     Network Processing Unit
OAM     Operations Administration and Maintenance
OOS     Out of Service
OS      Operating System
OSD     Out of Service Diagnostics
OSPF    Open Shortest Path First
PEM     Power Entry Module
PFE     Packet Forwarding Engine
PIM     Protocol Independent Multicast
PM      Process Manager
POD     Power On Diagnostics
POF     Points of Failure
POST    Power-On Self Test
PVC     Permanent Virtual Circuit
RAN     Remote Access Network
RCM     Router Config Module
RCP     Remote Copy Protocol
RDB     Reliable dataBase
RIB     Routing information dataBase
ROM     Read Only Memory
RP      Route Processor
RPSW    Route Processor Switch
RSVP    Resource Reservation Protocol
SCP     Secure Copy Protocol
SFTP    Secured File Transfer Protocol
SSC     Smart Services Card
SSH     Secure Shell
SSR     Smart Service Router
SW      Switch
TCP     Transmission Control Protocol
TFTP    Trivial File Transfer Protocol
USB     Universal Serial Bus
VPN     Virtual Private Network
VTY     Virtual Terminal

10 Index

Alarm Switch boards, 5, 60, 66, 81, 181
Asynchronous Transfer Mode, 15, 181
Authentication, Authorization, and Accounting, 14, 181
Autonomous system, 181
Border Gateway Protocol, 14, 181
Broadband Network Gateway, 181
Card Slot Module/Connection State Manager, 8, 75, 79, 80, 82, 129, 181
Central Processing Unit, 5, 78, 80, 144, 181
Command Line Interface, 3, 11, 14, 18, 20, 21, 22, 23, 25, 26, 27, 40, 41, 75, 76, 132, 181
Content Delivery Network, 181
Data Encryption Standard, 181
Dynamic Random Access Memory, 181
File Transfer Protocol, 181
Forwarding Abstraction Layer, 181
Forwarding Information Base, 181
Gigabit Ethernet, 181
GigaByte, 69, 70, 181
Global Regular Expression Parser, 4, 40, 42, 43, 46, 181
Hardware, 182
In Service, 14, 182
Interface and Circuit State Manager, 5, 8, 75, 76, 79, 80, 82, 83, 129, 182
Intermediate System - Intermediate System, 79, 182
Internet Protocol, 9, 14, 15, 16, 17, 30, 33, 34, 35, 75, 139, 173, 182
Internet Protocol Operating System, 59, 145, 182
Internet Protocol Security, 182
IP Operating System, 59, 145, 182
Light Emitting Diode, 5, 66, 182
MegaByte, 182
Multi-Service Edge, 182
Network Address Translation, 182
Network Processing Unit, 182
Open Shortest Path First, 6, 14, 74, 84, 87, 146, 147, 182
Operating System, 16, 75, 138, 139, 182
Operations Administration and Maintenance, 182
Out of Service, 182
Out of Service Diagnostics, 182
Packet Forwarding Engine, 182
Permanent Virtual Circuit, 15, 33, 166, 168, 183
Points of Failure, 182
Power Entry Module, 182
Power On Diagnostics, 182
Power-On Self Test, 182
Process Manager, 7, 74, 76, 106, 128, 182
Protocol Independent Multicast, 182
Read Only Memory, 7, 110, 111, 183
Reliable dataBase, 76, 79, 80, 82, 183
Remote Access Network, 183
Remote Copy Protocol, 183
Resource Reservation Protocol, 183
Route Processor, 6, 59, 60, 90, 91, 96, 97, 100, 101, 102, 103, 104, 105, 106, 107, 119, 120, 121, 122, 132, 134, 183
Route Processor Switch, 5, 58, 59, 69, 75, 77, 79, 119, 128, 183
Router Config Module, 76, 79, 80, 82, 128, 183
Routing information dataBase, 183
Secure Copy Protocol, 183
Secure Shell, 18, 30, 139, 152, 183
Secured File Transfer Protocol, 183
Smart Service Router, 4, 6, 7, 8, 18, 38, 39, 53, 55, 62, 64, 65, 66, 69, 74, 79, 95, 96, 117, 118, 119, 121, 124, 138, 139, 143, 144, 145, 146, 150, 154, 159, 183
Smart Services Card, 183
Switch, 183
Transmission Control Protocol, 183
Trivial File Transfer Protocol, 183
Universal Serial Bus, 70, 71, 183
Virtual Private Network, 14, 183
Virtual Terminal, 183

11 Table of Figures

Figure 1-1: Chapter Objectives ... 11
Figure 1-2: Review Fundamental Concepts ... 12
Figure 1-3: Context, Interfaces, & Bindings Architecture ... 13
Figure 1-4: Terminology ... 16
Figure 1-5: Command Line Interface (CLI) Structure ... 18
Figure 1-6: Introduction ... 19
Figure 1-7: Factory Default System: Step one ... 19
Figure 1-8: Maneuvering through the CLI ... 20
Figure 1-9: If you are configuring… ... 22
Figure 1-10: Monitoring with CLI ... 23
Figure 1-11: CLI Introduction and the prompt structure ... 24
Figure 1-12: Context monitoring ... 25
Figure 1-13: CLI Help ... 25
Figure 1-14: CLI for the fast people ... 26
Figure 1-15: Lab environment ... 27
Figure 1-16: Connecting to Ericsson Training labs ... 28
Figure 1-17: Configure Management Interface ... 29
Figure 1-18: Reference for this module ... 29
Figure 1-19: Configure Management interface ... 30
Figure 1-20: Validating the configuration ... 33
Figure 1-21: Binding information ... 34
Figure 1-22: Exercise 1: Management configuration ... 35
Figure 1-23: Troubleshooting Preparation Commands & Tools ... 35
Figure 1-24: Troubleshooting Preparation ... 36
Figure 1-25: Remote terminal session timeout ... 36
Figure 1-26: Who is logged into the SSR? ... 36
Figure 1-27: What did you type before? ... 37
Figure 1-28: Troubleshooting by searching and limiting the output ... 38
Figure 1-29: Command Line Interface & Emacs ... 39
Figure 1-30: Command Line Interface & Emacs ... 40
Figure 1-31: GREP, Global Regular Expression Parser ... 40
Figure 1-32: Extended GREP ... 41
Figure 1-33: Other searching tools ... 41
Figure 1-34: Regular expressions ... 42
Figure 1-35: Regular expressions, examples with GREP ... 44
Figure 1-36: Regular expressions, examples with GREP ... 44
Figure 1-37: Aliases and Macros ... 45
Figure 1-38: Introduction to Alias ... 45
Figure 1-39: Introduction to macro ... 46
Figure 1-40: Variables in Macros ... 46
Figure 1-41: Exercise 2: Introduction, Searching and Filtering ... 47
Figure 1-42: Exercise 2: Searching and Filtering ... 47
Figure 1-43: Exercise 2, review (1-4) ... 47
Figure 1-44: Exercise 2, review (2-4) ... 48
Figure 1-45: Exercise 2, review (3-4) ... 48
Figure 1-46: Exercise 2, review (4-4) (optional) ... 49
Figure 1-47: Chapter Summary ... 50
Figure 2-1: Chapter Objectives ... 51
Figure 2-2: Recommended Troubleshooting Procedure ... 52
Figure 2-3: System Hardware Health ... 53
Figure 2-4: Overview: Hardware Status ... 54
Figure 2-5: More detailed hardware info ... 55
Figure 2-6: Retrieving hardware details Line cards ... 56
Figure 2-7: RPSW hardware information ... 57
Figure 2-8: ALSW hardware information ... 58
Figure 2-9: Finding hardware alarms (1-2) ... 59
Figure 2-10: Finding hardware alarms (2-2) ... 59
Figure 2-11: System Hardware Checks ... 60
Figure 2-12: System alarms ... 61
Figure 2-13: System Alarm with Options, Examples ... 61
Figure 2-14: Example: Initiating Major System Alarm ... 62
Figure 2-15: Example: Initiating Critical System Alarm ... 63
Figure 2-16: System Hardware LED ... 64
Figure 2-17: Card Powered Down ... 65
Figure 2-18: System storage Verification ... 66
Figure 2-19: System storage ... 67
Figure 2-20: System storage verification ... 68
Figure 2-21: System storage verification: Example ... 68
Figure 2-22: Chapter Summary ... 70
Figure 3-1: Chapter Objectives ... 71
Figure 3-2: Process Architecture ... 72
Figure 3-3: RPSW Processes (1-3) ... 73
Figure 3-4: RPSW Processes (2-3) ... 75
Figure 3-5: RPSW Processes (process communication) (3-3) ... 75
Figure 3-6: Process Scheduling ... 76
Figure 3-7: RPSW processes verification ... 77
Figure 3-8: Finding CPU intensive processes ... 78
Figure 3-9: Single process verification ... 79
Figure 3-10: Single process in detail ... 80
Figure 3-11: Single Process Verification – ISM ... 81
Figure 3-12: Single Process Verification – OSPF ... 82
Figure 3-13: Maximum Crashes Allowed ... 83
Figure 3-14: Process crash (1-2) ... 84
Figure 3-15: What happens when a process crashes? ... 84
Figure 3-16: Software Process Failure Scenario ... 85
Figure 3-17: Process crash (2-2) ... 86
Figure 3-18: System Stopped Processes ... 86
Figure 3-19: Did a process crash? (1-2) ... 87
Figure 3-20: Did a process crash? (2-2) ... 87
Figure 3-21: Old core files on RP – BAD IDEA ... 88
Figure 3-22: Core files are copied between RPs ... 88
Figure 3-23: Core dump files on standby RP ... 89
Figure 3-24: Exercise 3: Introduction ... 90
Figure 3-25: Exercise 3: System Processes ... 90
Figure 3-26: Exercise 3, review (1-2) ... 90
Figure 3-27: Exercise 3, review (2-2) (Optional parts) ... 91
Figure 3-28: Chapter Summary ... 92
Figure 4-1: Chapter Objectives ... 93
Figure 4-2: RP redundancy ... 94
Figure 4-3: RP redundancy details ... 95
Figure 4-4: Investigating redundancy issues ... 96
Figure 4-5: show system redundancy (1-3) ... 96
Figure 4-6: show system redundancy (2-3) ... 97
Figure 4-7: show system redundancy (3-3) ... 97
Figure 4-8: Analyzing Problems of Standby RP ... 98
Figure 4-9: Which RP should you check, Active or Standby? ... 99
Figure 4-10: Connecting to standby RP without console ... 99
Figure 4-11: Searching for restart reason ... 100
Figure 4-12: Repeating commands on standby RP ... 100
Figure 4-13: Repeating commands on standby RP ... 100
Figure 4-14: Verify processes on standby RP ... 101
Figure 4-15: Copy files from standby RP ... 101
Figure 4-16: Copy files from standby RP ... 102
Figure 4-17: RP Failover Management ... 103
Figure 4-18: Managing Reloads and RP Switch-over ... 104
Figure 4-19: Manual RP Switchover (1-2) ... 104
Figure 4-20: Manual RP switchover (2-2) ... 105
Figure 4-21: Chapter Summary ... 106
Figure 5-1: Chapter Objectives ... 107
Figure 5-2: Boot Problems ... 108
Figure 5-3: Entering Boot ROM Interface ... 108
Figure 5-4: Example: Entering Boot ROM Interface ... 109
Figure 5-5: Diagnostics Command ... 109
Figure 5-6: Running Diagnostics ... 110
Figure 5-7: Troubleshooting Scenarios ... 111
Figure 5-8: Resume Boot ... 111
Figure 5-9: Troubleshooting Scenarios ... 112
Figure 5-10: System uptime ... 112
Figure 5-11: Check for human errors ... 113
Figure 5-12: System storage verification ... 113
Figure 5-13: Exercise 4: Investigate Boot Problems ... 113
Figure 5-14: Chapter Summary ... 114
Figure 6-1: Chapter Objective ... 115
Figure 6-2: System logging introduction ... 116
Figure 6-3: Loggd Process ... 117
Figure 6-4: System log commands ... 118
Figure 6-5: Event Severity Levels in Log Messages ... 119
Figure 6-6: Logs from cards ... 120
Figure 6-7: Show log and time ... 120
Figure 6-8: Show log and time ... 121
Figure 6-9: Log Files ... 122
Figure 6-10: Custom Log Files and filters ... 123
Figure 6-11: Log Files location ... 124
Figure 6-12: Display Log Files ... 124
Figure 6-13: Filter Based on Facility ... 125
Figure 6-14: Filter Based on Facility example ... 125
Figure 6-15: Pm Process Logs ... 126
Figure 6-16: CSM Process Logs ... 127
Figure 6-17: ISM Process ... 127
Figure 6-18: Filter based on facility on card ... 128
Figure 6-19: Logger verification ... 129
Figure 6-20: Show Logging Card information ... 130
Figure 6-21: Logging display info ... 130
Figure 6-22: Logging debug ... 132
Figure 6-23: Logging debug (global config logging) ...
133 Figure 6-24: Logging debug ........................................................................................................ 134 Figure 6-25: Log File Collection .................................................................................................. 134 Figure 6-26: Syslog Configuration ............................................................................................... 135 Figure 6-27: Syslog server .......................................................................................................... 136 Figure 6-28: Reference for Syslog lab ......................................................................................... 136 Figure 6-29: Exercise 5: Logging & Syslog ................................................................................. 136 Figure 6-30: Exercise review: Configure Syslog & Debug ........................................................... 137 Figure 6-31: Exercise review: Syslog server environment ........................................................... 138 Figure 6-32: Exercise review: Save and display the logs............................................................. 138 Figure 6-33: Chapter Summary ................................................................................................... 139 Figure 7-1: Chapter Objectives ................................................................................................... 141 Figure 7-2: Debug introduction .................................................................................................... 142 Figure 7-3: The challenge ........................................................................................................... 143 Figure 7-4: Debug coverage (what) ............................................................................................. 144 Figure 7-5: How to recognize a debug function is context specific? Context ID ........................... 
145 Figure 7-6: Debug coverage (where)........................................................................................... 146 Figure 7-7: Debugging within context local .................................................................................. 146 Figure 7-8: Debugging in different contexts ................................................................................. 147 Figure 7-9: Debug relationship with contexts............................................................................... 148 Figure 7-10: Send debug output to screen .................................................................................. 149 Figure 7-11: Administrator privacy .............................................................................................. 151 Figure 7-12: Debugging and “impact” .......................................................................................... 152 Figure 7-13: Exercise 6: Debugging on SSR ............................................................................... 152 Figure 7-14: Chapter Summary ................................................................................................... 153 Figure 8-1: Chapter Objectives ................................................................................................... 155 Figure 8-2: Troubleshooting Basic Checks .................................................................................. 156 Figure 8-3: Interface & Port States .............................................................................................. 157 Figure 8-4: Interface & Port States .............................................................................................. 158 Figure 8-5: Interface & Port States .............................................................................................. 158 Figure 8-6: Interface & Port States .............................................................................................. 
159 Figure 8-7: Verifying interface status ........................................................................................... 159 Figure 8-8: Identifying interface problems: Unbound state (1-3) .................................................. 160 Figure 8-9: Identifying interface problems: Bound state (2-3) ...................................................... 161 Figure 8-10: Identifying interface problems: Bound state (cont.) (3-3) ......................................... 162 Figure 8-11: Port status: Admin state and Line State .................................................................. 163 Figure 8-12: Circuit status ........................................................................................................... 164 Figure 8-13: Troubleshooting Traffic ........................................................................................... 165 Figure 8-14: Troubleshooting traffic problems (counters) ............................................................ 165 Figure 8-15: Port counters – overview......................................................................................... 166 Figure 8-16: Live port counters ................................................................................................... 167 - 188 - © Ericsson AB 2015 LZT1381712 R1A Table of Figures Figure 8-17: Port counters – details (1-4) .................................................................................... 167 Figure 8-18: Port counters – details (2-4) .................................................................................... 168 Figure 8-19: Port counters (3-4) .................................................................................................. 168 Figure 8-20: Troubleshooting circuits .......................................................................................... 
169 Figure 8-21: Circuit counters ....................................................................................................... 169 Figure 8-22: VLAN circuit statistics (1-2) ..................................................................................... 170 Figure 8-23: VLAN circuit statistics (2-2) ..................................................................................... 170 Figure 8-24: Clearing counters .................................................................................................... 171 Figure 8-25: Ping - key IP troubleshooting tool ............................................................................ 171 Figure 8-26: Traffic troubleshooting exercise: Introduction .......................................................... 172 Figure 8-27: Traffic troubleshooting exercise: Preparation .......................................................... 172 Figure 8-28: Exercise 7: Traffic troubleshooting .......................................................................... 172 Figure 8-29: Context topology for traffic troubleshooting exercise ............................................... 173 Figure 8-30: Traffic troubleshooting exercises review (1-7) ......................................................... 173 Figure 8-31: Traffic troubleshooting exercises review (2-7) ........................................................ 174 Figure 8-32: Traffic troubleshooting exercises review (optional) (3-7) ......................................... 174 Figure 8-33: Traffic troubleshooting exercises review (optional) (4-7) ......................................... 175 Figure 8-34: Traffic troubleshooting exercises review (optional) (5-7) ......................................... 175 Figure 8-35: Traffic troubleshooting exercises review (optional) (6-7) ......................................... 176 Figure 8-36: Traffic troubleshooting exercises review (optional) (7-7) ......................................... 
176 Figure 8-37: Chapter Summary ................................................................................................... 177 LZT1381712 R1A © Ericsson AB 2015 - 189 -