Nexus 7000 Series M132XP-12 Module Troubleshooting Based on Error Logs

Document ID: 116227

Contributed by Yogesh Ramdoss, Robert Hurst, Vincent Ng, Cisco TAC Engineers.
Oct 21, 2013

Contents

Introduction
Prerequisites
   Requirements
   Components Used
Background Information
Scenario 1: N7K-M132XP-12 Diagnostic "Port Loopback" Test Failed
Scenario 2: M1 Modules Get Reset and/or Link Flaps
Scenario 3: All M1 Modules Fail Specific Diagnostic Tests, Like the PortLoopback or RewriteEngineLoopback Tests
Related Information

Introduction

This document describes the process used to determine whether a Cisco Nexus 7000 Series (N7K) M132XP-12 or N7K-M132XP-12L module needs to be returned under a Return Material Authorization (RMA).

Prerequisites

Requirements

Cisco recommends that you have knowledge of the Nexus operating system CLI.

Components Used

The information in this document is based on the N7K M132XP-12 linecard.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Background Information

When a hardware failure is suspected on the N7K-M132XP-12 module, the actual cause might be a software defect, in which case an RMA is not required. This document lists the symptoms experienced and provides the troubleshooting steps required to determine the health of the module.

Scenario 1: N7K-M132XP-12 Diagnostic "Port Loopback" Test Failed

Symptoms

The module experiences a diagnostic failure, and this syslog is observed:

%DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:18 Test: PortLoopback failed 10 consecutive times. Faulty module: Module 18 affected ports:23 Error:Loopback test failed.
Packets lost on the LC at the Queueing engine ASIC

N7k# show diagnostic result module 18

Current bootup diagnostic level: complete
Module 18: 10 Gbps Ethernet Module

   Test results: (. = Pass, F = Fail, I = Incomplete,
   U = Untested, A = Abort, E = Error disabled)

   1) EOBCPortLoopback--------------> .
   2) ASICRegisterCheck-------------> E
   3) PrimaryBootROM----------------> .
   4) SecondaryBootROM--------------> .
   5) PortLoopback:

      Port  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
      -----------------------------------------------------
            U  U  I  I  I  I  I  I  U  U  I  .  I  .  I  .

      Port 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
      -----------------------------------------------------
            U  U  .  .  U  U  E  .  U  U  I  I  I  I  I  I

   6) RewriteEngineLoopback:

      Port  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
      -----------------------------------------------------
            .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

      Port 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
      -----------------------------------------------------
            .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

N7k# show module

Mod  Ports  Module-Type                Model           Status
---  -----  -------------------------  --------------  ------
16   32     10 Gbps Ethernet Module    N7K-M132XP-12   ok
17   32     10 Gbps Ethernet Module    N7K-M132XP-12   ok
18   32     10 Gbps Ethernet Module    N7K-M132XP-12   ok

Mod  Sw        Hw
---  --------  ---
16   4.2(6E5)  2.0
17   4.2(6E5)  1.7
18   4.2(6E5)  1.7

Mod  MAC-Address(es)                         Serial-Num
---  --------------------------------------  -----------
16   50-3d-e5-b8-5e-10 to 50-3d-e5-b8-5e-34  JAF1504CPAR
17   88-43-e1-c7-0b-90 to 88-43-e1-c7-0b-b4  JAF1405BJLJ
18   88-43-e1-c7-60-c0 to 88-43-e1-c7-60-e4  JAF1405CLML

Mod  Online Diag Status
---  ------------------
16   Fail
17   Pass
18   Fail

Checklist

This scenario is likely due to Cisco Bug ID CSCtn81109 or CSCti95293. In order to determine whether the problem is caused by a software defect or by an actual hardware failure that requires an RMA, complete these steps:

1. Check whether the NX-OS version in use matches the affected versions listed in the Distributed Defect Tracking System (DDTS) records.
Both DDTS are fixed and verified in Version 5.2.4.

2. Enter the show log command when the diagnostic message is observed in order to view the time stamp of the diagnostic test failure. Determine whether any CPU issues occurred near the same time. When the CPU is overwhelmed, it can cause the diagnostic port loopback test to fail. This is a good data point to collect even though it might not be the cause of the problem.

3. Collect additional CLI data with these commands:

tac-pac bootflash:tech.txt
show tech module 1
show tech gold
show hardware internal errors module 1 | diff   (issue this a few times)

4. Clear the diagnostic result and rerun the test while the CPU is not overwhelmed with these commands:

# show diagnostic result module 1
# diagnostic clear result module all
(config)# no diagnostic monitor module 1 test 5

Note: You might need to check the test number in order to ensure that it is the PortLoopback test. The 5.x base code could be test 5, whereas the 6.0 base code could be test 6.

(config)# diagnostic monitor module 1 test 5
# diagnostic start module 1 test 5
# show diagnostic result module 1 test 5

Note: It could take a few minutes before the test is completed.

# show module internal exceptionlog module 1
# show module internal event-history errors
# show hardware internal errors module 1

If the module recovers and the diagnostic test passes, the problem is likely due to one of the DDTS mentioned above, because an actual hardware failure should cause the diagnostics to fail consistently.

Note: If the module fails the diagnostic test consistently, you might have an actual hardware failure, so contact the Cisco Technical Assistance Center (TAC) for further help.

Scenario 2: M1 Modules Get Reset and/or Link Flaps

Symptoms

N7k %$ VDC-1 %$ %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:3 Test:PortLoopback failed 10 consecutive times. Faulty module: affected ports:3,5,7,11,13,15,19,21,23,27,29,31 Error:Loopback test failed.
Packets lost on the LC at the MAC ASIC

N7k %$ VDC-1 %$ %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:3 Test:PortLoopback failed 10 consecutive times. Faulty module: affected ports:4,6,8,12,14,16,20,22,24,26,28,30,32 Error:Loopback test failed. Packets lost on the LC at the Queueing engine ASIC

Checklist

This problem is likely due to Cisco Bug ID CSCtt43115. It is NOT a hardware failure, and no replacement is required.

Collect all of the logs reported and the sequence of events that occurred:

show tech detail
show accounting log
show logging

Ensure that the configurations, specifically Switched Port Analyzer (SPAN), and the symptoms match those mentioned in the DDTS Release Notes enclosure.

Note: This issue applies to all M1 module types.

Scenario 3: All M1 Modules Fail Specific Diagnostic Tests, Like the PortLoopback or RewriteEngineLoopback Tests

Symptoms

This problem occurs when there is a fault between the Active Supervisor (Sup) engine and the Xbar module, which results in corruption of the diagnostic packet. The N7K switch might report that multiple or all ports in multiple or all modules fail these tests. This issue requires manual investigation and isolation of the faulty Sup engine.

The condition that caused the tests to go into the ErrDisabled state might be transient. Cisco recommends that you run the tests on-demand in order to determine if the condition persists.

In order to clear the ErrDisabled state of the test, enter:

N7K# diagnostic clear result module 1 test ?
  <1-6>  Test ID(s)
  all    Select all

In order to run the on-demand test, enter:

N7K# diagnostic start module <mod#> test <test#>

In order to stop the test, enter:

N7K# diagnostic stop module <mod#> test <test#>

The Sup engine does not trigger a failover or reset as a corrective action in order to recover from this condition. In order to request such corrective action, an enhancement request has been filed: Cisco Bug ID CSCth03474 - n7k/GOLD:Improve Fault Isolation of N7K-GOLD.
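
When many PortLoopback failures are logged, it can help to tabulate which modules, ports, and ASICs are named before you compare the symptoms against the three scenarios. The Python sketch below is one possible way to do that offline on a saved copy of the show logging output. It is not a Cisco tool. The function names are hypothetical, and the regular expression is an assumption modeled only on the two PORTLOOPBACK_TEST_FAIL messages quoted in this document, so other NX-OS releases might format the message differently.

```python
import re
from collections import defaultdict

# Assumption: pattern modeled on the PORTLOOPBACK_TEST_FAIL samples quoted in
# this document; it is not taken from any official message specification.
PORTLOOPBACK_FAIL = re.compile(
    r"%DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL:\s*"
    r"Module:\s*(?P<module>\d+)\s+"
    r"Test:\s*PortLoopback failed (?P<count>\d+) consecutive times\."
    r"[^%]*?affected ports:\s*(?P<ports>[\d,]+)"
    r"(?:[^%]*?Packets lost on the LC at the (?P<asic>.+?) ASIC)?"
)


def parse_portloopback_failures(log_text):
    """Return one dict per PortLoopback failure message found in log_text."""
    # Logs copied out of PDFs may carry U+2212 minus signs instead of hyphens.
    log_text = log_text.replace("\u2212", "-")
    failures = []
    for match in PORTLOOPBACK_FAIL.finditer(log_text):
        failures.append({
            "module": int(match.group("module")),
            "count": int(match.group("count")),
            "ports": [int(p) for p in match.group("ports").split(",") if p],
            "asic": match.group("asic"),  # None if the ASIC text is absent
        })
    return failures


def ports_by_module(failures):
    """Group the affected ports by module number (hypothetical helper)."""
    grouped = defaultdict(set)
    for failure in failures:
        grouped[failure["module"]].update(failure["ports"])
    return {module: sorted(ports) for module, ports in grouped.items()}


# Sample text assembled from the syslog messages shown in Scenarios 1 and 2.
SAMPLE_LOG = (
    "%DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:18 Test: PortLoopback "
    "failed 10 consecutive times. Faulty module: Module 18 affected ports:23 "
    "Error:Loopback test failed. "
    "Packets lost on the LC at the Queueing engine ASIC\n"
    "N7k %$ VDC-1 %$ %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:3 "
    "Test:PortLoopback failed 10 consecutive times. Faulty module: "
    "affected ports:3,5,7,11,13,15,19,21,23,27,29,31 "
    "Error:Loopback test failed. Packets lost on the LC at the MAC ASIC\n"
)

if __name__ == "__main__":
    parsed = parse_portloopback_failures(SAMPLE_LOG)
    for failure in parsed:
        print(failure)
    print(ports_by_module(parsed))
```

A summary like this only organizes what the syslog already states. Whether the failures point to a software defect or to hardware still has to be decided with the checklists in each scenario.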
Related Information

• FN - 63495 - NX-OS 5.2(1) - Nexus 7000 M1-Series Modules May Reset or Link State Across Multiple Ports May Flap After Configuring a New VLAN with SPAN
• SOFTWARE ADVISORY NOTICE
• Technical Support & Documentation - Cisco Systems

Updated: Oct 21, 2013