Nexus 7000 Series M132XP−12 Module
Troubleshooting Based on Error Logs
Document ID: 116227
Contributed by Yogesh Ramdoss, Robert Hurst, Vincent Ng, Cisco TAC
Engineers.
Oct 21, 2013
Contents
Introduction
Prerequisites
Requirements
Components Used
Background Information
Scenario 1: N7K−M132XP−12 Diagnostic "Port Loopback" Test Failed
Scenario 2: M1 Modules Get Reset and/or Link Flaps
Scenario 3: All M1 Modules Fail Specific Diagnostic Tests, Like the PortLoopback
or RewriteEngineLoopback Tests
Related Information
Introduction
This document describes the process used to determine whether a Cisco Nexus 7000 Series (N7K)
M132XP-12 or N7K-M132XP-12L module needs to be replaced through a Return Material Authorization (RMA).
Prerequisites
Requirements
Cisco recommends that you have knowledge of the Nexus operating system CLI.
Components Used
The information in this document is based on the N7K M132XP−12 Linecard.
The information in this document was created from the devices in a specific lab environment. All of the
devices used in this document started with a cleared (default) configuration. If your network is live, make sure
that you understand the potential impact of any command.
Background Information
When a hardware failure is suspected on the N7K-M132XP-12 module, the actual cause might be a
software defect, in which case an RMA is not required.
This document lists the symptoms experienced, and provides the troubleshooting steps required in order to
determine the health of the module.
Scenario 1: N7K−M132XP−12 Diagnostic "Port Loopback"
Test Failed
Symptoms
The module experiences diagnostic failure, and this syslog is observed:
%DIAG_PORT_LB−2−PORTLOOPBACK_TEST_FAIL: Module:18 Test:
PortLoopback failed 10 consecutive times. Faulty module:
Module 18 affected ports:23 Error:Loopback test failed.
Packets lost on the LC at the Queueing engine ASIC
N7k# show diagnostic result module 18
Current bootup diagnostic level: complete
Module 18: 10 Gbps Ethernet Module
Test results: (. = Pass, F = Fail, I = Incomplete,
U = Untested, A = Abort, E = Error disabled)
  1) EOBCPortLoopback--------------> .
  2) ASICRegisterCheck-------------> E
  3) PrimaryBootROM----------------> .
  4) SecondaryBootROM--------------> .
  5) PortLoopback:
     Port   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
     -----------------------------------------------------
            U  U  I  I  I  I  I  I  U  U  I  .  I  .  I  .
     Port  17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
     -----------------------------------------------------
            U  U  .  .  U  U  E  .  U  U  I  I  I  I  I  I
  6) RewriteEngineLoopback:
     Port   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
     -----------------------------------------------------
            .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
     Port  17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
     -----------------------------------------------------
            .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

N7k# show module
Mod  Ports  Module-Type                       Model               Status
---  -----  --------------------------------  ------------------  ------------
16   32     10 Gbps Ethernet Module           N7K-M132XP-12       ok
17   32     10 Gbps Ethernet Module           N7K-M132XP-12       ok
18   32     10 Gbps Ethernet Module           N7K-M132XP-12       ok

Mod  Sw              Hw
---  --------------  ------
16   4.2(6E5)        2.0
17   4.2(6E5)        1.7
18   4.2(6E5)        1.7

Mod  MAC-Address(es)                         Serial-Num
---  --------------------------------------  ----------
16   50-3d-e5-b8-5e-10 to 50-3d-e5-b8-5e-34  JAF1504CPAR
17   88-43-e1-c7-0b-90 to 88-43-e1-c7-0b-b4  JAF1405BJLJ
18   88-43-e1-c7-60-c0 to 88-43-e1-c7-60-e4  JAF1405CLML

Mod  Online Diag Status
---  ------------------
16   Fail
17   Pass
18   Fail
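The per-port rows in the show diagnostic result output can be hard to read by eye. The following Python sketch groups port numbers by result code for one pair of rows. It is only an illustration: the function name ports_by_status is hypothetical, and it assumes that the port header row and the result row have been copied out of the CLI output as two separate strings.

```python
# Legend taken from the "Test results" line of the output above.
LEGEND = {
    ".": "Pass", "F": "Fail", "I": "Incomplete",
    "U": "Untested", "A": "Abort", "E": "Error disabled",
}

def ports_by_status(port_row, result_row):
    """Group port numbers by result for one header/result row pair.

    Hypothetical helper: assumes both rows are whitespace-separated and
    that the header row may start with the word "Port".
    """
    ports = [int(tok) for tok in port_row.split() if tok.isdigit()]
    codes = result_row.split()
    if len(ports) != len(codes):
        raise ValueError("port row and result row have different lengths")
    grouped = {}
    for port, code in zip(ports, codes):
        grouped.setdefault(LEGEND.get(code, code), []).append(port)
    return grouped

# Rows copied from the PortLoopback section for module 18 (ports 17 to 32).
row = ports_by_status(
    "Port 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32",
    "U U . . U U E . U U I I I I I I",
)
print(row["Error disabled"])  # [23]
```

For this sample, the only port in the Error disabled state is port 23, which matches the affected port named in the syslog message.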
Checklist
This scenario is likely due to Cisco Bug ID CSCtn81109 or CSCti95293.
In order to determine whether the problem is caused by a software defect or by an actual hardware failure
that requires an RMA, complete these steps:
1. Check whether the NX-OS version that runs on the switch is one of the versions affected by these
defects, which are tracked in the Distributed Defect Tracking System (DDTS). Both defects are fixed
and verified in Version 5.2.4.
2. Enter the show log command when the diagnostic message is observed in order to view the time
stamp of the diagnostic test failure. Determine if there are any CPU issues that occurred near the same
time. Sometimes when the CPU is overwhelmed, it causes the diagnostic port loopback test to fail.
This is a good data point to collect even though it might not be the cause of the problem.
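3. Collect additional CLI data with these commands. The examples use module 1, so substitute the
number of the affected module. Enter the last command a few times so that changes between runs
are visible:
tac-pac bootflash:tech.txt
show tech module 1
show tech gold
show hardware internal errors module 1 | diff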
4. Clear the diagnostic result and rerun the test while the CPU is not overwhelmed with these commands:
Note: You might need to check the test number in order to ensure that it is the PortLoopback test. On
the 5.x base code it could be test 5, whereas on the 6.0 base code it could be test 6.
# show diagnostic result module 1
# diagnostic clear result module all
(config)# no diagnostic monitor module 1 test 5
(config)# diagnostic monitor module 1 test 5
# diagnostic start module 1 test 5
# show diagnostic result module 1 test 5
Note: It could take a few minutes before the test is completed.
# show module internal exceptionlog module 1
# show module internal event-history errors
# show hardware internal errors module 1
If the module recovers and the diagnostic test passes, the failure is likely due to one of the Cisco bug
IDs mentioned previously, because an actual hardware failure should fail diagnostics consistently.
Note: If the module fails the diagnostic test consistently, you might have an actual hardware failure,
so contact the Cisco Technical Assistance Center (TAC) for further help.
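The version check in step 1 can be sketched in Python as shown here. This is a minimal illustration and not a Cisco tool: the function names are hypothetical, the parser reads only the major, minor, and leading maintenance digits of strings such as 4.2(6E5), and it assumes that any version that sorts below 5.2(4) predates the fix. The bug release notes remain the authoritative list of affected versions.

```python
import re

# From step 1: both defects are fixed and verified in Version 5.2.4.
FIXED_IN = (5, 2, 4)

def parse_nxos_version(text):
    """Return (major, minor, maintenance) from a string like '4.2(6E5)'.

    Assumption: any suffix after the leading maintenance digits
    (for example 'E5') is ignored.
    """
    match = re.match(r"\s*(\d+)\.(\d+)\((\d+)", text)
    if match is None:
        raise ValueError("unrecognized NX-OS version string: %r" % text)
    return tuple(int(group) for group in match.groups())

def may_be_exposed(version_text):
    """True when the version sorts before the release that carries the fix."""
    return parse_nxos_version(version_text) < FIXED_IN

print(may_be_exposed("4.2(6E5)"))  # True
print(may_be_exposed("5.2(4)"))    # False
```

A True result only means that the software defects cannot be ruled out by version alone; the remaining steps are still needed in order to distinguish a software defect from a hardware failure.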
Scenario 2: M1 Modules Get Reset and/or Link Flaps
Symptoms
N7k %$ VDC−1 %$ %DIAG_PORT_LB−2−PORTLOOPBACK_TEST_FAIL: Module:3
Test:PortLoopback failed 10 consecutive times. Faulty module:
affected ports:3,5,7,11,13,15,19,21,23,27,29,31 Error:Loopback test failed.
Packets lost on the LC at the MAC ASIC
N7k %$ VDC−1 %$ %DIAG_PORT_LB−2−PORTLOOPBACK_TEST_FAIL: Module:3
Test:PortLoopback failed 10 consecutive times. Faulty module:
affected ports:4,6,8,12,14,16,20,22,24,26,28,30,32 Error:Loopback test failed.
Packets lost on the LC at the Queueing engine ASIC
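When many of these messages are logged, it can help to extract the module number, the affected ports, and the ASIC where packets were lost. The following Python sketch does this for the PORTLOOPBACK_TEST_FAIL message format shown above. The regular expression and the function name parse_failure are assumptions based only on these sample messages, so other NX-OS versions might format the message differently.

```python
import re

# Pattern derived from the sample messages in this document only.
PATTERN = re.compile(
    r"PORTLOOPBACK_TEST_FAIL:\s*Module:(?P<module>\d+)"
    r".*?affected ports:(?P<ports>[\d,]+)"
    r".*?Packets lost on the LC at the (?P<asic>.+?ASIC)"
)

def parse_failure(message):
    """Return module, ports, and ASIC from one syslog message, or None."""
    # The message may be wrapped across lines, so collapse whitespace first.
    text = " ".join(message.split())
    match = PATTERN.search(text)
    if match is None:
        return None
    return {
        "module": int(match["module"]),
        "ports": [int(p) for p in match["ports"].split(",") if p],
        "asic": match["asic"],
    }

sample = """N7k %$ VDC-1 %$ %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:3
Test:PortLoopback failed 10 consecutive times. Faulty module:
affected ports:3,5,7,11,13,15,19,21,23,27,29,31 Error:Loopback test failed.
Packets lost on the LC at the MAC ASIC"""

parsed = parse_failure(sample)
print(parsed["module"], parsed["asic"], len(parsed["ports"]))
```

A summary of which ports and which ASIC are reported makes it easier to compare the symptoms with those described in the bug release notes.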
Checklist
This problem is likely due to Cisco Bug ID CSCtt43115. It is NOT a hardware failure, and no replacement is
required.
Collect all of the reported logs and the sequence of events that occurred with these commands:
show tech detail
show accounting log
show logging
Ensure that the configurations, specifically Switched Port Analyzer (SPAN), and symptoms match those
mentioned in the DDTS Release Notes enclosure.
Note: This issue applies to all M1 module types.
Scenario 3: All M1 Modules Fail Specific Diagnostic Tests,
Like the PortLoopback or RewriteEngineLoopback Tests
Symptoms
This problem occurs when there is a fault between the active Supervisor (Sup) engine and the Xbar module,
which corrupts the diagnostic packet. The N7K switch might report that multiple or all ports on
multiple or all modules fail these tests.
This issue requires manual investigation and isolation of the faulty Sup engine.
The condition that caused the tests to go into the ErrDisabled state might be transient. Cisco recommends that
you run the tests on-demand in order to determine if the condition persists.
In order to clear the ErrDisabled state of the test, enter:
N7K# diagnostic clear result module 1 test ?
  <1-6>  Test ID(s)
  all    Select all
In order to run the on−demand test, enter:
N7K# diagnostic start module <mod#> test <test#>
In order to stop the test, enter:
N7K# diagnostic stop module <mod#> test <test#>
The Sup engine does not currently trigger a failover or reset as a corrective action in order to recover
from this condition. An enhancement request has been filed in order to add this corrective action:
Cisco Bug ID CSCth03474 - n7k/GOLD:Improve Fault Isolation of N7K-GOLD.
Related Information
• FN − 63495 − NX−OS 5.2(1) − Nexus 7000 M1−Series Modules May Reset or Link State Across
Multiple Ports May Flap After Configuring a New VLAN with SPAN
• SOFTWARE ADVISORY NOTICE
• Technical Support & Documentation − Cisco Systems
Updated: Oct 21, 2013