3PAR Fatal Error Codes

HPE 3PAR BIOS Error Codes Reference Guide
Abstract
This Hewlett Packard Enterprise (HPE) guide provides authorized service technicians
information about the HPE 3PAR BIOS error codes. The error code information is provided in
separate chapters for either HPE 3PAR OS 3.3.1 or HPE 3PAR OS 3.2.2. This document is
for Hewlett Packard Enterprise INTERNAL USE ONLY.
Part Number: QL226-99685
Published: August 2017
© Copyright 2017 Hewlett Packard Enterprise Development LP
Notices
The information contained herein is subject to change without notice. The only warranties for Hewlett
Packard Enterprise products and services are set forth in the express warranty statements accompanying
such products and services. Nothing herein should be construed as constituting an additional warranty.
Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions contained
herein.
Confidential computer software. Valid license from Hewlett Packard Enterprise required for possession,
use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer
Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government
under vendor's standard commercial license.
Links to third-party websites take you outside the Hewlett Packard Enterprise website. Hewlett Packard
Enterprise has no control over and is not responsible for information outside the Hewlett Packard
Enterprise website.
Acknowledgments
Intel®, Itanium®, Pentium®, Intel Inside®, and the Intel Inside logo are trademarks of Intel Corporation in
the United States and other countries.
Microsoft® and Windows® are either registered trademarks or trademarks of Microsoft Corporation in the
United States and/or other countries.
Adobe® and Acrobat® are trademarks of Adobe Systems Incorporated.
Java® and Oracle® are registered trademarks of Oracle and/or its affiliates.
UNIX® is a registered trademark of The Open Group.
Contents
Error codes—HPE 3PAR OS 3.3.1
    BIOS fatal error codes and error resolution—HPE 3PAR OS 3.3.1
    HPE 3PAR OS fatal error codes and error resolution—HPE 3PAR OS 3.3.1
Error codes—HPE 3PAR OS 3.2.2
    BIOS fatal error codes and error resolution—HPE 3PAR OS 3.2.2
    HPE 3PAR OS fatal error codes and error resolution—HPE 3PAR OS 3.2.2
Websites
Support and other resources
    Accessing Hewlett Packard Enterprise Support
    Accessing updates
    Customer self repair
    Remote support
    Warranty information
    Regulatory information
    Documentation feedback
Error codes—HPE 3PAR OS 3.3.1
BIOS fatal error codes and error resolution—HPE 3PAR OS 3.3.1
When a BIOS initialization or diagnostic test fails such that the node cannot be allowed to continue
booting, a fatal error message is often displayed (sometimes with additional information). For each class
of error, a major Code is provided. A class-specific sub-code is also provided which gives the specific
failure condition. For example:
PROM checksum: FAIL
*** Fatal error: Code 25, sub-code 0x0 (0)
The fatal error shown represents a PROM checksum failure. The Serial EEPROM on the board has a bad
checksum. Either it has not been initialized or it is corrupted.
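The message format above can be picked apart mechanically when scanning console captures. The following is a minimal sketch, not part of CBIOS: the function and type names are hypothetical, and it assumes the value in parentheses is printed in decimal, as in the example. It accepts both the "sub-code" and "subcode" spellings used in this guide.

```python
import re
from typing import NamedTuple, Optional


class FatalError(NamedTuple):
    code: int     # major code (error class)
    subcode: int  # class-specific sub-code
    data: int     # value printed in parentheses (assumed decimal)


# Matches "Fatal error" and "Non-fatal error" lines, with either spelling
# of "sub-code" / "subcode".
_PATTERN = re.compile(
    r"(?:Fatal|Non-fatal) error: Code (\d+), sub-?code (0x[0-9a-fA-F]+) \((\d+)\)"
)


def parse_error_line(line: str) -> Optional[FatalError]:
    """Return the parsed fields, or None if the line is not an error line."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return FatalError(
        code=int(match.group(1)),
        subcode=int(match.group(2), 16),
        data=int(match.group(3)),
    )
```

For the example above, parse_error_line("*** Fatal error: Code 25, sub-code 0x0 (0)") yields code 25, sub-code 0, data 0.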
When the BIOS reaches a Fatal error, it immediately stops all hardware initialization, testing, and booting,
and then logs the error to the Serial EEPROM (PROM).
Whack (the CBIOS command line) is available by pressing ^W (Control and W keys simultaneously),
and can be used to diagnose and possibly correct the problem. Use the Whack "prom log" command
to review previously recorded fatal system errors.
If you believe the Fatal error does not impact your immediate test and would like to try to resume, press
^C (not recommended). The fatal error routine returns to the point where the error was detected, possibly
drawing more fatal errors or worse depending on the type of error. To ensure safe system
operation, Hewlett Packard Enterprise recommends resolving the problem before resuming.
NOTE:
A "GEvent" or "GPE" is a "GPIO (General Purpose I/O) event" or "general purpose event".
Code        Description
Fatal error: Code 0, subcode 0x0 (0)
INITIALIZATION_OK "No Error"
This is actually not a node hardware or software initialization
or test failure. This code should never occur, and suggests
corruption of the PROM log if it is seen.
Resolution: Contact 3PAR technical support.
Fatal error: Code 1, subcode 0x1 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Unknown CPUID string: `xxxx'
Bad or unknown CPU ID (non-Intel). The BIOS is unable to
fully identify the processor. This sub-code indicates the
CPUID string is not "GenuineIntel".
Resolution: A) Replace the processor.
B) Try moving the processor to the other CPU socket.
It could be a single socket problem.
C) Try moving the processor to another system.
It could be node hardware or software.
D) Replace the node motherboard.
Diagnostic: A) Use Whack "cpu id" command. The interesting
line will follow a line similar to:
Intel Pentium III Processor:
or
Intel Pentium 4 Xeon Processor:
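As an illustration of the vendor check behind sub-code 0x1: on x86 processors, CPUID leaf 0 returns the 12-character vendor string in the EBX, EDX, and ECX registers, in that order. The sketch below only shows how that string is assembled and compared. The function names are hypothetical and this is not the CBIOS implementation.

```python
import struct


def cpuid_vendor(ebx: int, edx: int, ecx: int) -> str:
    """Assemble the vendor string from the CPUID leaf 0 register values."""
    raw = struct.pack("<III", ebx, edx, ecx)  # little-endian, EBX/EDX/ECX order
    return raw.decode("ascii", errors="replace")


def is_genuine_intel(ebx: int, edx: int, ecx: int) -> bool:
    """True when the vendor string is exactly "GenuineIntel"."""
    return cpuid_vendor(ebx, edx, ecx) == "GenuineIntel"
```

An Intel processor reports EBX=0x756e6547, EDX=0x49656e69, ECX=0x6c65746e, which decodes to "GenuineIntel".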
Fatal error: Code 1, subcode 0x2 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Required features 0x008053fb are missing.
Each class of CPU has a list of technology features it
supports.
If this error occurs, it is because the CPU is severely
downrev, the CPU is bad, or the motherboard is bad.
Resolution: A) Replace the processor.
B) Try moving the processor to the other CPU socket.
It could be a single socket problem.
C) Try moving the processor to another system.
It could be node hardware or software.
Diagnostic: A) Use Whack "cpu id" command. The interesting
line will be similar to:
Family 6 ... Features 0x0387fbff, Pflags 4
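A required-features check of this kind is normally a bit-mask comparison. The sketch below assumes (this guide does not say so explicitly) that the value in the error message is the required mask and that the "Features" value shown by "cpu id" is the mask the processor reports. The function name is hypothetical.

```python
def missing_features(required: int, reported: int) -> int:
    """Return the bits set in `required` that are absent from `reported`."""
    return required & ~reported


# Values taken from the messages above: a CPU reporting 0x0387fbff
# provides every feature bit in the required mask 0x008053fb.
assert missing_features(0x008053FB, 0x0387FBFF) == 0
```

A nonzero result identifies the feature bits to look up when deciding whether the processor is downrev or faulty.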
Fatal error: Code 1, subcode 0x3 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified this CPU.
Each run of CPU has a major revision and a minor stepping number.
If you receive this message, the processor has not yet been
verified by 3PAR for reliable operation. If this is a new
processor, it may be acceptable to press ^C to resume after this
error. If you are testing a new stepping of the processor and
need to use it, use the following Whack command to ignore an
unknown CPUID:
Whack> set perm cpu_unqual_ok
Resolution: A) Upgrade to the latest CBIOS to ensure newer
certified processors are acceptable.
B) Replace the processor with one certified
by 3PAR for use with the board.
Diagnostic: A) Use Whack "cpu id" command. The interesting
line will be similar to:
Family 6, Model 8, Stepping 3, Features ...
Fatal error: Code 1, subcode 0x4 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified the bootstrap CPU as a
dual processor.
If more than one processor is installed, both CPUs must be
certified to operate in multiprocessor mode. This error
indicates that the bootstrap processor was found to not be
certified to run in a multiprocessor mode.
See Code 1, sub-code 0x3 for resolution information.
Fatal error: Code 1, subcode 0x5 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified this CPU as a multiple
processor.
See Code 1, sub-code 0x3 for resolution information.
Fatal error: Code 1, subcode 0x6 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Microcode table size is xxxx,
which is not 4 mod 2048.
This is an internal CBIOS consistency check error. If you
see this error, most likely processor execution out of
flash is not stable. The CPU identification is performed
after the flash is fully CRC verified, so this error is
likely the result of a failing CPU or transient bus
operation.
Resolution: A) Replace the processor.
B) Re-flash the CBIOS (no need to upgrade).
Diagnostic: A) Use Arium and a scope to verify that processor
fetches from flash trigger no unusual bus
operations.
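The consistency check named in the message ("not 4 mod 2048") can be written out as follows. This is an illustrative sketch only: the constant and function names are hypothetical, and the interpretation that the table consists of 2048-byte update blocks plus 4 extra bytes is an assumption drawn from the message text.

```python
MICROCODE_BLOCK_SIZE = 2048  # assumed size of one update block, in bytes
EXPECTED_REMAINDER = 4       # remainder required by the error message


def microcode_table_size_ok(size: int) -> bool:
    """True when the table size is congruent to 4 modulo 2048."""
    return size % MICROCODE_BLOCK_SIZE == EXPECTED_REMAINDER
```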
Fatal error: Code 1, subcode 0x7 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Invalid microcode checksum: xxxx
This is another internal CBIOS consistency check error.
Before each block of update microcode is uploaded to the
Pentium, a checksum on it is first verified. If this
checksum is not valid, the block will be rejected with
this error.
See Code 1, sub-code 0x3 for resolution information.
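For reference, Intel's published microcode update format defines the checksum so that all 32-bit words of an update sum to zero (modulo 2^32). The sketch below applies that rule to a block of bytes. Whether CBIOS verifies each block in exactly this way is an assumption, and the function name is hypothetical.

```python
import struct


def microcode_checksum_ok(block: bytes) -> bool:
    """True when the 32-bit little-endian words of `block` sum to zero."""
    if not block or len(block) % 4 != 0:
        return False  # an update must be a whole number of 32-bit words
    words = struct.unpack(f"<{len(block) // 4}I", block)
    return (sum(words) & 0xFFFFFFFF) == 0
```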
Fatal error: Code 1, subcode 0x8 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Microcode update failed: expected xxxx, got yyyy
The processor has rejected the microcode update. This
could be any number of things, but is likely due to a
failing processor. At this point a strong 64-bit CRC
has been run successfully across the BIOS and a checksum
for each update line has also passed.
See Code 1, sub-code 0x4 for resolution information.
Fatal error: Code 1, subcode 0x9 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: No microcode update found for this CPU.
The BIOS was not able to locate a microcode update for
this particular processor, yet it is listed as a CPU which
requires a microcode update. This is likely due to use of
an unqualified processor.
See Code 1, sub-code 0x4 for resolution information.
Fatal error: Code 1, subcode 0xa (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: CPU failed BIST (built in self test): xxxxxxxx
The processor has failed its own built in self test.
This indicates strongly that the processor is at fault.
Resolution: A) Replace the processor.
B) Replace both processor VRM modules.
C) Replace the node motherboard.
Fatal error: Code 1, subcode 0xb (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: First CPU's bus ratios were wwww:xxxx
but this CPU's bus ratios are yyyy:zzzz.
The two processors in the system board do not have the
same bus clock multiplier. The likely cause is that
the processors are of different clock speeds (or less
likely minor steppings). The "First CPU" as written
above is the bootstrap CPU. On a PIII board, the bootstrap
CPU (CPU3) is to the right, nearest the PromJet interface.
Resolution: A) Remove both heatsinks and verify the
processors are rated for the same clock
speed and bus multiplier.
B) Replace each processor individually.
C) Replace the node motherboard.
Diagnostic: A) Use Whack "cpu id" command.
If you enter Whack before Linux is booted,
you will consistently run on the bootstrap
CPU. If you enter Whack from Linux (using
the whack command), it is a race as to on
which CPU you will enter Whack. The SMI
output indicates on which CPU whack is
running. Using this method, or using the
"cpu switch" command, you can "cpu id" all
processors in the node. Example:
Whack> cpu id
Intel(r) Pentium(r) III Processor:
Family 6, Model 8, Stepping 3, ...
...
CPUID[3] == 0x00000000 0x00000000 0xda28203c ...
...
Bus to CPU ratio == 2/13
...
Clock Frequency Ratio == 7
Fatal error: Code 1, subcode 0xc (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
This CPU does not support clock multiplier changes
In the supported configuration, the two CPUs present in the
node must run at the same clock speed. If the BIOS detects
CPUs which have different clock multipliers, it will
automatically configure all CPUs to use the highest common
clock multiplier. If a CPU's multiplier cannot be changed,
then this fatal error will result.
See Code 1, sub-code 0x4 for resolution information.
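The "highest common clock multiplier" selection described above can be illustrated as a set intersection. This is a sketch under assumptions: the BIOS's real data structures are not described in this guide, the function name is hypothetical, and a CPU whose multiplier cannot be changed is modeled as supporting a single value.

```python
from typing import Iterable, Optional


def highest_common_multiplier(supported: Iterable[Iterable[int]]) -> Optional[int]:
    """Return the largest multiplier every CPU supports, or None if there is none."""
    sets = [set(multipliers) for multipliers in supported]
    if not sets:
        return None
    common = set.intersection(*sets)
    return max(common) if common else None
```

With one CPU supporting multipliers 5 through 7 and the other 5 through 8, the result is 7. Two fixed-multiplier CPUs at 7 and 8 have no common value, which corresponds to the fatal condition described here.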
Fatal error: Code 1, subcode 0xd (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
Desired clock multiplier xx is too high for this CPU
This error indicates the CPU does not support a clock
multiplier the BIOS is attempting to set.
See Code 1, sub-code 0xc for resolution information.
Fatal error: Code 1, subcode 0xe (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
Desired clock multiplier xx is illegal for this CPU
See Code 1, sub-code 0xd for information on this error.
Non-fatal error: Code 2, sub-code 0x1 (0)
RTC_FAILURE "RTC Failure"
*** Error: Real-Time Clock not initialized.
The Real-Time Clock (RTC) is a function of the SuperIO
which provides a battery backed system clock and a small
quantity of battery backed Non-Volatile RAM for system
configuration flags. This error indicates the RTC memory
has become corrupt, possibly due to a dead battery or
battery removal when no mainline power was available.
Resolution: A) Power down, wait 30 seconds, power up.
This error should self-correct (likely
with a loss of current date/time and
other NVRAM contents). Set the date and
time using the Whack "rtc date" command.
B) Replace the RTC battery, located near the
SuperIO ASIC.
C) Use the Whack command "rtc date" to set
the RTC date and time.
D) Replace the node motherboard.
Diagnostic: A) Use Whack "time loop" command. The left
column is RTC seconds and should increment
exactly at second intervals. The right
column is a time scaled processor
performance counter and should (even in
the case of a deviant slow or fast RTC)
still increment nearly in lock step with
the RTC.
B) Verify there is not a dead short across
the RTC battery. This will drain the
battery and immediately invalidate the
RTC contents on power down.
Non-fatal error: Code 2, sub-code 0x2 (0)
RTC_FAILURE "RTC Failure"
RTC_BATTERY_LOW
RTC / NVRAM Battery Failure - Replace battery.
The RTC / NVRAM battery was found to have a low voltage by
the built-in monitoring circuit of the Real Time Clock (RTC).
The RTC battery provides power to the RTC clock function
of the SuperIO while the board is not drawing mainline
supply power. Over time, this battery's available power
will decay (rated for over five years normal operation).
Resolution: A) Replace the RTC lithium cell battery on the
node motherboard.
B) Replace the node motherboard.
Diagnostic: A) Verify the lithium cell has a 3V charge.
B) Verify there is not a dead short across
the RTC battery. This will rapidly drain the
battery and immediately invalidate the
RTC contents on power down.
Non-fatal error: Code 2, sub-code 0x3 (0)
RTC_FAILURE "RTC Failure"
RTC_INVALID_TIME
The current RTC date/time is invalid. Enter the correct
date/time or press Tab to acquire it from the network.
If the time has not yet been set, or becomes invalid due to
loss of battery power, this BIOS will report this error and
wait for the user to update the time.
Resolution: A) Enter the correct time.
B) Press TAB to acquire the time from the network.
C) Press ^C to abort prompt and resume boot.
Non-fatal error: Code 2, sub-code 0x4 (0)
RTC_FAILURE "RTC Failure"
RTC_BATTERY_LOW
RTC / NVRAM Battery Failure - Replace battery.
The RTC / NVRAM battery was found to have a low voltage by the
built-in monitoring circuit of the RTC (TOD clock).
Resolution: A) Replace the lithium-ion cell battery on the
node.
B) Replace the node motherboard.
Non-fatal error: Code 2, sub-code 0x5 (mode)
RTC_FAILURE "RTC Failure"
RTC was found in Binary mode or 12 Hour mode.
The RTC has two modes of operation. The BIOS prefers the RTC
to be in BCD mode rather than Binary mode.
If the RTC is in Binary mode, then this must have been set
by the OS. The BIOS will reset the RTC to BCD mode.
Also, if the RTC is in 12 Hour mode, then the BIOS will report
this and correct the RTC to 24 Hour mode.
The mode byte tells us which mode it was in:
Bit 1 should be on for 24 Hour mode.
Bit 2 should be off for BCD mode.
Resolution: A) Informational only. If seen in Development,
alert the development team.
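The mode byte reported with sub-code 0x5 can be decoded using the two bits described above. The sketch assumes bits are numbered from 0, so that bit 1 is mask 0x02 and bit 2 is mask 0x04. The names are hypothetical.

```python
BIT_24_HOUR = 1 << 1  # on  -> 24 Hour mode, off -> 12 Hour mode
BIT_BINARY = 1 << 2   # on  -> Binary mode,  off -> BCD mode


def describe_rtc_mode(mode: int) -> dict:
    """Decode the RTC mode byte into the two settings the BIOS checks."""
    return {
        "24_hour": bool(mode & BIT_24_HOUR),
        "bcd": not (mode & BIT_BINARY),
    }


def rtc_mode_ok(mode: int) -> bool:
    """True when the RTC is in the mode the BIOS prefers (24 Hour, BCD)."""
    decoded = describe_rtc_mode(mode)
    return decoded["24_hour"] and decoded["bcd"]
```

A mode byte of 0x06, for example, decodes as 24 Hour mode with Binary data, which the BIOS would reset to BCD.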
Non-fatal error: Code 2,
sub-code 0x6 (0)
RTC_FAILURE "RTC Failure"
RTC update was stopped.
Resolution: A) Informational only. If seen in Development,
alert the development team.
Fatal error: Code 3, subcode 0x0 (0)
SRAM_INIT_FAILURE "CPU SRAM Init Failure"
During initialization, memory areas are tested before
they are used. SRAM is used by the processor for
persistent storage during early initialization and the
CPU memory tests.
This sub-code indicates that the SRAM walking bits test
has failed and that the onboard SRAM may not be reliable.
Resolution: A) Power down, wait 30 seconds, power up.
This problem is unlikely to be a one-time
occurrence, so it will probably recur.
B) Replace the node motherboard.
Diagnostic: A) Use Arium to set and verify SRAM contents.
If you notice a pattern, it could be a
pulled, stuck, or bridged SRAM line.
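For readers unfamiliar with the technique, a walking bits test writes patterns with a single 1 bit (and then a single 0 bit) moved through every bit position of each location and reads each one back. The sketch below runs against a simulated byte-wide memory. It is a generic illustration, not the CBIOS test, and the StuckBitMemory class is a hypothetical stand-in for faulty hardware.

```python
def walking_bits_test(mem, width: int = 8) -> list:
    """Return (address, wrote, read) for every mismatch found in `mem`."""
    failures = []
    mask = (1 << width) - 1
    for addr in range(len(mem)):
        for bit in range(width):
            # Walking 1 first, then walking 0, at this bit position.
            for pattern in (1 << bit, ~(1 << bit) & mask):
                mem[addr] = pattern
                got = mem[addr]
                if got != pattern:
                    failures.append((addr, pattern, got))
    return failures


class StuckBitMemory:
    """Simulated memory with some bits of one address stuck low."""

    def __init__(self, size: int, stuck_addr: int, stuck_mask: int):
        self._cells = bytearray(size)
        self._stuck_addr = stuck_addr
        self._stuck_mask = stuck_mask

    def __len__(self) -> int:
        return len(self._cells)

    def __setitem__(self, addr: int, value: int) -> None:
        if addr == self._stuck_addr:
            value &= ~self._stuck_mask & 0xFF  # stuck bits never store a 1
        self._cells[addr] = value

    def __getitem__(self, addr: int) -> int:
        return self._cells[addr]
```

A healthy bytearray produces no failures. With bit 3 stuck low at one address, every pattern that sets bit 3 at that address is reported, which is the kind of repeating pattern that points to a pulled, stuck, or bridged line.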
Fatal error: Code 3, subcode 0x1 (0)
SRAM_INIT_FAILURE "CPU SRAM Init Failure"
After SRAM contents have been updated with the BIOS static
data, a test is performed to ensure the data arrived intact.
If it did not, this error is generated. The error could
indicate an SRAM failure with the same conditions as above.
See Code 3, sub-code 0x0 for resolution information.
Fatal error: Code 4, subcode 0x1 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
Pairvvvv DIMMwwww: (Jxxxx) Bad checksum. Got yyyy, SPD said zzzz
*** Error: Bad SDRAM configuration.
The SDRAM DIMMs located on the motherboard are used for main
CPU memory and are critical to the proper operation of a
node. Even before the memory is thoroughly tested for proper
operation, it must be configured to appear in CPU-addressable
space. Each DIMM has a small embedded serial EEPROM which
holds DIMM configuration information such as the number of
rows, columns, and banks, as well as memory timing. If this
serial EEPROM becomes corrupt, data stored in it regarding the
DIMM configuration cannot be trusted. So, this EEPROM also
contains a checksum which the BIOS verifies is correct before
configuring the DIMM. If this checksum does not match the
checksum the BIOS computes across the EEPROM contents, this
error will result.
The minor code reported is the total count of errors for
the DIMM.
Resolution: A) Replace the defective CPU DIMM with an
identical one.
B) If an identical one is not available, replace
the CPU DIMM pair.
See Code 15 for more resolution information.
Diagnostic: A) The CPU DIMMs appear on the I2C bus at 3.a0
through 3.a6. Use the Whack "d i2c" command to
display the DIMM serial EEPROM contents to
determine if there is a pattern.
Example (DIMM 2):
Whack> d i2c 3.a4.0
See Code 15 for more resolution information.
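When examining a dumped serial EEPROM by hand, the checksum can be recomputed offline. The sketch below assumes the JEDEC SPD convention used for SDR and DDR SDRAM modules, where byte 63 holds the low 8 bits of the sum of bytes 0 through 62. This guide does not state which rule CBIOS applies, so treat that as an assumption. The address helper assumes the DIMMs sit at consecutive even I2C addresses, which matches the 3.a0 through 3.a6 range and the DIMM 2 example above. All names are hypothetical.

```python
SPD_CHECKSUM_OFFSET = 63  # JEDEC SPD (SDR/DDR): checksum of bytes 0..62


def spd_checksum(spd: bytes) -> int:
    """Compute the 8-bit sum of SPD bytes 0..62."""
    return sum(spd[:SPD_CHECKSUM_OFFSET]) & 0xFF


def spd_checksum_ok(spd: bytes) -> bool:
    """True when the stored checksum byte matches the computed one."""
    if len(spd) <= SPD_CHECKSUM_OFFSET:
        return False  # dump is too short to contain the checksum byte
    return spd[SPD_CHECKSUM_OFFSET] == spd_checksum(spd)


def dimm_i2c_address(dimm_index: int) -> int:
    """Map DIMM 0..3 to I2C addresses 0xa0, 0xa2, 0xa4, 0xa6 (assumed layout)."""
    return 0xA0 + 2 * dimm_index
```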
Fatal error: Code 4, subcode 0x2 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
Pairww DIMMxx (yyyy): 'zzzz' read failed
*** Error: Bad SDRAM configuration.
Where zzzz is one of:
row address, column address, module rows, cas latency3,
refresh, banks, cas latency2, cas latency1, ras precharge,
act_to_rw, act_to_deact, ras cycle, write_to_deact,
density, frequency, or DIMM type.
This error indicates that a CPU memory DIMM was detected but
that the EEPROM present on the DIMM could not be reliably read.
The read operation is done through I2C.
See Code 4 above for resolution information.
Fatal error: Code 4, subcode 0x4 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: 'ssss' in Pairtt DIMMuu (vvvv): ww != DIMMxx (yyyy): zz
*** Error: Bad SDRAM configuration.
This error indicates the BIOS detected that the CPU SDRAM DIMMs
in the bank pair are of different types.
Resolution: A) Ensure both DIMMs in the pair are identical.
Note that two DIMMs may have the same capacity
but have different number of rows, columns, or
banks. The DIMM configuration must exactly
match. If the DIMMs have the same manufacturer,
markings and capacity, they are probably identical.
See Code 15 for more resolution information.
Diagnostic: A) The EEPROM SPD information in each pair of DIMMs
should be nearly identical.
See Code 4 above for more diagnostic information.
Fatal error: Code 4, subcode 0x8 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Pairww DIMMxx (yyyy): bad refresh type zz
*** Error: Bad SDRAM configuration.
This error indicates the value the DIMM reports for refresh
is not valid (greater than the maximum refresh counter).
See Code 4 above for resolution information.
Fatal error: Code 4, subcode 0x10 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: DIMM Pair wwww: **** Type not known ****
(rows xxxx, cols yyyy, banks zzzz)
*** Error: Bad SDRAM configuration.
This error indicates the values the DIMM reports
for rows, columns, and banks do not correspond to any
known configuration for a valid DIMM. It is possible
the DIMM EEPROM data has become corrupt or that the DIMM
is a higher capacity than what is currently supported.
See Code 4 above for resolution information.
Fatal error: Code 4, subcode 0x20 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Unable to configure any DQS lines.
OR
*** Error: Unable to configure DQS lines for nibble x.
*** Error: Bad SDRAM configuration.
This is P4 only.
This error indicates that the BIOS failed to find a set of
acceptable DQS values for all nibbles, or for one nibble, of
the DIMMs.
See Code 4 above for resolution information.
Fatal error: Code 4, subcode 0x100 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: ACT to DEACT of yy.yy clocks is > 6.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting which is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
Resolution: A) Replace CPU DIMMs with 3PAR-certified
products.
B) Replace the node motherboard.
C) If there is no other choice, override this error
with a BIOS variable, setting "mem_margin" to
the percentage outside margin. Example:
*** Error: ACT to RW of 3.06 clocks is > 3.00 (2%)
*** Error: Bad SDRAM configuration.
Fatal error: Code 4, subcode 0x0 (2)
Whack> set perm mem_margin=2
Whack> reboot
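The percentage shown in these messages relates the required timing to the controller limit: 3.06 clocks against a limit of 3.00 clocks is 2% outside margin. The sketch below reproduces that arithmetic using integer hundredths of a clock to avoid floating-point rounding. Rounding partial percentages up is an assumption, as is the comparison against "mem_margin". The function names are hypothetical.

```python
def percent_over_limit(required_hundredths: int, limit_hundredths: int) -> int:
    """Percentage by which the required timing exceeds the limit (rounded up)."""
    excess = required_hundredths - limit_hundredths
    if excess <= 0:
        return 0
    # Ceiling division on integers: -(-a // b)
    return -(-excess * 100 // limit_hundredths)


def timing_allowed(required_hundredths: int, limit_hundredths: int,
                   mem_margin: int = 0) -> bool:
    """True when the overshoot is within the configured margin percentage."""
    return percent_over_limit(required_hundredths, limit_hundredths) <= mem_margin
```

For the example above, percent_over_limit(306, 300) is 2, so the timing is rejected with the default margin of 0 and accepted once the margin is set to 2.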
Fatal error: Code 4, subcode 0x200 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Act to RW of y.yy clocks is > 3.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting which is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.
Fatal error: Code 4, subcode 0x400 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS precharge time of y.yy clocks is > 3.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting which is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.
Fatal error: Code 4, subcode 0x800 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS cycle time of y.yy clocks is > 9.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting that is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.
Fatal error: Code 4, subcode 0x1000 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS to RAS of y.yy clocks is > 2.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting that is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.
Fatal error: Code 4, subcode 0x2000 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: yyyy: Write to deact > 3. We got zzzz
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting that is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.
Fatal error: Code 5, subcode 0x1 (0)
C_MAIN1_CALL_FAILURE "c_main1 Call Failure"
This exception should never happen unless an earlier
exception was ignored by pressing ^C. This is because
this exception will only occur if the main initialization,
diagnostic test, and boot sequence fails to complete a
boot and then the user chooses to ignore the error.
A further explanation is necessary. There are two halves
to system initialization. The first half relies on only
SRAM being available and so stack and runtime variables
are stored there. Once main CPU memory has been tested,
initialization switches to the second half which relies on
the tested SDRAM for all data structures. This second
half completes initialization and testing of all other node
board devices and executes the boot process. For this
last step to fail, the IDE disk must either not be present
or contain an invalid boot. At that point a fatal error
is generated.
Do not ignore this condition. It is a final recourse and
an abort will reboot or hang the node board. It is safer
at this stage to press ^W and enter Whack. From Whack,
you can reboot with the "reboot" command.
Resolution: A) Check control cache (CPU) DIMMs are installed
and pass initialization.
B) Verify the node boot drive is present and node
software has been installed.
C) Replace the node, including CPU DIMMs and boot drive.
Fatal error: Code 6, subcode 0x1 (0)
SRAM_BAD "CPU SRAM Bad"
*** SRAM failure: address xxxxxxxx wrote yy but read zz
This failure indicates an early SRAM verification test
revealed a problem with the SRAM. This is an unrecoverable
error which likely requires hardware diagnostics. This
error is displayed by low level init code. It will never
be written to the PROM log because hardware which writes to
the PROM relies on correctly functioning SRAM.
Resolution: A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.
Diagnostic: A) Use Arium to set and verify SRAM contents.
If you notice a pattern, it could be a
pulled, stuck, or bridged SRAM line.
Fatal error: Code 7, subcode xxxx (yyyy)
SDRAM_BUS_FAST "Control Cache Bus Fast"
*** Error: Front side bus speed xxxx > expected yyyy
This error indicates the BIOS has detected that the front side
bus speed exceeds the expected speed (133 MHz on PIII, 533 MHz
on P4, 1333 MHz on 5000P). The system may not perform reliably.
Resolution: A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.
Diagnostic: A) Check the oscillator for the front side bus
with a frequency counter or an oscilloscope.
Fatal error: Code 8, subcode xxxxxxxx (yyyyyyyy)
MACHINE_CHECK_FAILURE "Machine Check Failure"
Machine check:
MCG_STATUS == xxxxxxxx yyyyyyyy
During BIOS initialization and testing, the processor must
execute instructions. If this error results at any point,
it is likely due to failing hardware related to the CPU's
instruction execution path.
Resolution: A) Cycle power on the node.
B) Update the node firmware to the latest version.
C) Replace CPU SDRAM in pairs.
D) Replace the node motherboard.
Diagnostic: A) Replace CPU VRMs.
B) Replace CPUs.
C) Use Arium and set a breakpoint on a machine
check to determine what errant instructions
led up to the machine check.
D) This problem may also be a BIOS or booter
software bug. Observe the values of the error
sub-code and data. They make up the 64-bit
value of the MCG_STATUS status register.
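To interpret the two 32-bit values together, they can be combined into one 64-bit number and the architecturally defined low bits of MCG_STATUS decoded (RIPV, EIPV, and MCIP, per Intel's documentation of the IA32_MCG_STATUS register). Which of the two values holds the upper half is not stated in this guide. The sketch assumes the sub-code is the upper 32 bits because it is printed first. The function names are hypothetical.

```python
MCG_RIPV = 1 << 0  # restart IP valid
MCG_EIPV = 1 << 1  # error IP valid
MCG_MCIP = 1 << 2  # machine check in progress


def mcg_status_from_error(subcode: int, data: int) -> int:
    """Combine the error sub-code and data into a 64-bit MCG_STATUS value.

    ASSUMPTION: sub-code is the upper 32 bits and data the lower 32 bits.
    """
    return ((subcode & 0xFFFFFFFF) << 32) | (data & 0xFFFFFFFF)


def decode_mcg_status(value: int) -> dict:
    """Decode the three low flag bits of MCG_STATUS."""
    return {
        "RIPV": bool(value & MCG_RIPV),
        "EIPV": bool(value & MCG_EIPV),
        "MCIP": bool(value & MCG_MCIP),
    }
```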
Fatal error: Code 9, subcode 0x0 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
*** Entering memory segment test: Stack is in xxx ***
One of the first memory tests performed in diagnostic
mode is a sequential address or random data test.
If there is no memory in the system, or the memory DIMMs
are mismatched, or there is a memory subsystem problem,
this error may result.
Resolution: A) Verify memory is installed and in matched
pairs (same manufacturer, exact same memory
configuration and speed).
B) Replace CPU DIMMs with a set of known good ones.
C) Replace the node motherboard.
Diagnostic: A) Change memory with Whack "c <addr>" command.
Examine memory with Whack "d <addr>" command.
B) Use Arium to modify and examine memory.
Fatal error: Code 9, subcode 0x1 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Insufficient memory: BSS end == xxxx, stack limit == yyyy
During the first part of initialization, the system stack comes
from SRAM. During the second part of initialization, the system
stack comes from CPU memory. If there is insufficient SDRAM
(such as no DIMMs installed) this error may result.
It is a bad idea to ignore this error with ^C, as the system
stack will fall past the available memory and probably hang
initialization hard.
See Code 9, sub-code 0x0 for resolution information.
Fatal error: Code 9, subcode 0x2 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Expected sdram_init_test to be xxxx, but it was yyyy.
After SDRAM has been initialized and scrubbed, the BIOS
copies runtime variables from Flash to CPU memory.
The copy of this data to SDRAM is verified later.
This fatal error may be caused by either a software
error in the BIOS, a hardware error (such as flaky CPU
memory), or user intervention such as modifying the memory
containing the SDRAM copy of the runtime variables.
Resolution:
A) Reboot. If the problem is caused by flaky
hardware, a prior memory test should catch
this condition.
B) Upgrade BIOS version. Not a likely solution
since this code path is well tested every
time the system is booted.
C) Replace CPU DIMMs with a set of known good ones.
D) Replace the node motherboard.
Diagnostic: A) Examine the BIOS memory area using the Whack
"d <addr>" memory dump command. SDRAM data
appears in CPU memory in the 0x000d0000
region. The key value is 0xdeadbeef.
Example:
Whack> mem search d0000 10000000 deadbeef
Searching 00000000 .. 01000000 for deadbeef
[ ]
Found at 000d0cb0
If this key cannot be found, something went wrong
with the copy or memory has become corrupt.
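The key search above can be reproduced offline against a saved memory dump. The following is a minimal Python sketch, not the Whack implementation: the function name `find_key`, the little-endian byte order, and the synthetic dump contents are all assumptions made for illustration.

```python
import struct
from typing import Optional

KEY = 0xDEADBEEF  # key value named in the diagnostic text


def find_key(dump: bytes, base_addr: int, key: int = KEY) -> Optional[int]:
    """Return the address of the first occurrence of key, or None."""
    # Assumption: the key is stored as a 32-bit little-endian word.
    needle = struct.pack("<I", key)
    offset = dump.find(needle)
    return None if offset < 0 else base_addr + offset


# Synthetic dump (hypothetical data): zero bytes with the key at 0xcb0.
dump = bytes(0xCB0) + struct.pack("<I", KEY) + bytes(16)
address = find_key(dump, 0x000D0000)
print("not found" if address is None else "Found at %08x" % address)
```

A result of None corresponds to the case where the key cannot be found.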
Fatal error: Code 9, subcode 0x3 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Low 1M test: Test completed: x iterations, y probes, z
errors found
The low 1 MB of memory is thoroughly tested to ensure
reliable operation, as this is the memory area that the BIOS
and Whack use during further initialization and testing. If
this test fails, do not ignore it with ^C: reliable system
memory is critical to proper operation.
Resolution: A) Cycle power on the node. Occasionally, memory
will fail during a memory test due to metallic
dust.
B) Reseat CPU memory DIMMs.
C) Pull CPU DIMMs, blow dust from sockets, reseat.
D) Replace CPU memory DIMMs in pairs to ensure
replacement parts are matched.
PIII nodes: Non-paired DIMMs are proximally
closest. Paired DIMMs are the leftmost-leftmost
and rightmost-rightmost of each
two which are proximally closest.
P4 nodes: Paired DIMMs are proximally closest.
DIMM0 and DIMM1 are a pair.
DIMM2 and DIMM3 are a pair.
E200, Ironman, and Tinman nodes:
There is only a single pair of CPU
memory DIMMs.
E) Replace the node motherboard.
Diagnostic: A) Run the memory test manually from Whack. You
can use the "mem test range <base> <size>" command to
test a range of memory.
B) Write to known bad memory with the Whack
"c <addr>" command and observe written contents
with "d <addr>". Write enough patterns that you
might be able to observe a pattern such as a stuck
or floating bit.
Fatal error: Code 9, subcode 0x4 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
High 64K test: Test completed: x iterations, y probes, z
errors found
In addition to the low 1 MB of memory, older BIOS versions
also thoroughly tested the high 64 KB of memory, because the
operational stack for the CBIOS and Whack used to reside at
this address, which made the memory critical for proper
initialization and testing. The current BIOS uses memory
below 1 MB for stack space, so this failure code is
deprecated.
See Code 9, sub-code 0x3 for resolution information.
Fatal error: Code 9, subcode 0x5 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
SDRAM walk: Test completed: xx iterations, yy probes, zz
errors found
During initialization (prior to a thorough test of the low
1 MB of memory), a quick walk through all CPU memory is
performed. If an error is found, this fatal error is
displayed.
See Code 9, sub-code 0x3 for resolution information.
Fatal error: Code 9, subcode 0x6 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Full SDRAM test: Test completed: xx iterations, yy probes,
zz errors found
During later testing, a full SDRAM test is performed which
more completely verifies proper memory operation than the
cursory SDRAM walk. This test is very similar to the initial
thorough 1 MB test done during initialization.
See Code 9, sub-code 0x3 for resolution information.
Fatal error: Code 9, subcode 0x7 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Pairwwww DIMMxxxx: Illegal SPD <name of value> <value>
This error indicates that a CPU DIMM was detected but that
the EEPROM present on the DIMM reported an illegal or
unsupported value for our memory controller.
Example:
Density (SPD byte 31) has more than 1 bit set (for example,
0x30), which indicates a non-standard part.
See Code 9, sub-code 0x3 for resolution information. Most
likely, the DIMM is not qualified for use in our Node Board.
The DIMM number is logged in the Data field of the Fatal
Error.
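The density example above comes down to a bit count. The following Python sketch shows one way to flag a byte with more than one bit set. It is an illustration only: the function name is hypothetical and the actual CBIOS validation logic is not described in this guide.

```python
def has_multiple_bits_set(spd_byte: int) -> bool:
    """Return True when more than one bit of an 8-bit SPD value is set."""
    value = spd_byte & 0xFF
    # value & (value - 1) clears the lowest set bit; a non-zero result
    # means at least one other bit was also set.
    return (value & (value - 1)) != 0


print(has_multiple_bits_set(0x30))  # value from the example above
print(has_multiple_bits_set(0x10))  # single bit set
```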
Fatal error: Code 9, subcode 0x10 (0)
SDRAM_FAILURE "Control Cache Failure"
Cannot allocate xx bytes for PCI bus yy scan
or
Cannot allocate xx bytes for PCI device on bus yy
This error indicates there was not enough memory or a memory
error occurred while attempting to allocate heap space during
the PCI device probe. SDRAM is needed because the BIOS
maintains a list of PCI devices present in the system.
Resolution: A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace CPU DIMMs.
D) Replace the node motherboard.
Diagnostic: A) Set BIOS verbose init flags to get more info
during memory init and PCI scan.
Whack> set perm mem_verbose
Whack> set perm pci_all
B) Use the "config->heap" command to show the
heap_base, heap_top, and heap_limit values.
Fatal error: Code 9, subcode 0x11 (0)
SDRAM_FAILURE "Control Cache Failure"
Cannot find bus xx in scanned PCI busses
During the PCI bus scan, a list of PCI devices present is
recorded in SDRAM. For each device present, a block of
memory is allocated and initialized. This error indicates
that a data value indicating bus number could not be found
in the list of devices previously scanned. This is probably
due to an SDRAM or CPU failure.
Resolution: A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace CPU DIMMs.
D) Replace bootstrap CPU.
E) Replace the node motherboard.
Fatal error: Code 9, subcode 0x12 (0)
SDRAM_FAILURE "Control Cache Failure"
No memory installed.
This error indicates that the CPU memory scan failed to
locate any usable memory for the system. There must be
at least one bank of SDRAM configured for the node to
operate correctly.
Resolution: A) Cycle power on the node.
B) Verify CPU DIMM scan output shows DIMMs.
C) Replace CPU DIMMs.
D) Replace the node motherboard.
Fatal error: Code 9, subcode 0x13 (xxxx)
SDRAM_FAILURE "Control Cache Failure"
Unknown DDR2 frequency (xxxx)
This error indicates that the CPU memory installed is
of an unrecognized and thus unsupported memory speed.
Supported speeds include 533, 667 and 800 MHz.
Resolution: Replace CPU DIMMs with 533, 667 or 800 MHz
modules.
Fatal error: Code 9, subcode 0x14 (0)
SDRAM_FAILURE "Control Cache Failure"
FB-DIMM Initialization Failure
This error indicates that CBIOS was unable to initialize
the CPU memory installed.
Resolution: A) Cycle power on the node.
B) Replace CPU DIMMs.
C) Replace the node motherboard.
Fatal error: Code 9, subcode 0x15 (data)
SDRAM_FAILURE "Control Cache Failure"
This error indicates that an uncorrectable ECC error was
detected on a DIMM. The data value is a bitmask that may be
decoded to determine which DIMM had the error. A value of 1
indicates DIMM 0, 2 indicates DIMM 1, 4 indicates DIMM 2,
and so on. More than one bit may be set if CBIOS is unable
to isolate the error down to a single DIMM.
Resolution: A) Cycle power on the node.
B) Replace FB-DIMM(s).
C) Replace the node motherboard.
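The data field for subcode 0x15 is a bitmask in which bit n corresponds to DIMM n. A minimal Python sketch of the decoding follows. The function name `decode_dimm_mask` is hypothetical, and the sketch assumes the data value has already been read as an integer.

```python
from typing import List


def decode_dimm_mask(data: int) -> List[int]:
    """Return the DIMM numbers whose bits are set in the data bitmask."""
    # Bit 0 (value 1) is DIMM 0, bit 1 (value 2) is DIMM 1, and so on.
    return [bit for bit in range(data.bit_length()) if (data >> bit) & 1]


print(decode_dimm_mask(0x4))  # single DIMM isolated
print(decode_dimm_mask(0x5))  # error not isolated to a single DIMM
```

When the returned list holds more than one entry, the error was not isolated to a single DIMM, so each listed DIMM is a replacement candidate.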
Fatal error: Code 10, subcode 0x1 (0)
PCI_FAILURE "PCI Failure"
*** Error: Bus xx cannot be parent of bus yy.
*** Error: Failure occurred during PCI device allocation.
During the PCI scan, many devices which were programmed by
previous PCI scan steps are examined again to verify the
programming was successful. This error indicates that a
bridge failed to record the PCI bus number of bridges
below it.
Resolution: A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace the node motherboard.
Diagnostic: A) Use Whack to examine offset 0x45 on the
failing parent bridge to determine whether the
value is not sticking there or there is some
problem with the PCI bus below it.
Fatal error: Code 10, subcode 0x2 (0)
PCI_FAILURE "PCI Failure"
*** Error: Vendor vvvv, Device wwww, for index xxxx:
Expected size yyyy, but got zzzz
Several devices on the PCI bus of a node board are known by
the CBIOS to have specific sizes. As a hardware consistency
check, the BIOS verifies that these devices are not only
present, but also have appropriate memory and I/O space
requirements. If any device is found outside of expected
requirements, it will cause this error.
Resolution: A) Cycle power on the node.
B) Reseat all PCI cards.
C) Swap out the PCI card for another qualified
card (if it's a card).
D) Pull all PCI cards to see if the problem
persists. If so, replace any defective cards.
E) Replace the node motherboard.
Diagnostic: A) Use Whack command "pci probe" using the
vendor ID provided in the fatal error to
acquire the address information the card
provides. If this information does not match
the error above, this may be a transient.
B) Use the Whack "d pci" command, providing it
the "<bus>.<dev>.<func>" of the PCI device.
Look for patterns in the data that might
indicate a stuck bit.
Fatal error: Code 10, subcode 0x3 (0)
PCI_FAILURE "PCI Failure"
*** Error: I/O space: address limit xxxx exceeded: yyyy
This error indicates that the system has run out of
available mapping area while attempting to map this device
into the CPU's I/O address range (0x0000 - 0xfe00). The
likely cause of this error is that a prior PCI device is
consuming too much I/O space. Since most device I/O ranges
are extremely small, the cause is likely a defective PCI
card or a PCI bus problem.
Resolution: A) Reseat all PCI cards.
B) Swap out individual PCI cards.
C) Replace the node motherboard.
Diagnostic: A) Use Whack command "pci init" or "pci scan" to
re-scan the bus. It may provide the information
you need to determine the bad device.
B) Review the prior PCI allocations to determine
one which is unusually large. You will need
to enter diagnostic mode to do this. There
are two ways:
1) Press ESC at the initial memory test. Type
"go" at the Whack prompt. Answer 'y' to run
the PCI initialization. Answer 'a' to print
on all phases.
2) Press ^W at the initial memory test. Type
"config diag" at the Whack prompt. Answer 'y'
to run the PCI initialization. Answer 'a' to
print on all phases.
Fatal error: Code 10, subcode 0x4 (0)
PCI_FAILURE "PCI Failure"
*** Error: 32-bit prefetchable memory: address limit xx
exceeded: yy
Many PCI devices (and software drivers) require DMA
addressable memory within the 32 bit address space (less
than 4 GB). For this reason, all 32 bit PCI devices are
required to be mapped within this space. Currently, all CPU
memory is also forced to be mapped within this space,
limiting the maximum 32-bit CPU memory to about 3 GB.
Resolution: A) Swap out individual PCI cards.
B) Replace the node motherboard.
See Code 10, sub-code 0x3 for diagnostic information.
Fatal error: Code 10, subcode 0x5 (0)
PCI_FAILURE "PCI Failure"
*** Error: 32-bit non-prefetchable memory: address limit xx
exceeded: yy
The non-prefetchable memory has the same 32 bit limitations
as prefetchable memory does.
See Code 10, sub-code 0x4 for resolution information.
Fatal error: Code 10, subcode 0x6 (0)
PCI_FAILURE "PCI Failure"
*** Error: 64-bit prefetchable memory: address limit xxxx
exceeded: yyyy
64 bit PCI devices are not limited to a 32 bit address
space. The CPU, however, can only access a 36 bit space
(when virtual memory is enabled). Because most drivers need
direct access to the memory a device provides on the bus,
the device must be addressable by the Pentium, and so the
maximum 64 bit address allowed is 0xf:ffffffff. This is
64 GB.
See Code 10, sub-code 0x4 for resolution information.
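The 36 bit limit stated for subcode 0x6 can be checked with simple arithmetic. The Python sketch below is illustrative only: the helper name `fits_in_cpu_address_space` and the sample window values are assumptions, and the real BIOS allocator is not described in this guide.

```python
ADDRESS_BITS = 36  # CPU physical address width stated above

MAX_ADDRESS = (1 << ADDRESS_BITS) - 1           # 0xf:ffffffff
SPACE_GB = (1 << ADDRESS_BITS) // (1 << 30)     # size in units of 2**30 bytes


def fits_in_cpu_address_space(base: int, size: int) -> bool:
    """Return True when the window [base, base + size) is CPU addressable."""
    return size > 0 and base >= 0 and base + size - 1 <= MAX_ADDRESS


print(hex(MAX_ADDRESS), SPACE_GB)
# Hypothetical 1 MB window placed just above the 36 bit limit.
print(fits_in_cpu_address_space(0x10_0000_0000, 0x10_0000))
```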
Fatal error: Code 10, subcode 0x7 (0)
PCI_FAILURE "PCI Failure"
Testing CM PCI 64-bit data lines: FAIL
The Cluster Manager (Eagle / Osprey) is used to perform a
walking bit test on both PCI0 and PCI1 data paths to CPU
memory. If a problem is found with either path, this error
will be displayed. The error will be further qualified by
one of the following prior lines:
PCIxxxx all data bits stuck high
PCIxxxx found data bits stuck high: BitWW, BitXX, BitYY, BitZZ
PCIxxxx all data bits stuck low
PCIxxxx found data bits stuck low: BitWW, BitXX, BitYY, BitZZ
PCIxxxx data bits possibly floating: BitWW, BitXX, BitYY, BitZZ
Resolution: A) Cycle power on the node.
B) Reseat all PCI cards.
C) Pull all PCI cards to see if the problem
persists. If so, replace any defective cards.
D) Replace the node motherboard.
Diagnostic: A) Depending on the specific error above, check
for stuck or floating pins on CM's connection to
the appropriate PCI bus.
B) Depending on the specific error above, check for
stuck or floating pins on CIOB's (RCC South Bridge)
connection to the appropriate PCI bus.
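The idea behind a walking bit test on a 64-bit data path can be sketched in Python as follows. This is a generic walking-ones illustration, not the CBIOS test: the `transfer` callable stands in for a hypothetical write-then-read-back across the bus, and a single pass like this cannot tell a stuck-high line from a line shorted to another driven line.

```python
from typing import Callable, List, Tuple


def walking_bit_test(transfer: Callable[[int], int],
                     width: int = 64) -> Tuple[List[int], List[int]]:
    """Walk a single 1 bit across the bus and classify misbehaving bits.

    Returns (stuck_high, stuck_low) as sorted lists of bit numbers.
    """
    mask = (1 << width) - 1
    stuck_high, stuck_low = set(), set()
    for bit in range(width):
        pattern = 1 << bit
        readback = transfer(pattern) & mask
        if not readback & pattern:
            stuck_low.add(bit)            # driven 1, read back 0
        extra = readback & ~pattern       # bits read as 1 but driven 0
        stuck_high.update(b for b in range(width) if (extra >> b) & 1)
    return sorted(stuck_high), sorted(stuck_low)


# Simulated faulty bus (hypothetical): bit 5 reads high, bit 3 reads low.
def faulty(value: int) -> int:
    return (value | (1 << 5)) & ~(1 << 3)


print(walking_bit_test(faulty))
```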
Fatal error: Code 10, subcode 0x8 (0)
PCI_FAILURE "PCI Failure"
*** Error: Miscompare CPU Memory to CM
Expected (0xAAAAAAAA)
Actual (0xBBBBBBBB)
Offset (0xCCCCCCCC)
*** Error: Miscompare CM to CPU Memory
Expected (0xAAAAAAAA)
Actual (0xBBBBBBBB)
Offset (0xCCCCCCCC)
CBIOS runs simple CM PCI Tests as part of POST in both
normal operation and manufacturing test. The tests use XCBs
to transfer data over both CM PCI interfaces from Cluster
Memory to CPU Memory and back. If any test fails due to a
data miscompare, the test will generate this fatal error
code with sub-code '0x4'.
These tests are similar to the Cluster Memory Tests and may
fail due to Cluster Memory SDRAM hardware or CPU SDRAM
hardware failures. Any test failure will result in a fatal
error.
Resolution: A) Cycle power on the node.
B) Reseat CM memory riser card.
C) Reseat the failing Cluster memory DIMM.
D) Replace the failing Cluster memory DIMM.
E) Replace the node motherboard.
Diagnostic: A) The memory controller registers are part of
the CMA register set, which is mapped into CPU memory for
access. Use the Whack "pci probe mem 1590"
command to find the Cluster Manager on the PCI bus.
The base address in CPU memory for the configuration
and status registers (CSRs) is Window 0. Example:
Whack> pci probe mem 1590
Win Baseaddr Basesize Identity
[0] 00:90200000 00:00000400 3PAR (ASIC) LPC#
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x90200000 above).
This is the base address of the Cluster Memory
Control Register Block. Refer to the Scaffold
System Architecture Reference for information on
register programming.
Window 1 is the small cluster memory offset. If
the error address is in the first 512 MB of Cluster
memory, use Whack to read/write this location and
confirm the error. The Central Error register
must be reset prior to error reproduction.
If the error address is greater than 512 MB, then
XCBs may be used to reproduce the error. Type
"xcb help" to get more information on using XCBs.
Fatal error: Code 10, subcode 0x9 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has dead clock: xxxxxxxx
This error indicates one of the PCI bridges on the board has
a bad clock value and is refusing to accept programming of a
good clock.
Resolution: A) Cycle power on the node. The problem may
occur on power cycle (only) with random
chance on a bad board.
B) Pull all PCI cards which have integrated
bridges (QLogic quad port cards are a good
example of this). You should power cycle
several times to determine it is not an
intermittent problem with the motherboard.
C) Replace the node motherboard.
Diagnostic: A) The PCI output just prior to the fatal error
will indicate which of the four bridges has
failed. It will be text similar to
"Bridge #1 (controls slots 4 & 5)." Refer
to rework documentation to correct this
problem.
Fatal error: Code 10, subcode 0xa (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has bad GPIO clock select inputs: x
This error indicates one of the PCI bridges on the board has
a bad GPIO input which selects bridge clock sources on a
power-on condition.
Resolution: A) Cycle power on the node. The problem may
occur on power cycle (only) with random
chance on a bad board.
B) Replace the node motherboard.
Diagnostic: A) The PCI output just prior to the fatal error
will indicate which of the four bridges has
failed. It will be text similar to
"Bridge #1 (controls slots 4 & 5)." Verify
that GPIO lines 0-3 are being properly pulled
high by comparing against a known good board.
Fatal error: Code 10, subcode 0xb (0)
PCI_FAILURE "PCI Failure"
Warning: This node has xx PCI cards present, but yy is the
required minimum. Please verify your node is properly
configured. You may adjust the required minimum with
the "set pci_min" command.
This error indicates this node has detected fewer PCI cards
than the recommended 3PAR minimum. In a system configuration
where there are fewer than the minimum active PCI cards,
inactive load cards should be used to reach the required
minimum.
Resolution: A) Verify the minimum required number of PCI
cards are inserted in the node. Install dummy
load cards to reach the required minimum.
B) Verify all PCI cards in the system have been
identified. Replace any missing card.
C) Replace the node motherboard.
Diagnostic: A) Isolate the problem to one or more slots by
placing load cards in all slots, and then
using the "i2c vsc" command to find which
slots do not report a load.
B) You can use the "i2c vsc" command to verify
cards are reporting correct wattages. You can
use the "pci probe" command to display all PCI
devices and locate the slot in which each is
inserted. Replace any defective card.
Fatal error: Code 10, subcode 0xc (0)
PCI_FAILURE "PCI Failure"
Testing CM PCI 64-bit address lines: FAIL
CM XCB TEST miscompare at offset, uuuu
Expected (vvvvvvvv)
Actual (wwwwwwww)
CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)
The Cluster Manager is used to perform a walking bit test on
both PCI0 and PCI1 address line paths from CPU memory into
cluster memory. If a problem is found (with either path),
this error will be displayed. The particular memory address
which caused this error will be indicated.
Resolution: A) Cycle power on the node.
B) Reseat all PCI cards.
C) Pull all PCI cards to see if the problem
persists. If so, replace any defective cards.
D) Replace the node motherboard.
Diagnostic: A) Depending on the specific error above, check
for stuck or floating pins on the Cluster Manager's
connection to the appropriate PCI bus.
B) Depending on the specific error above, check for
stuck or floating pins on CIOB's (RCC South Bridge)
connection to the appropriate PCI bus.
Fatal error: Code 10, subcode 0xd (zz)
PCI_FAILURE "PCI Failure"
*** Vendor xxxx device yyyy on motherboard not yet qualified.
*** Vendor xxxx device yyyy in slot zz not yet qualified.
This error indicates that the device found is not
recognized by the BIOS as a 3PAR-qualified device. This may
be because the board is a new generation or because there
was a PCI error in communicating with the device. In the
former case, it is probably safe to press ^C to ignore this
error. In the latter case, it is possible that part of the
board has become non-functional, so the BIOS may not be able
to determine if the rest of the board will continue to
function.
If you need to override this feature, enter Whack at this
point by pressing ^W. Enter the following command:
Whack> set perm pci_unqual_ok
If the data field is non-zero, it indicates the BIOS
discovered the problem is a card in a particular PCI slot.
The specific codes are as follows:
* 30 is PCI Slot 0
* 31 is PCI Slot 1
* 32 is PCI Slot 2
* 33 is PCI Slot 3
* 34 is PCI Slot 4
* 35 is PCI Slot 5
Resolution: A) Swap out the PCI card for a qualified card.
B) Replace the node motherboard.
Diagnostic: A) If the card is a QLogic, use the Whack command
"pci probe 1077" to find the device and display
its device ID. You may need to press ^W first
if the BIOS is still at the fatal error. There
are several currently qualified PCI cards. Some
include the QLogic 2200, 2300, and 2312. More
will be qualified in the future.
B) The PCI probe should have shown the bus.dev.func
specifier you need to display card information
directly using Whack. Use the Whack "d pci"
command giving it the "<bus>.<dev>.<func>" as
a parameter. You should see a standard PCI
header present.
C) Try the same or a different card in a different
PCI slot to see if the slot has failed.
Fatal error: Code 10, subcode 0xe (0)
PCI_FAILURE "PCI Failure"
PCI bus scan and allocation completed in 21 passes.
*** Error: PCI scan required too many passes. Bad PCI
interaction.
This error indicates the PCI scanning code was unable to
lay out a valid PCI address table mapping within 21 passes.
The cause of this error is possibly due to either defective
hardware or BIOS firmware.
Resolution: A) Remove all PCI cards. If error goes away,
attempt to find failed card by process of
elimination (put back half of the cards and
try to boot again).
B) Replace the node motherboard.
Diagnostic: A) Observe other errors that may happen at the
same time as this error. Is there an indication
that it is a board ASIC which is failing? In
general, some other error should trigger before
this one, since device limits are verified.
B) Contact BIOS engineer for debug assistance.
Fatal error: Code 10, subcode 0x10 (0)
PCI_FAILURE "PCI Failure"
*** Error: IMB.A isn't turned on
This error indicates a possible hardware failure on the
board. The bus which connects the CMIC (P4 North Bridge) to
CIOB A failed to initialize properly.
Resolution: A) Cycle power on the node. The problem may
occur with random chance on a bad board.
B) Replace the node motherboard.
Diagnostic: A) Verify CIOBX2 is receiving a valid clock.
B) Look at PCI device 0.0.2.f8 for CIOB A, or PCI
device 0.0.1.f8 for CIOB B. The BIOS observes
bit 0 of this register to tell if the IMB
initialized (0 indicates success).
Fatal error: Code 10, subcode 0x11 (0)
PCI_FAILURE "PCI Failure"
*** Error: Expected to see device xx.yy.zz `uuuu'
but it is not responding: Vendor vvvv, Device wwww.
*** Error: Failure occurred during PCI device allocation.
The BIOS checks for specific onboard PCI devices (such as
bridges) which are known to be on a particular node board.
If a device listed in the BIOS table is not found on the
board, then this error will result.
Resolution: A) Cycle power on the node.
B) Remove PCI cards and see if error disappears.
C) Replace the node motherboard.
Diagnostic: A) The error should indicate which device
is missing. Check whether another
unknown onboard device has appeared in
its place. This could be the device, masked
behind a PCI bus problem.
B) Verify the PCI ASIC is functional by checking
clocks and PCI data lines to the device.
Fatal error: Code 10, subcode 0x12 (0)
PCI_FAILURE "PCI Failure"
*** Error: The following device is not listed in the
hardwired PCI descriptor table:
Vendor xxxx, Device yyyy
*** Error: Failure occurred during PCI device allocation.
Onboard PCI devices (such as bridges) are well known by
the BIOS to appear at specific bus addresses. If this
device is not known by the BIOS, but it is configured on
a bus which is not externally exposed (PCI slot), then
you will see this error. Since the node board is a closed
solution, this error might occur if an onboard device is
failing and does not report a correct device vendor/ID, or
corrupts the device vendor/ID reported by another device on
the bus.
See Code 10, sub-code 0x11 for resolution information.
Fatal error: Code 10, subcode 0x13 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Was yyyy but is now zzzz
*** Error: Failure occurred during PCI device allocation.
The PCI header is re-read on multiple passes of the PCI
initialization. If a mismatch is found with a previous read
of the PCI bus, then this error will result. This is a
strong indicator of a flaky device or bus. If the BIOS is
in Diagnostic mode (press ESC at the initial memory test)
at this point, the following will also be displayed:
Starting infinite PCI read loop...
In Diagnostic mode, once a failure is detected, this test is
then repeated until manual intervention.
See Code 10, sub-code 0x3 for resolution information.
Fatal error: Code 10, subcode 0x14 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Invalid 64-bit size: yyyy
*** Error: Failure occurred during PCI device allocation.
During PCI initialization, a 64 bit window was found on the
PCI bus which is outside the 36 bit range imposed by the
CPU.
See Code 10, sub-code 0x3 for resolution information.
Fatal error: Code 10, subcode 0x15 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Allocation size is zero
*** Error: Failure occurred during PCI device allocation.
During PCI initialization, a window was found on the PCI
device with a size of zero. This fatal error may indicate
that the BIOS is not able to properly communicate with the
PCI device.
See Code 10, sub-code 0x3 for resolution information.
Fatal error: Code 10, subcode 0x16 (slot)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.
During PCI initialization, each memory or I/O window present
on each device found on the bus is programmed with a CPU
memory bus address so that it may be accessed by further
BIOS initialization, tests, and the main operating system.
The BIOS verifies the address it programs for each window
was correctly programmed (by reading back the value just
written). If they do not match, this error is generated.
The slot number is an ASCII value represented as
hexadecimal. If the slot value is 0, then the failure
occurred on a node motherboard device. If PCI Slot 0 was
involved, then slot is 30. PCI Slot 1 is 31; PCI Slot 2 is
32; PCI Slot 6 is 36, etc.
See Code 10, sub-code 0x3 for resolution information.
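Because the slot value is the ASCII code of the slot digit written in hexadecimal, it can be converted by subtracting 0x30. A minimal Python sketch follows. The function name `decode_slot` is hypothetical, and masking to the low byte is an assumption made so that the same helper also accepts wider data fields.

```python
from typing import Optional


def decode_slot(data: int) -> Optional[int]:
    """Return the PCI slot number, or None for a motherboard device."""
    low_byte = data & 0xFF  # assumption: the slot code is in the low byte
    if low_byte == 0:
        return None         # 0 means a node motherboard device
    if not 0x30 <= low_byte <= 0x39:
        raise ValueError("not an ASCII digit: 0x%02x" % low_byte)
    return low_byte - 0x30  # ASCII '0' is 0x30


print(decode_slot(0x36))
print(decode_slot(0x00))
```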
Fatal error: Code 10, subcode 0x17 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.
See Code 10, sub-code 0x16 for information on this error.
Fatal error: Code 10, subcode 0x18 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.
See Code 10, sub-code 0x16 for information on this error.
Fatal error: Code 10, subcode 0x19 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Invalid allocation size: yyyy
(Must be a power of 2)
*** Error: Failure occurred during PCI device allocation.
During PCI initialization, each memory or I/O window present
on each device found on the bus is programmed with a CPU
memory bus address. The size of the window required is
provided by the specific PCI device. This window must be a
power of 2 in size (1 KB, 2 KB, 4 KB, ... 32 MB, 64 MB,
etc).
This is a consistency check the BIOS performs to ensure it
is properly communicating with the PCI device.
See Code 10, sub-code 0x3 for resolution information.
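The power-of-2 consistency check described for subcode 0x19 can be expressed with a single bit operation. The Python sketch below is an illustration under stated assumptions, not the BIOS source: the function name and the sample sizes are hypothetical.

```python
def is_power_of_two(size: int) -> bool:
    """Return True when size is a positive power of 2."""
    # A power of 2 has exactly one bit set, so clearing the lowest
    # set bit with size & (size - 1) must leave zero.
    return size > 0 and (size & (size - 1)) == 0


for size in (0x400, 0x4000000, 0x3000, 0):
    print(hex(size), is_power_of_two(size))
```

A size of zero fails this check as well; a zero-size window is reported separately as subcode 0x15.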
Fatal error: Code 10, subcode 0x1a (0)
PCI_FAILURE "PCI Failure"
*** Error: Device does not fit into address space, skipping:
attempted addr xxxx, size yyyy
*** Error: Failure occurred during PCI device allocation.
During PCI initialization, the entire PCI bus is walked
as a tree and device registers are initialized and mapped
into processor address space using this tree. The bus
structure is then ordered and summarized into a table so
that software can later find specific devices for high level
initialization. This specific error indicates the PCI scan
attempted to map a PCI device into the CPU's 32-bit address
space, but failed because no more space was available.
Verify that NVRAM flags such as "pci_base" and "mem_max" are
not set to unusual values.
See Code 10, sub-code 0x3 for resolution information.
Fatal error: Code 10, subcode 0x1b (0)
PCI_FAILURE "PCI Failure"
*** Error: IMB.B isn't turned on
This error indicates a possible hardware failure on the
board. The bus which connects the CMIC (P4 North Bridge) to
CIOB B failed to initialize properly.
See Code 10, sub-code 0x10 for resolution information.
Fatal error: Code 10, subcode 0x1c (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI CIOB Primary www MHz (xxx), Secondary yyy
MHz (zzz)
This error indicates a possible hardware failure on the
board. The CIOB (which connects the North Bridge to the I/O
system) has an incorrect clock speed.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Eagle nodes should run CIOB at 66 MHz on
both the primary and secondary sides. Review fatal
error output to determine if the primary side or
secondary side is affected.
B) Verify clock with scope. Check strapping resistors
which select CIOB bus clock speed on reset.
Fatal error: Code 10, subcode 0x1d (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has bad secondary speed: v.w.x.y =
zzzz
This error indicates one of the PCI bridges on the board has
a bad speed selection set, which could indicate an incorrect
type of PCI card has been installed or that bridge mode
select strappings are bad.
Resolution: A) Pull all PCI cards one at a time to determine
the failed card.
B) Replace the node motherboard.
Diagnostic: A) Check Intel 31154 mode select strapping
resistors to ensure PCI-X mode is selected.
Refer to Ironman rework instructions to
correct this.
B) PCI offset 0xf2 in the 31154 indicates, among
other things, the mode selected. Bits 6-8
should have the value 010 for proper operation
(100 MHz secondary PCI bus speed).
C) Some pre-production Ironman nodes have not been
reworked to correct this defect. To ignore
this error, set the "pci_speed_any" NVRAM flag
by pressing ^W to enter Whack and entering:
Whack> set perm pci_speed_any
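The bit-field check in diagnostic step B can be sketched in Python as shown below. The sketch makes several assumptions that this guide does not confirm: bits are numbered from 0 at the least significant bit, the value 010 is written most significant bit first, and the register value has already been read from offset 0xf2 as an integer. The function names are hypothetical.

```python
EXPECTED_MODE = 0b010  # value the text associates with 100 MHz secondary bus


def mode_field(reg_value: int) -> int:
    """Extract bits 6-8 of the register value as a 3-bit field."""
    return (reg_value >> 6) & 0b111


def has_expected_mode(reg_value: int) -> bool:
    """Return True when bits 6-8 hold the expected mode value."""
    return mode_field(reg_value) == EXPECTED_MODE


# Hypothetical register values for illustration only.
print(has_expected_mode(0b010 << 6))
print(has_expected_mode(0b100 << 6))
```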
Non-fatal error: Code 10, sub-code 0x1e (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe x.y.z: Invalid port configuration
strappings (xxx).
The indicated PLX switch chip has incorrect hardware
configuration strappings.
Resolution: Replace the node motherboard.
Non-fatal error: Code 10, sub-code 0x1f (YYYYYYxx)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected
link width detected (xx).
This error indicates that the device found is not running
at the correct PCIe link width.
If the "xx" portion of the data field is non-zero,
it indicates a problem with a particular PCI slot.
The specific codes for "xx" are as follows:
30 is PCI Slot 0
31 is PCI Slot 1
32 is PCI Slot 2
33 is PCI Slot 3
34 is PCI Slot 4
35 is PCI Slot 5
36 is PCI Slot 6
37 is PCI Slot 7
38 is PCI Slot 8
To ignore this error, enter Whack by pressing ^W and
entering:
Whack> set perm pci_speed_any
Resolution: A) Replace indicated card (if "xx" is non-zero).
B) Replace node motherboard.
Non-fatal error: Code 10,
sub-code 0x20
(YYYYYYxx)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected
link speed detected (xxx).
This error indicates that the device found is not running
at the correct PCIe link speed.
See Code 10, sub-code 0x1f for resolution information.
Non-fatal error: Code 10,
sub-code 0x21 (xxx)
PCI_FAILURE "PCI Failure"
*** Error: Slot xxx indicates no HBA present, but PCI
device found
This error indicates that a PCI device was found in a slot
which was expected to be empty. The likely cause of this
failure is an HBA which is not fully seated. If this is an
expected failure, you can set "pci_missing_ok" to override
this check.
Resolution: A) Reseat or replace the indicated HBA.
B) Replace node motherboard.
Non-fatal error: Code 10,
sub-code 0x22 (xxx)
PCI_FAILURE "PCI Failure"
*** Error: Slot xxx indicates HBA present, but no PCI
device found
This error indicates that no PCI device was found in a slot
which was expected to be populated (HBA present). The likely
cause of this failure is an HBA which has failed. If this is
an expected failure, you can set "pci_missing_ok" to override
this check.
Resolution: A) Reseat or replace the indicated HBA.
B) Replace node motherboard.
Non-fatal error: Code 10,
sub-code 0x23 (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI device bb.dd.ff (slot ss) hung during
previous error scan
This error indicates that during a previous PCI scan, the CPU
hung. The most probable cause of this error is a defective HBA.
The data field provides several details about the suspect device.
The low byte indicates which PCI slot, if known. Value 0x30
corresponds to PCI Slot 0, 0x31 is PCI Slot 1, ..., 0x38 is
PCI Slot 8. Byte 2 and byte 1 correspond to the PCI bus.dev.func.
Byte 3 indicates whether the failure occurred during a PCI error
scan, and whether this is a repeat failure. Decode table for data:
bits 0..7 PCI Slot (0x00=MB, 0x30..0x38=PCI Slot 0..8)
bits 8..10 PCI func
bits 11..15 PCI dev
bits 16..23 PCI bus
bits 24..28 Reserved (0)
bit 29 Repeat flag (1=repeat -- fatal error)
bit 30 Hang during (0=PCI scan, 1=PCI error scan)
bit 31 Reserved (1)
Example (data=c00a0a35):
The 0x35 value implicates PCI Slot 5.
The 0a0a value is bus.dev.func 0a.01.02.
The c0 value indicates the hang occurred during a PCI error
scan.
Example (data=a0090831):
The 0x31 value implicates PCI Slot 1.
The 0908 value is bus.dev.func 09.01.00.
The a0 value indicates a repeated hang during the PCI scan.
Resolution: A) Replace HBA if PCI Slot is indicated.
B) Convert to PCI bus.dev.func and match with the
suspect PCI device from previous BIOS messages.
If this is an onboard device, replace the node
motherboard.
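The data field decode for sub-code 0x23 can be sketched in a few lines. This is an illustrative decoder, not BIOS code. It assumes the device number occupies bits 11..15 (the layout both worked examples imply, since 0x0a in byte 1 decodes to dev 01, func 02). The names HangInfo and decode_hang_data are hypothetical.

```python
from typing import NamedTuple, Optional


class HangInfo(NamedTuple):
    slot: Optional[int]       # PCI slot 0..8, or None (motherboard or unknown)
    bus: int
    dev: int
    func: int
    repeat: bool              # bit 29: repeat hang (fatal)
    during_error_scan: bool   # bit 30: 0 = PCI scan, 1 = PCI error scan


def decode_hang_data(data: int) -> HangInfo:
    """Split the 32-bit data field into the parts listed in the decode table."""
    low = data & 0xFF
    slot = low - 0x30 if 0x30 <= low <= 0x38 else None
    func = (data >> 8) & 0x7      # bits 8..10
    dev = (data >> 11) & 0x1F     # bits 11..15 (assumed, see lead-in)
    bus = (data >> 16) & 0xFF     # bits 16..23
    repeat = bool(data & (1 << 29))
    during_error_scan = bool(data & (1 << 30))
    return HangInfo(slot, bus, dev, func, repeat, during_error_scan)
```

Decoding the two examples above: c00a0a35 gives slot 5, bus.dev.func 0a.01.02, hang during a PCI error scan; a0090831 gives slot 1, bus.dev.func 09.01.00, repeat hang during the PCI scan.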
Fatal error: Code 10, subcode 0x24 (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI device bb.dd.ff (slot ss) hung during
previous scan
Hang occurred multiple times.
This error indicates that during a previous PCI scan, the CPU
hung repeatedly. Other than this being a fatal error, this
code is identical to sub-code 0x23. Note that if this fatal
error is seen without a preceding non-fatal sub-code 0x23,
then the failure is likely to be the node motherboard.
If the non-fatal error is not logged, then a PCI scan hung earlier
in the PCI tree than a previous hang. Unless both hangs happened
on the same HBA, the cause is likely a shared device on the
node motherboard.
See Code 10, sub-code 0x23 for resolution information.
Fatal error: Code 10, subcode 0x25 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Serial EEPROM is not present.
This error indicates that the PCI device does not have
an EEPROM attached.
Resolution: Replace node motherboard.
Fatal error: Code 10, subcode 0x26 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Unable to write Serial EEPROM.
This error indicates that the EEPROM failed to be
programmed.
Resolution: Replace node motherboard.
Fatal error: Code 10, subcode 0x27 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Unable to read Serial EEPROM.
*** Error: PCIe bb.dd.ff: Serial EEPROM index XX value
0xXXXXXXXX !=
expected 0xXXXXXXXX.
This error indicates that BIOS was unable to verify
the EEPROM contents after programming or that the data was
successfully written but did not persist.
Resolution: Replace node motherboard.
Fatal error: Code 10, subcode 0x30 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe b.d.f Link Width incorrect size. Found xx,
s/b yy
This error indicates that the device found is not running
at the correct PCIe link width. xx is the actual PCIe link
width and yy is the expected PCIe link width.
This error may be logged with some HBA cards with x4 PCIe lanes.
To ignore this error, enter Whack by pressing ^W and
entering:
Whack> set perm pci_speed_any
Resolution: A) Ok to ignore if this is related to HBA card
with x4 PCIe lanes
B) Replace indicated card.
Fatal error: Code 10, subcode 0x31 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected
link width detected (xx).
This error indicates that the Harrier2 ASIC device found is not
running at the correct PCIe link width.
Resolution: A) Power cycle the node
B) Replace node motherboard.
Fatal error: Code 10, subcode 0x32 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe b.d.f indicates HBA present, but no PCI
device found
This error indicates that the PCI device was not found.
Resolution: A) Reseat card
B) Replace indicated card
Fatal error: Code 11, subcode 0 (yyyy)
UNRECOVERABLE_TRAP "Unrecoverable Trap"
*** Error: CPU exception detected: Stopping execution.
The BIOS installs an interrupt handler to catch spurious
(unexpected) interrupts and exceptions during initialization
and testing of the node hardware. During initialization, the
BIOS even tests to verify a generated interrupt is delivered
correctly. This is a serious condition and should not be
ignored by pressing ^C. The specific interrupt received is the
sub-code displayed. The interrupt number will be less than 0x20.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Review previous output lines to determine
whether interrupts were just enabled (it
follows the CPU identification). You should
see a message:
--- This interrupt was expected
If this is not present, then most likely
the interrupt or exception occurred immediately
after being enabled.
B) Using Whack, you can manually enable and disable
interrupts with the "cpu interrupt enable"
and "cpu interrupt disable" commands. You can
also use the "cpu interrupt <num>" command to
generate an interrupt. If interrupts are
enabled, you should see a message upon generating
an interrupt. One of:
--- This interrupt was expected
or
*** Error: Expected interrupt xxxx but got yyyy
or
*** Error: CPU exception detected: Stopping execution.
The two former messages will only occur if the
BIOS is still expecting an interrupt to be
delivered. The latter message will only be
displayed if the interrupt is numbered 0x20 or
higher.
Fatal error: Code 12, subcode 0x0 (0)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
PIII or P4 node:
--- SMI: No known cause (# zz)
GPE status: yyyyyy, GPE input: zzzzzz
An SMI is a System Management Interrupt, an interrupt
generated by the node hardware for the BIOS to service a
particular failure. This error indicates the BIOS was
unable to determine the cause of the SMI delivered by
hardware.
See Code 11 for resolution information.
Fatal error: Code 12, subcode 0x0 (0)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
Ironman, Tinman, Titan, or Atlas nodes:
CPU0 SMI: Bootstrap
CPU0 SMI: Updating
CPU0 SMI: Updated
--- SMI: No known cause (# 1) on CPU6
SMSCS[0] = 0x00000000
...
ALT_GP_SMI_EN = 0xbfbf
ALT_GP_SMI_STS = 0x0000
TMP_STS= 0x00000000:88380000
TMP_INT= 0x00000000:00000001
This fatal error indicates the BIOS received an SMI, but wasn't
able to determine which device caused the interrupt. In this
example, the "Bootstrap," "Updating," and "Updated" messages
suggest the BIOS firmware was updated.
Resolution: A) Reboot the node.
B) Replace the node motherboard.
Fatal error: Code 12, subcode 0x1 (yyyy)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
*** Error: Expected interrupt xxxx but got yyyy
During initialization, the BIOS installs an interrupt handler
to verify interrupts are delivered reliably. It then generates
an expected interrupt. If an interrupt is delivered which is
not the same as the one expected, this error is displayed.
The interrupt number, yyyy, represents which interrupt occurred.
See Code 11 for resolution information.
Fatal error: Code 13, subcode 0x0 (yyyy)
INTERRUPT_FAILURE "Interrupt Failure"
*** Error: Interrupt 0x20 could not be generated.
or
*** Error: Interrupt 0xff could not be generated.
During initialization, the BIOS installs an interrupt handler
to verify interrupts are delivered reliably. It then generates
a few expected interrupts. If the specific interrupt is not
delivered, this error is displayed. The interrupt number, yyyy,
represents which interrupt should have been generated.
See Code 11 for resolution information.
Fatal error: Code 14, subcode 0x0 (0)
ECC_FAILURE "Control Cache ECC Failure"
The Whack "mem test ecc" command performs an ECC test over the
main memory to ensure ECC memory error correction is functioning.
If this test fails, this message is displayed, together with
other messages giving details.
Note: Running the "mem test ecc" command destroys some memory
locations in the range of [0 .. 512 KB] and [1 MB .. just
below the top of SDRAM]. Hence, executing this once Linux
has booted will cause it to fail if it is reentered.
If you see this failure often during BIOS initialization,
then the cause is likely a hardware problem. Specifically,
the error tells you that the hardware ECC error mechanism
is not working correctly. Changing CPU memory DIMMs may
solve the problem, but it's more likely a board failure.
Resolution: A) Ensure the North Bridge heatsink is firmly
attached.
B) Replace CPU DIMMs.
C) Replace bootstrap CPU.
D) Replace the node motherboard.
Fatal error: Code 14, subcode 0x1 (1)
ECC_FAILURE "Control Cache ECC Failure"
*** Error: Missing ECC SMI [80] <= 1, data 0 0. Copy 0 Now 0 mode 0
00 - 0f: 00 00 00 00 00 00 00 00 | 00 00 00 00 00 40 0c 00
10 - 1f: 01 ff 00 00 00 00 00 ff | ff ff ff ff ff ff ff ff
20 - 2f: 04 09 08 09 20 09 10 09 | 18 09 00 09 00 00 59 8e
30 - 3f: aa aa 0a 02 a8 00 00 00 | 00 00 00 c0 7b df ff ff
This error indicates the BIOS ECC hardware test could not
get the hardware to generate an ECC SMI in response to a
corrupted memory address. It possibly indicates a failing
DIMM or memory controller, or that memory timings are too
fast for the DIMMs present in the node.
See Code 14, sub-code 0x0 for resolution information.
Fatal error: Code 15, subcode 0x0 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
mailbox register xxxx changed inappropriately
(yyyy) != expected (zzzz)
register test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards
on the Node Board. The slots are numbered 0-6 from left to right
when looking at the front of the P4 Eagle and Ironman Nodes. The
slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top
three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for
functionality. The HBA cards sometimes require a firmware
download for full capability. POST does not have access to
this firmware and will only test basic register access and
functionality. If the Register Test fails, POST will indicate
this error.
If the user continues past this error (^C), software will log
the error and continue testing the other PCI cards (if present).
Resolution: A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the
CM PCI XCB test passed, replace the PCI Fibre
Adapter.
C) Replace the node motherboard.
Diagnostic: A) Whack "fibre" and "pci" commands communicate
with each PCI Fibre Card. Refer to the slot
that produced the error for further diagnostic
information and procedure.
Fatal error: Code 15, subcode 0x1 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
controller memory xxxx value (yyyy) != expected (zzzz)
memory test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards
on the Node Board. The slots are numbered 0-6 from left to right
when looking at the front of the P4 Eagle and Ironman Nodes. The
slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top
three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for
functionality. The HBA cards sometimes require a firmware
download for full capability. POST does not have access to
this firmware and will only test basic functionality.
If the Onboard Memory Test fails, POST will indicate
this error.
If the user continues past this error (^C), software will log
the error and continue testing the other PCI cards (if present).
Resolution: A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the
CM PCI XCB test passed, replace the PCI Fibre
Adapter.
C) Replace the node motherboard.
Diagnostic: A) Whack "fibre" and "pci" commands communicate
with each PCI Fibre Card. Refer to the slot
that produced the error for further diagnostic
information and procedure.
Fatal error: Code 15, subcode 0x2 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
data bits possibly float: Bitxxxx-Bityyyy.
PCI walking bits: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards
on the Node Board. The slots are numbered 0-6 from left to right
when looking at the front of the P4 Eagle and Ironman Nodes. The
slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top
three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for
functionality. The HBA cards sometimes require a firmware
download for full capability. POST does not have access to
this firmware and will only test basic functionality.
If the PCI Fibre Card Bus Test fails, POST will indicate
this error.
If the user continues past this error (^C), software will log
the error and continue testing the other PCI cards (if present).
Resolution: A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the
CM PCI XCB test passed, replace the PCI Fibre
Adapter.
C) Replace the node motherboard.
Diagnostic: A) Whack "fibre" and "pci" commands communicate
with each PCI Fibre Card. Refer to the slot
that produced the error for further diagnostic
information and procedure.
Fatal error: Code 15, subcode 0x3 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
data bits possibly float: Bitxxxx-Bityyyy.
CM0 walking bits: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards
on the Node Board. The slots are numbered 0-6 from left to right
when looking at the front of the P4 Eagle and Ironman Nodes. The
slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top
three will depend on which slot the node is in.
This test indicates a problem was observed with the fibre
channel card talking with the Cluster Manager.
If the "fibre test pci" test passed, then this problem is likely
in the interface to the CM or CM memory.
Resolution: A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the
CM PCI XCB test passed, replace the PCI Fibre
Adapter.
C) Replace the node motherboard.
Diagnostic: A) Whack "fibre" and "pci" commands communicate
with each PCI Fibre Card. Refer to the slot
that produced the error for further diagnostic
information and procedure.
Fatal error: Code 15, subcode 0x4 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
PCIe EYE test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards
on the Node Board. The slots are numbered 0-6 from left to right
when looking at the front of the P4 Eagle and Ironman Nodes. The
slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top
three will depend on which slot the node is in.
If the "fibre test cm" test passed, then this problem is likely
in the PCIe to PCIe link between the card and the switch.
Resolution: A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the
CM PCI XCB test passed, replace the PCI Fibre
Adapter.
C) Replace the node motherboard.
Diagnostic: A) Whack "fibre" and "pci" commands communicate
with each PCI Fibre Card. Refer to the slot
that produced the error for further diagnostic
information and procedure.
Fatal error: Code 15, subcode 0x10 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
BIOS can not make LSI card go into Operational state.
Resolution: A) Replace card. Send failed card back for FA.
Fatal error: Code 15, subcode 0x11 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
HBA card register test failure
Resolution: A) Replace card. Send failed card back for FA.
Fatal error: Code 15, subcode 0x13 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
LSI card register memory copy test failure.
Resolution: A) Replace card. Send failed card back for FA.
Fatal error: Code 15, subcode 0x14 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
LSI card register memory copy test failure.
Resolution: A) Replace card. Send failed card back for FA.
Fatal error: Code 15, subcode 0x15 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Firmware rev xxxx not supported. Upgrade to yyyy
LSI card does not contain 3PAR-approved firmware. If you need
to run with an LSI card which has an older firmware (engineering
only), you can set the "lsi_downrev" flag in the BIOS. Example:
Whack> set perm lsi_downrev
Resolution: A) Replace card. Send failed card back for upgrade.
Fatal error: Code 15, subcode 0x16 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Unable to get firmware rev
Attempting to get the firmware version from the LSI card failed.
Resolution: A) Cycle power on the node.
B) Replace card. Send failed card back for FA.
Fatal error: Code 15, subcode 0x17 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Manufacturing test for E200 node only.
This error occurs when the onboard LSI chips are not found. They
are expected to be in slots 0 and 3, with two devices on each slot.
Resolution: A) Cycle power on the node.
B) Replace motherboard.
Fatal error: Code 17, subcode 0x0 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed its internal self test.
Resolution: A) Replace the IDE or SATA boot drive.
B) Replace the IDE or SATA cable.
C) Replace the node motherboard.
Diagnostic: A) Whack "ide test" commands may be used to
individually execute IDE tests.
Fatal error: Code 17, subcode 0x1 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed to perform a self test.
See Code 17, sub-code 0x0 for resolution information.
Fatal error: Code 17, subcode 0x2 (0)
IDE_FAILURE "Internal Drive Failure"
IDE register xx value (yyyy) != expected (zzzz)
The IDE register test failed during a pattern test.
See Code 17, sub-code 0x0 for resolution information.
Fatal error: Code 17, subcode 0x3 (0)
IDE_FAILURE "Internal Drive Failure"
IDE register xx value (yyyy) != expected (zzzz)
The IDE register test failed during a walking bit test.
See Code 17, sub-code 0x0 for resolution information.
Fatal error: Code 17, subcode 0x4 (0)
IDE_FAILURE "Internal Drive Failure"
There was an IDE failure in data requested by the operating
system bootstrap. It is possible that data on the disk has
become corrupt to the point the operating system will not
successfully load.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x5 (0)
IDE_FAILURE "Internal Drive Failure"
Communication with the IDE interface timed out. This error
indicates the drive is not responding to commands within an
acceptable amount of time.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x6 (0)
IDE_FAILURE "Internal Drive Failure"
IDE reported a failure in read verify command.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x7 (0)
IDE_FAILURE "Internal Drive Failure"
A timeout (10 seconds) was detected while performing DMA
operation.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x8 (0)
IDE_FAILURE "Internal Drive Failure"
An error condition was detected while performing DMA
operation.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x9 (xx)
IDE_FAILURE "Internal Drive Failure"
IDE power up: Unknown error
ERROR : 80
SECCNT: 80
SECNUM: 80
CYLLOW: 80
CYHIGH: 80
DEVSEL: 80
ALT_STATUS: 80
Drive: BUSY
The IDE drive had a failure at power-on reset which prevents
it from communicating with the chipset IDE controller.
Resolution: A) Cycle power on the node.
B) Reseat drive cable on both node and drive.
C) Replace the IDE or SATA boot drive.
D) Replace the node motherboard.
Diagnostic: A) Try using "ide reset" followed by "ide init" to
clear the error.
B) The I/O address of the register which could trigger
this error at "ide init" is located at 0x1f1.
Try using "io inb 1f1" and "io outb 1f1 <value>" to
diagnose further.
Non-fatal error: Code 17,
sub-code 0x10 (data)
IDE_FAILURE "Internal Drive Failure"
A disk SMART threshold was triggered. This would indicate an
imminent boot drive failure.
Resolution: Replace the IDE or SATA boot drive.
Diagnostic: The data value may be used to determine the specific
SMART field which caused the alert. Examples:
0 - Unknown
1 - Raw Read Error Rate
2 - Throughput
3 - Spinup Time
4 - Start / Stop Count
5 - Reallocate Sector Count
6 - Read Channel Margin
7 - Seek Error Count
8 - Seek Time
9 - Poweron Hours
10 - Spin Retry Count
11 - Calibration Retry Count
12 - Power Cycle Count
192 - Poweroff Retract Count
193 - Load Cycle Count
194 - Temperature Celsius
195 - Hardware ECC Recovered
196 - Reallocate Event Count
197 - Current Pending Count
198 - Offline Scan UE Count
199 - UDMA CRC Error Count
200 - Write Error Count
201 - Off Track Error Count
202 - DAM Error Count
203 - Run Out Cancel
204 - Raw Read Error Count
205 - Thermal Asperity Count
207 - Spin High Current Count
208 - Spin Buzz Count
209 - Offline Seek Performance
The "ide smart status" command may be used to display
the current SMART status fields.
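The lookup from data value to SMART field can be sketched as a simple table. This is an illustration only: it assumes the data value is the decimal number shown in the list above, and it covers only the listed examples. The names SMART_FIELDS and smart_field_name are hypothetical.

```python
# Field numbers and names copied from the example list above.
SMART_FIELDS = {
    0: "Unknown",
    1: "Raw Read Error Rate",
    2: "Throughput",
    3: "Spinup Time",
    4: "Start / Stop Count",
    5: "Reallocate Sector Count",
    6: "Read Channel Margin",
    7: "Seek Error Count",
    8: "Seek Time",
    9: "Poweron Hours",
    10: "Spin Retry Count",
    11: "Calibration Retry Count",
    12: "Power Cycle Count",
    192: "Poweroff Retract Count",
    193: "Load Cycle Count",
    194: "Temperature Celsius",
    195: "Hardware ECC Recovered",
    196: "Reallocate Event Count",
    197: "Current Pending Count",
    198: "Offline Scan UE Count",
    199: "UDMA CRC Error Count",
    200: "Write Error Count",
    201: "Off Track Error Count",
    202: "DAM Error Count",
    203: "Run Out Cancel",
    204: "Raw Read Error Count",
    205: "Thermal Asperity Count",
    207: "Spin High Current Count",
    208: "Spin Buzz Count",
    209: "Offline Seek Performance",
}


def smart_field_name(data: int) -> str:
    """Return the listed field name, or a placeholder for unlisted values."""
    return SMART_FIELDS.get(data, f"Unlisted SMART field {data}")
```

Values outside the example list are reported as unlisted rather than guessed.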
Fatal error: Code 17, subcode 0x11 (0)
IDE_FAILURE "Internal Drive Failure"
IDE SMART self-test failed. The drive failed to finish a
built-in self-test.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x12 (0)
IDE_FAILURE "Internal Drive Failure"
Drive failed to collect SMART data. The data is vital for the
drive to determine a SMART trigger.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x13 (0)
IDE_FAILURE "Internal Drive Failure"
Drive refused to accept SMART commands.
Resolution: Replace the IDE or SATA boot drive.
Diagnostic: Use "ide smart enable" to turn on SMART before issuing
more SMART commands.
Fatal error: Code 17, subcode 0x14 (0)
IDE_FAILURE "Internal Drive Failure"
The SMART command issued to drive has incorrect syntax.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x15 (0)
IDE_FAILURE "Internal Drive Failure"
The SMART commands failed to write or read attributes.
Resolution: Replace the IDE or SATA boot drive.
Non-fatal error: Code 17,
sub-code 0x16 (0)
IDE_FAILURE "Internal Drive Failure"
No IDE device was found.
Resolution: A) Install or replace the IDE or SATA drive.
B) Replace the node motherboard.
Fatal error: Code 17, subcode 0x18 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed the BIOS interrupt test, possibly
due to a bad drive.
See Code 17, sub-code 0x0 for resolution information.
Non-fatal error: Code 17,
sub-code 0x19 (0)
IDE_FAILURE "Sequential DMA read timed out"
DMA xfer error code xxxx
The drive DMA test failed due to a timeout. Although each
sequential DMA read operation is succeeding, the total test time
was exceeded. The likely cause of this failure is a drive which
is having to perform a large number of relocations due to failed
sectors, or a drive interface failure which only shows up under
stress.
Resolution: Replace the IDE or SATA boot drive.
Non-fatal error: Code 17,
sub-code 0x1A (0)
IDE_FAILURE "Active Partition Set Incorrectly"
The active partition identified in the Master Boot Record does not
match the default boot partition identified in the grub menu. This
can cause an infinite reboot loop if the two partitions contain
different BIOS versions. TPD will reboot expecting BIOS to update
itself and BIOS will see what it believes is the correct version
and skip the update.
This situation should not occur generally but can happen when a
previous update was aborted and not rolled back fully or correctly.
Resolution: Correct the active partition setting or adjust the
default boot partition in grub's menu to match. May also need to
check the previous update or rollback to ensure all installed code
is at the desired version.
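To see which partition the Master Boot Record marks as active, a dump of sector 0 can be inspected offline. The sketch below uses the standard MBR layout (four 16-byte partition entries starting at offset 446, status byte 0x80 for active, signature 55 AA at offset 510). It is not a 3PAR tool, the function name is hypothetical, and it assumes the 512-byte sector has already been captured by some other means.

```python
from typing import List


def active_partitions(sector0: bytes) -> List[int]:
    """Return zero-based indexes of MBR entries flagged active (0x80)."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    active = []
    for index in range(4):
        status = sector0[446 + 16 * index]  # first byte of each entry
        if status == 0x80:
            active.append(index)
    return active
```

The result can then be compared by hand with the default boot partition in the grub menu. How grub numbers partitions on a given release is not covered here.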
Fatal error: Code 17, subcode 0x20 (0)
IDE_FAILURE "Internal Drive Failure"
Drive did not return status to host after a command within a
reasonable amount of time.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x21 (rpm)
IDE_FAILURE "Internal Drive Failure"
*** Error: Boot drive is not a Solid State Disk (SSD).
This error occurs when the disk drive for a Harrier system is
not an SSD disk drive type.
Resolution: A) Replace the SATA drive with an SSD drive.
Fatal error: Code 17, subcode 0x22 (disk size)
IDE_FAILURE "Internal Drive Failure"
*** Error: Disk Size (XXX.X GB) is less than 128 GB.
This error occurs when we have 32 GB or less of cluster memory
and the disk drive is less than 128 GB. This is because the
disk is not large enough for the memory dumps if the node panics.
Resolution: A) Replace the SSD drive with a drive of at least
128 GB.
Fatal error: Code 17, subcode 0x23 (disk size)
IDE_FAILURE "Internal Drive Failure"
*** Error: Disk Size (XXX.X GB) is less than 256 GB.
This error occurs when we have more than 32 GB of cluster memory
and the disk drive is less than 256 GB. This is because the
disk is not large enough for the memory dumps if the node panics.
Resolution: A) Replace the SSD drive with a drive of at least
256 GB.
B) Reduce cluster memory to 32 GB or less.
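The sizing rule behind sub-codes 0x22 and 0x23 can be expressed as a small check. This is a sketch of the rule as stated above, not the BIOS implementation. The function names are hypothetical, and sizes are assumed to be in GB as printed in the error message.

```python
from typing import Optional


def minimum_boot_disk_gb(cluster_memory_gb: float) -> int:
    """128 GB for 32 GB or less of cluster memory, otherwise 256 GB."""
    return 128 if cluster_memory_gb <= 32 else 256


def disk_size_subcode(cluster_memory_gb: float, disk_gb: float) -> Optional[int]:
    """Return 0x22 or 0x23 when the disk is too small, else None."""
    required = minimum_boot_disk_gb(cluster_memory_gb)
    if disk_gb >= required:
        return None
    return 0x22 if required == 128 else 0x23
```

For example, 64 GB of cluster memory with a 200 GB drive maps to sub-code 0x23, while the same drive with 32 GB of cluster memory passes.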
Fatal error: Code 17, subcode 0x30 (0)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
Resolution: Replace the IDE or SATA boot drive.
Non-fatal error: Code 17,
sub-code 0x40 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port Status register, for lab debug
Resolution: TODO
Non-fatal error: Code 17,
sub-code 0x41 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port Error register, for lab debug
Resolution: TODO
Non-fatal error: Code 17,
sub-code 0x42 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port TFD register, for lab debug
Resolution: TODO
Non-fatal error: Code 18,
sub-code zzzz (0)
BIOS_INT_UNIMPLEMENTED "BIOS Int Unimplemented"
*** Real-mode BIOS interrupt: xxxx(error: yyyy)
This error most commonly indicates a bad or missing boot area of
the USB disk. Customer Service node-disks or node spares (FRUs)
might not be shipped with an operating system. Attempting to boot
from one of these disks without first installing the system
software might produce this error message. From the Whack prompt,
use the "boot net install" command to install the system software.
In order for Linux to boot, LILO must load the kernel image. It
needs assistance from the BIOS in order to perform this task.
Linux also acquires some information from the BIOS using 16-bit
BIOS interrupts. CBIOS automatically accepts and emulates
traditional 16-bit BIOS interrupts to support these methods.
If LILO or Linux triggers an interrupt which is not supported
by CBIOS, this possibly fatal error will result. There are many
obsolete BIOS facilities which are not supported by CBIOS. In
some cases, the system boot may be able to continue after this
error.
The sub-code and minor code indicate the specific BIOS interrupt
called and the eax register parameter value. This information
may be useful to Engineering.
Resolution: A) Reboot. Attempt to reproduce the problem.
B) Reinstall system software on the disk.
This may require a "boot net install" in
order to reinstall the operating system.
C) There may be a bug in the OS you are using or
it has been misconfigured. Confirm this version
of the OS has been verified to work on a 3PAR
node board. Or, temporarily swap system disks
with a known good system disk.
D) Replace the boot drive and reinstall the system
software.
E) Replace the node motherboard.
Diagnostic: A) Look up the displayed Real-mode BIOS interrupt
number in a BIOS index to determine the facility
the software is requesting. This may provide
you a clue as to the cause.
B) Use the Arium to single step through the return
from the 16 bit handler to the originating code
to determine what code is involved with the
unimplemented BIOS operation.
Fatal error: Code 19, subcode 0x0 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
Booting from SATA IDE...
No IDE or USB drives present or boot sector is invalid.
or
Booting from SATA IDE (bootdev)...
No IDE drive present or boot sector is invalid.
or
Booting from PATA IDE...
No IDE drive present or boot sector is invalid.
or
Booting from USB...
No USB drive present or boot sector is invalid.
The IDE (PATA or SATA) or USB Flash disk is used for booting the
operating system. This error indicates that either no drive was
found during the hardware probe, or a drive was found but is not
bootable.
Resolution: A) Cycle power on the node.
B) Verify disk power and data cables are connected to
both the drive and the motherboard. The red stripe
on the IDE data cable must be oriented closest to
the power connector on the drive.
C) Replace the disk power cable and/or data cable.
D) Replace the drive.
E) Replace the node motherboard.
Diagnostic: A) Reset and enter Whack with ^W after the PCI bus
scan but before the IDE probe. You should be
able to use the "ide init" command to probe for
a disk. Minimal output should include drive
Capacity and Geometry (C/H/S: cylinder/head/sector).
B) If the above information is available, use the
"ide read" command to read a sector into CPU memory
and verify it was read. Example:
Whack> ide read 1000 0 1
Whack> d 1000 200
You should see the contents of sector 0, which
(with a previously initialized node disk) will
include the string "LILO" starting at byte
offset 6.
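The check in Diagnostic B can also be run offline on a captured copy of the sector. The following is a minimal sketch in Python, not a Whack feature. It assumes the sector has been saved as raw bytes, and the function name and sample data are hypothetical.

```python
SECTOR_SIZE = 0x200  # assumption: matches the 0x200 bytes shown by "d 1000 200"

def has_lilo_signature(sector: bytes) -> bool:
    """Return True if the four bytes at offset 6 spell "LILO"."""
    return len(sector) >= 10 and sector[6:10] == b"LILO"

# Hypothetical sector contents, for illustration only.
sample = bytearray(SECTOR_SIZE)
sample[6:10] = b"LILO"

print(has_lilo_signature(bytes(sample)))       # initialized-looking sector
print(has_lilo_signature(bytes(SECTOR_SIZE)))  # all-zero sector
```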
Fatal error: Code 19, subcode 0x1 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE TIMEOUT waiting for DRDY
The IDE disk is used for booting the operating system.
This error indicates there was a problem communicating with
the IDE controller, most likely due to a missing IDE hard
drive, a disconnected cable, or a failed IDE hard drive.
See Code 19, sub-code 0x0 for resolution information.
Fatal error: Code 19, subcode 0x2 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE TIMEOUT waiting for DRQ
The IDE disk is used for booting the operating system.
This error indicates that a command was issued to the IDE
disk (read sectors) but the drive controller did not report
back with the data within a reasonable amount of time. This
may be caused by a failed sector or IDE controller failure.
See Code 19, sub-code 0x0 for resolution information.
Fatal error: Code 19, subcode 0x3 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE ERROR reading sector xxxx
The IDE disk is used for booting the operating system.
This error indicates that a command was issued to the IDE
disk (read sectors) but the drive controller reported that
there was an error in reliably retrieving the requested
sectors. This error may be caused by a failed sector or
IDE controller failure.
See Code 19, sub-code 0x0 for resolution information.
Fatal error: Code 20, subcode 0x0 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Failed to deliver startup message to CPU xxxx
or
*** Error: Errors in APs starting up.
If a board has more than a single CPU, only one CPU comes
out of power-on executing code. The other waits in a halted
state for an AP message from the bootstrap processor. Every
MP-capable Pentium processor has an onboard Advanced
Programmable Interrupt Controller called the Local APIC (there
is a complementary component called the IOAPIC located on the
motherboard). Once the bootstrap processor has completed all
node board initialization and testing, it starts up each
application processor (which in Intel terms is defined as any
processor other than the initial bootstrap processor). Each
AP then does a brief identify, verify, and microcode update.
In the above case, if the local APIC fails to deliver an AP
startup to the other processor within a reasonable amount of
time, this error will result. In a single CPU system this
error should not occur because an earlier probe should
identify no AP processor is present. If the Local APIC
cannot reliably deliver a message over the IOAPIC, then it
is probably not safe to ignore this error by pressing ^C.
Resolution: A) Reseat both processors in their sockets.
B) Replace each processor individually. Do not
bother with downgrading to a single processor
system since this is a multiprocessor startup
issue. The problem processor will not be
apparent with a single processor configuration.
C) Replace the node motherboard.
Diagnostic: A) Use Arium as bootstrap processor and verify that
APIC message is being delivered to the bus.
B) Use Arium as application processor and verify that
APIC message is delivered from the IOAPIC on
the motherboard. The application processor should
then start executing code at the default APIC
address of 0x30000 (FIRST_SMM_BASE).
Fatal error: Code 20, subcode 0x1 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Startup message successfully sent to CPU xxx, no response
After an AP startup message has been delivered to the
application processor through the IOAPIC, the bootstrap
processor waits for an indication the AP has started.
If the indication is not received before a reasonable
timeout, this error is given. It should be ok to ignore
this message by pressing ^C and continue with further
BIOS diagnostics.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 20, subcode 0x2 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: CPU xxxx failed to complete initialization.
Once the application processor (AP) has started initialization,
it sets a flag that the bootstrap processor can use to determine
when the application processor has completed. If the AP remains
in the AP_INIT_START state too long, this fatal error is
displayed. It is probably not safe to resume after this error
since the AP may be off executing errant code or interfering
with bootstrap processor bus cycles.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 20, subcode 0x3 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: POST failure on CPU xxxx: yyyy
*** Error: CPU xxxx initialization failure.
The application processor (AP) previously failed to complete
a Built In Self Test (BIST). This is likely due to a bad
processor.
Resolution: A) Replace the application processor.
Fatal error: Code 20, subcode 0x4 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Invalid CPU for CPU xxxx, error code: yyyy
*** Error: CPU xxxx initialization failure.
During application processor (AP) initialization, the CPU model,
stepping, and clock multiplier of the processor being initialized
are verified against those of the bootstrap processor. If they do
not match, this error will result.
Resolution: A) Since the processors are possibly mismatched,
remove the heatsink on both and verify that
the CPU model and stepping are identical.
See Code 20, sub-code 0x0 for more resolution information.
Fatal error: Code 20, subcode 0x5 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: More than wwww CPUs in system.
*** Error: CPU xxxx initialization failure.
The currently supported node board hardware configuration
is a maximum of two physical processors. The BIOS uses this
knowledge to limit the possibility of repeat initialization
of the application processor (AP). If this message occurs,
it may be due to a variety of hardware problems, but most
suspect is the application processor.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 21, subcode 0x0 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: Not expecting to install a vector on CPU xxxx
Intel processors support an interrupt level called SMI (System
Management Interrupt) which is used for hardware management
(usually by the BIOS). Events such as power management and
hardware errors usually trigger an SMI. When an SMI is
triggered, the system enters SMM (System Management Mode). In
a multiprocessor system, both processors are usually triggered
by an SMI at the same time. Since both processors may attempt
to service an SMI at the same time, each processor must have a
unique stack area in which to save processor context. SMI setup
configures each processor individually with a unique stack
address for SMI handling.
This particular error indicates that the SMI setup handler
has detected a stack setup SMI, yet one was not expected
(because one had already been set up or CPU initialization
had not yet reached the point of SMI setup). The bootstrap
CPU delivers the setup SMI to itself and to the application
processor. This error could be caused by a faulty CPU or
motherboard. The CPU which reports the setup error may not
be the one at fault.
Resolution: A) Pull one processor at a time to determine if
the problem is reproducible with a single CPU.
B) Swap CPUs to see if the exact problem moves with
CPU. If not, it may be the motherboard.
C) Individually replace both CPUs.
D) Replace the node motherboard.
Diagnostic: A) Use Arium as bootstrap processor and verify that
the SMI is being delivered.
Fatal error: Code 21, subcode 0x1 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: CPU xxxx not found in CPU table
During SMI setup, each processor in turn receives an SMI and
then performs stack initialization. Prior to the SMI setup,
all application processors wait in a halted state for an APIC
message to identify and download microcode. If the processor
performing an SMI setup detects that it had not previously
executed and added its CPU ID to the system table, then this
fatal error will be displayed.
See Code 20, sub-code 0x1 for resolution information.
Fatal error: Code 21, subcode 0x2 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: CPU xxxx did not respond
During SMI setup, each processor in turn receives an SMI and
then performs stack initialization. This error indicates
that the bootstrap processor issued an SMI through the APIC
and it was not processed by the targeted processor. This
indicates that either SMIs are not being delivered properly,
or that the targeted processor may be defective.
See Code 20, sub-code 0x1 for resolution information.
Fatal error: Code 22, subcode 0x0 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `cbios_to_os_message' test, expected xx but got yy
CBIOS provides service to the 3PAR kernel through a special
command queue. Responses are returned to the OS through
another queue, which is tested during BIOS initialization.
Sub-code 0x0 indicates that the CBIOS to OS queue did not
pass the built-in test.
Resolution: A) Pull one processor at a time to determine if
the problem is reproducible with a single CPU.
B) Swap SDRAM with good SDRAM.
C) Update CBIOS to the latest version.
D) Replace the node motherboard.
Fatal error: Code 22, subcode 0x1 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test', failed to read message
This error indicates that the CBIOS to OS queue test failed
to acquire a message it previously sent.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 22, subcode 0x2 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test':
expected: uuuu vv `ww' but got: xxxx yy `zz'
This error indicates that the CBIOS to OS queue test failed
because the message received did not match the message sent.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 22, subcode 0x3 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test', expected no more data
This error indicates that the CBIOS to OS queue test failed
because there were more items in the queue than those sent.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 22, subcode 0x4 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: Couldn't send simulated message from OS to CBIOS, code == xx
This error indicates that the OS to CBIOS queue test failed.
The minor code will indicate to an engineer what went wrong.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 22, subcode 0x5 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: Inconsistent queue: queue_base == ww, queue_limit == xx
queue_inp = yy, queue_otp = zz
This error indicates that the CBIOS to OS queue test failed
because the queue pointers became corrupt.
See Code 20, sub-code 0x0 for resolution information.
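The guide does not describe how CBIOS decides that the queue pointers are corrupt. As an illustration only, the sketch below shows one common consistency rule for a ring buffer: both the input and output pointers must lie inside the region bounded by the base and the limit. The rule, the function name, and the sample values are all assumptions, not the CBIOS implementation.

```python
def queue_is_consistent(base: int, limit: int, inp: int, otp: int) -> bool:
    """Hypothetical check: inp and otp must fall within [base, limit)."""
    return base < limit and base <= inp < limit and base <= otp < limit

# Illustrative values only; these are not taken from a real node.
print(queue_is_consistent(0x1000, 0x1100, 0x1010, 0x1008))
print(queue_is_consistent(0x1000, 0x1100, 0x2000, 0x1008))
```

The four arguments correspond to the queue_base, queue_limit, queue_inp, and queue_otp values printed in the error message.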
Non-fatal error: Code 23, sub-code 0x0 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
CRC mismatch for failsafe CBIOS
Upon startup, CBIOS computes a strong CRC over all executable
code and data stored in the flash. This is done to guard against
flash corruption and to ensure reliable system initialization
and testing. This specific sub-code indicates that a CRC error
was detected in the failsafe component of CBIOS. The majority of
the failsafe is only executed if corruption is detected in the
main CBIOS.
Resolution: A) Try pressing ^C to resume. Perform a flash
update as soon as possible. If flash updating
under Linux, make sure to specify the 'failsafe'
option to update the failsafe area as well.
B) If the flash update is successful, but you
still get a CRC error, verify that your flash
image is intact. The Linux flash utility does
this automatically using the same strong CRC
algorithm as the BIOS uses.
C) Replace the node motherboard.
Diagnostic: A) Use the Whack "net tftp" command to download
an identical image to that which is in flash.
Use the Whack "mem compare" command to
locate bytes which differ so that you may
examine those values with "d <addr>"
B) If Whack is not available, use the Arium to
look at flash address space for defects. It
may be a stuck, floating, or bridged address
or data line.
C) Replace the flash part.
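When both the flash contents and the reference image are available as files off the node, the comparison in Diagnostic A can be approximated offline. The sketch below is a Python analog, not the Whack "mem compare" command. It lists the differing offsets and ORs together the XOR of each differing byte pair, so that a single recurring bit stands out, which is what a stuck or bridged data line would tend to produce. The function names and sample bytes are hypothetical.

```python
def diff_offsets(flash: bytes, reference: bytes):
    """Yield (offset, flash_byte, reference_byte) wherever the images differ."""
    for offset, (a, b) in enumerate(zip(flash, reference)):
        if a != b:
            yield offset, a, b

def suspect_data_bits(flash: bytes, reference: bytes) -> int:
    """OR together the XOR of every differing byte pair."""
    mask = 0
    for _, a, b in diff_offsets(flash, reference):
        mask |= a ^ b
    return mask

# Illustrative data: bit 3 reads back as 1 in two places.
reference = bytes([0x12, 0x34, 0x56, 0x78])
flash = bytes([0x12, 0x3C, 0x5E, 0x78])

for offset, got, want in diff_offsets(flash, reference):
    print(f"offset 0x{offset:04x}: flash 0x{got:02x}, reference 0x{want:02x}")
print(f"bits involved: 0x{suspect_data_bits(flash, reference):02x}")
```

A mask with only one bit set points toward a single data line. Differences confined to offsets that share an address bit would instead suggest an address line.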
Non-fatal error: Code 23, sub-code 0x1 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Invalid entry point for full CBIOS
Boot with clustering disabled and update flash immediately!
Prior to starting up the non-failsafe (full diagnostic) CBIOS
image, the failsafe CBIOS performs some consistency checks
over the image. This error indicates corruption was detected
in the entry point to the main routine of the full CBIOS.
If you have recently installed a new CBIOS which is larger
than the previous one, it is possible to get this error because
the failsafe BIOS present cannot properly verify the larger
size BIOS.
Resolution: A) Try pressing ^C to resume. Perform a flash
update as soon as possible. Boot with clustering
disabled by typing "tpd nokmod" at the LILO prompt.
Once the node has booted, log in as root and use
the flash command. Example:
# flash /opt/tpd/bios/bios-1.9.4
Upon completion of the flash update, reboot and
observe console messages to ensure the CRC error
no longer occurs.
B) If the flash update is successful, but you
still get this error, verify that your flash
image is intact. The Linux flash utility does
this automatically using the same strong CRC
algorithm as the BIOS uses.
C) Replace the node motherboard.
Diagnostic: A) If Whack is not available, use the Arium to
look at flash address space for defects. It
may be a stuck, floating, or bridged address
or data line.
B) Replace the flash part.
Fatal error: Code 23, subcode 0x2 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Invalid magic for full CBIOS
Prior to starting up the non-failsafe (full diagnostic) CBIOS
image, the failsafe CBIOS performs some consistency checks
over the image. This error indicates the failsafe BIOS could
not find a proper header record for the full CBIOS.
See Code 23, sub-code 0x1 for resolution information.
Fatal error: Code 23, subcode 0x3 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
CRC mismatch for full CBIOS
Prior to starting up the non-failsafe (full diagnostic) CBIOS
image, the failsafe CBIOS performs a strong CRC over the full
CBIOS image to verify the image's integrity. This error
indicates the full CBIOS had a CRC failure.
See Code 23, sub-code 0x1 for resolution information.
Fatal error: Code 23, subcode 0x4 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Failsafe CBIOS is now enabling the full CBIOS
...
The full CBIOS either detected an error or user input (the
'f' key) which forced it to return to the failsafe BIOS.
If the user did press the 'f' key, then press ^C to resume
startup under the failsafe BIOS. If the user did not press
the 'f' key, browse prior messages to learn of a failure
which may have caused this error.
Resolution: A) If the error was not the result of a keystroke,
try pressing the 'n' key at BIOS startup to
clear any initialization skips. It may be
recorded in NVRAM to skip the full BIOS version
and always execute the failsafe.
See Code 23, sub-code 0x1 for more resolution information.
Non-fatal error: Code 23, sub-code 0x10 (flags)
FLASH_CRC_ERROR "EOS: Repairing Main BIOS"
The EOS Main BIOS image in SPI has failed to boot and the FPGA
watchdog has reset the node to boot from the failsafe BIOS. The
failsafe BIOS has detected a bad CRC in the main BIOS region of
flash and is attempting to automatically re-flash that region
from disk.
The data field contains flags indicating what errors were seen
during the verification check:
Bit00 -> Descriptor Region CRC Failed.
Bit01 -> Main BIOS Region CRC Failed.
Bit24 -> Test Injection Descriptor CRC Error (munge_desc_crc).
Bit25 -> Test Injection Main BIOS CRC Error (munge_bios_crc).
Other bits are undefined.
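The flags value can be decoded mechanically from the bit list above. The following Python sketch is one way to do that. The bit meanings are taken from the list, while the function name and the handling of undefined bits are illustrative choices.

```python
# Bit positions and descriptions as listed for sub-code 0x10.
FLAG_BITS = {
    0: "Descriptor Region CRC Failed",
    1: "Main BIOS Region CRC Failed",
    24: "Test Injection Descriptor CRC Error (munge_desc_crc)",
    25: "Test Injection Main BIOS CRC Error (munge_bios_crc)",
}

def decode_flags(data: int) -> list:
    """Return the descriptions of all bits set in the data field."""
    names = [text for bit, text in FLAG_BITS.items() if data & (1 << bit)]
    defined = sum(1 << bit for bit in FLAG_BITS)
    undefined = data & ~defined
    if undefined:
        # The guide leaves all other bits undefined; report them as-is.
        names.append(f"undefined bits: 0x{undefined:08x}")
    return names

for line in decode_flags(0x00000002):
    print(line)
```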
Fatal error: Code 23, subcode 0x11 (bbxxyyzz)
FLASH_CRC_ERROR "EOS: Main BIOS Corrupt"
The EOS Main BIOS image in SPI has failed to boot and the FPGA
watchdog has reset the node to boot from the failsafe BIOS. The
failsafe BIOS has detected a bad CRC in the main BIOS region of
flash. The failsafe BIOS has also detected five or more attempts
to automatically recover the Main BIOS within the past two hours
and has stopped attempting automatic recovery.
The data field contains the build (bb) and version (xx.yy.zz) of
the Main BIOS that failed to boot.
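The bbxxyyzz data field can be split into its parts as sketched below in Python. The sketch assumes each field occupies one byte with bb in the most significant byte, which is what the notation suggests but the guide does not state outright. The function name and sample value are hypothetical.

```python
def decode_bios_id(data: int):
    """Split a 32-bit bbxxyyzz value into build and (xx, yy, zz) version."""
    build = (data >> 24) & 0xFF   # bb (assumed most significant byte)
    major = (data >> 16) & 0xFF   # xx
    minor = (data >> 8) & 0xFF    # yy
    patch = data & 0xFF           # zz
    return build, (major, minor, patch)

# Illustrative value only.
build, version = decode_bios_id(0x01020304)
print(f"build 0x{build:02x}, version " + ".".join(f"{v:02x}" for v in version))
```

Whether the version bytes are meant to be read as hexadecimal or decimal is not specified, so the sketch prints them in hexadecimal exactly as they appear in the data field.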
Fatal error: Code 23, subcode 0x12 (0)
FLASH_CRC_ERROR "BIOS Signature Verification Failed"
Starting in Manchester (v3.2.2) the BIOS image for Tornado, Chimera
and Orion platforms is signed and the encrypted signature is
included in the "uefi_spi_....signed.bin" file. This fatal error
code is logged when the encrypted signature does not match the file
hash calculated immediately before a flash update is performed. The
flash update is not performed in this instance. If an unsigned but
otherwise valid image is specified for an update, this code will
also be logged. This code is not logged for a binary file that is
not a valid format.
Retry the flash update operation with a correctly signed and
uncorrupted binary file. If the BIOS image installed with TPD
becomes corrupted, then a TPD update may be required to fix the
corrupted BIOS package or BIOS package contents.
Fatal error: Code 24, subcode 0x0 (ptr)
TURD_EXCEEDED_LIMIT "TURD Exceeded Limit"
*** Error: MP turd exceeded 0x100000
The BIOS presents to the operating system a set of tables which
describe the hardware present in the system. These tables have
a rigid structure for each type of device. If the CBIOS
configuration structure becomes corrupt, this error may result
when the TURD structures are initialized for the operating
system. A consistency check ensures the TURD area does not
go beyond 1 MB (which is the base address where the operating
system normally begins using main memory). The data for this
error (ptr) is the pointer address reached, which will be
greater than 0x100000.
Resolution: A) Remove cards from all PCI slots. If the
error no longer occurs, it may be a
hardware failure on one of the cards.
B) Replace the node motherboard.
Diagnostic: A) Look at memory starting at 0x000f0000.
0x5f504d5f is the magic number of the
first TURD (the MP Configuration table).
B) Turn on PRINTING_TURD and DEBUG_APIC compile flags.
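When scanning a memory dump for the magic number in Diagnostic A, it helps to know its byte order in memory. The Python sketch below packs 0x5f504d5f as a little-endian 32-bit value, as x86 stores it, and searches a buffer for it. The helper names and the sample buffer are hypothetical.

```python
import struct

MP_MAGIC = 0x5F504D5F  # value quoted in Diagnostic A

def magic_as_ascii(value: int) -> str:
    """Render a 32-bit magic number as it appears byte-by-byte in x86 memory."""
    return struct.pack("<I", value).decode("ascii")

def find_magic(dump: bytes, value: int = MP_MAGIC) -> int:
    """Return the offset of the magic number within dump, or -1 if absent."""
    return dump.find(struct.pack("<I", value))

print(magic_as_ascii(MP_MAGIC))

# Illustrative buffer: 16 zero bytes, then the magic, then padding.
sample = bytes(16) + struct.pack("<I", MP_MAGIC) + bytes(12)
print(hex(find_magic(sample)))
```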
Fatal error: Code 24, subcode 0x1 (0)
TURD_EXCEEDED_LIMIT "TURD Checksum Failure"
*** Error: MP table checksum failed - stopping table build
The BIOS presents to the operating system a set of tables which
describe the hardware present in the system. In this case, the
BIOS detected that one of the tables had a bad checksum.
Resolution: A) Remove cards from all PCI slots. If the
error no longer occurs, it may be a
hardware failure on one of the cards.
B) Replace the node motherboard.
Fatal error: Code 24, subcode 0x2 (0)
TURD_EXCEEDED_LIMIT "TURD Exceeded Limit"
*** Error: Too many MP table entries - stopping table build
The BIOS presents to the operating system a set of tables which
describe the hardware present in the system. In this case, the
BIOS detected that it had added too many entries to the table,
likely because too many PCI devices are present in the system.
This error is likely due to an earlier PCI failure.
Resolution: A) Remove cards from all PCI slots. If the
error no longer occurs, it may be a
hardware failure on one of the cards.
B) Replace the node motherboard.
Fatal error: Code 25, subcode 0x0 (0)
PROM_FAILURE "PROM Failure"
The node board has two different Serial EEPROM devices used
for storing persistent board information. One PROM device
is located on the I2C bus. It stores node board manufacturing,
assembly, serial number, and error message log information.
The second PROM device is connected through the Intel 82559ER
ethernet controller. It stores ethernet controller information
such as initialization state and the hardware MAC address.
PROM checksum: FAIL
The PROM which stores node board manufacturing, assembly,
serial number, and error message log information does
not have a valid checksum. If the PROM has not yet been
initialized or if it has become corrupt, you may see this
error.
Resolution: A) Press ^W to enter Whack and use either
"prom init" or "prom edit" to correct this
error.
B) If the information looks correct with
"prom id" then try using "prom checksum" to
rewrite the checksum.
C) Replace the node motherboard.
Diagnostic: A) Use the Whack "d prom <addr>" command to
display PROM contents. Use the Whack
"c prom <addr>" command to change PROM
contents. Look for a pattern in order to
determine if the error is due to the device's
connection with the motherboard or a hardware
failure within the Serial PROM.
Fatal error: Code 25, subcode 0x1 (0)
PROM_FAILURE "PROM Failure"
Ethernet 0 PROM checksum: FAIL
The PROM which stores ethernet controller information does
not have a valid checksum. If the PROM has not yet been
initialized or if it has become corrupt, you may see this
error.
Resolution: A) Press ^W to enter Whack and use "prom id" to
verify the other PROM is valid. If not, first
use "prom init" or "prom edit" to set the
PROM information. If the PROM information appears
valid, use "prom mac" to reprogram the
Ethernet MAC address and checksum.
B) Try flushing out a correct checksum.
Note: You must first select the device with an
error using the "eth dev" command.
Example:
Whack> eth dev 1
Whack> eth checksum
C) Replace the node motherboard.
Diagnostic: A) Try programming a custom MAC address.
Example: "prom mac 00:02:AC:00:00:43"
B) Use the Whack "d eth <addr>" command to
display PROM contents. Use the Whack
"c eth <addr>" command to change PROM
contents. Look for a pattern in order to
determine if the error is due to the device's
connection with the motherboard or a hardware
failure within the Ethernet PROM.
Non-fatal error: Code 25, sub-code 0x2 (0)
PROM_FAILURE "PROM Failure"
Ethernet MAC xx:xx:xx:xx:xx:xx mismatches PROM: yy:yy:yy:yy:yy:yy
The "prom mac" command may fix this.
This error indicates the MAC address stored in the onboard
Ethernet controller's PROM does not match that which can be
computed from the board revision and serial number stored
in the node's PROM. This mismatch suggests that one or the
other PROM may contain corrupt contents.
If the ethernet MAC address was purposely set to an address
(see "prom mac" command), then this check may be overridden
by setting the NVRAM "oddmac" flag. Example:
Whack> set perm oddmac
Resolution: A) Look for a prior message indicating an invalid
board type or check the banner to ensure the board
type and serial number are correct for this node.
If either is not correct, use the 'prom edit'
command to repair the corruption.
B) Use the "prom mac" command to reprogram the
MAC address in the ethernet controller's PROM.
C) Replace the node motherboard.
Diagnostic: A) Determine if the cause is due to a failing node
PROM or ethernet controller PROM. Use the
"db prom 0 20" command to display PROM contents
and compare with expected values. Example:
Whack> dbz8 prom 0 20
prom 0000: 00 04 09 20 10 03 04 35 . ...5
prom 0008: 30 53 4f 4c 01 10 00 00 0SOL..
prom 0010: 00 76 ff ff ff ff ff ff v......
prom 0018: ff ff ff ff c1 1f a4 5e .......^
Replace node PROM if it is defective.
B) Use the "db eth 0 20" command to display ethernet
PROM contents and compare with expected values.
Example:
Whack> dbz8 eth 0 20
eth 0000: 00 02 ac 14 00 76 03 01 ... v..
eth 0008: ff ff 01 00 01 07 00 00 ... ..
eth 0010: 10 00 04 03 40 48 00 00 . ..@H
eth 0018: 86 80 00 00 ff ff ff ff .. ....
Replace ethernet PROM if it is defective.
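To compare the ethernet PROM dump against the MAC address printed in the error message, the bytes can be pulled out of a captured dump line. The Python sketch below makes two assumptions: that each dump line carries eight hex bytes after the colon, as in the example above, and that the first six bytes of the ethernet PROM hold the MAC address, as the example dump suggests. The function names are hypothetical.

```python
BYTES_PER_LINE = 8  # assumption: matches the "dbz8" example output

def parse_dump_line(line: str) -> bytes:
    """Parse a line such as 'eth 0000: 00 02 ac ...' into raw bytes."""
    _, _, rest = line.partition(":")
    tokens = rest.split()[:BYTES_PER_LINE]  # ignore the trailing ASCII column
    return bytes(int(token, 16) for token in tokens)

def mac_from_first_line(line: str) -> str:
    """Format the first six bytes of the dump line as a MAC address."""
    return ":".join(f"{b:02x}" for b in parse_dump_line(line)[:6])

print(mac_from_first_line("eth 0000: 00 02 ac 14 00 76 03 01 ... v.."))
```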
Non-fatal error: Code 25, sub-code 0x3 (0)
PROM_FAILURE "PROM Failure"
During initialization, CBIOS checks the PROM for a magic number.
This non-fatal error is logged when the magic number check fails.
Resolution: A) On EOS platforms, the midplane, node type, and slot
ID may need to be reconstructed with prom edit.
The Ethernet MAC and PROM magic number may also
need to be reconstructed. (Bug 82094)
B) Previous platforms should be reinitialized and
reconstructed automatically.
Diagnostic: A) Use "db i2c 2.a6.0 100" to view the contents
of this region. Typically only the first 32
bytes are affected.
Non-fatal error: Code 25, sub-code 0x4 (aabbccdd)
PROM_FAILURE "PROM Failure"
Board Spin value is invalid. The "prom edit" command may
fix this.
This error indicates the board spin value in the prom record is
not in the proper range. The range of the board spin byte is
0x01 to 0x16. If the board spin number is out of this range, then
this error will occur.
NOTE: On Tinman, the board spin field is not used as board spin,
so this field will always be 0x17. On Tinman, this is NOT flagged
as an error.
If the board spin field is not valid, then the BIOS uses the
board revision field. This is a two character field that must
be "01" to "09", then "A0" to "A9", then "B0" to "B9" etc.
If a character (A-Z) is in the second byte or a non-zero
number (1-9) is the first character, then this is an error.
In the data field, aa is the board spin value, bb is the calculated
board revision, cc is the first character in the rev field, and
dd is the second character in the rev field.
Resolution: A) Use "prom edit" to fix/verify the board spin field.
B) Use "prom edit" to fix the board revision field.
C) Replace the node motherboard.
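The aabbccdd data field and the range rules described for this sub-code can be checked with a short script. The Python sketch below is one reading of those rules, not the BIOS logic. It assumes aa is the most significant byte, that cc and dd are ASCII character codes, and that "00" is not a valid revision because the stated range starts at "01". All function names are hypothetical.

```python
DIGITS = "0123456789"

def decode_spin_data(data: int):
    """Split aabbccdd into (spin, calculated revision, first char, second char)."""
    spin = (data >> 24) & 0xFF           # aa
    calc_rev = (data >> 16) & 0xFF       # bb
    first = chr((data >> 8) & 0xFF)      # cc, assumed ASCII
    second = chr(data & 0xFF)            # dd, assumed ASCII
    return spin, calc_rev, first, second

def spin_in_range(spin: int) -> bool:
    """Board spin byte must be within 0x01 to 0x16."""
    return 0x01 <= spin <= 0x16

def revision_is_valid(rev: str) -> bool:
    """Accept "01".."09", then a letter followed by a digit ("A0".."A9", ...)."""
    if len(rev) != 2:
        return False
    first, second = rev
    if second not in DIGITS:             # a letter in the second byte is an error
        return False
    if first == "0":
        return second != "0"             # assumption: "00" is out of range
    return "A" <= first <= "Z"           # a non-zero digit first is an error

spin, calc_rev, first, second = decode_spin_data(0x17003031)
print(spin_in_range(spin), revision_is_valid(first + second))
```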
Non-fatal error: Code 25, sub-code 0x5 (0)
PROM_FAILURE "PROM Failure"
EOS node prom value is invalid. The "prom edit" command may
fix this.
This error indicates the EOS node prom value in the prom record is
not in the proper range. The node type and midplane type values
in prom should be programmed correctly with the prom edit command.
Resolution: A) Use "prom edit" to fix/verify the midplane field.
B) Use "prom edit" to fix/verify the node type field.
Non-fatal error: Code 25, sub-code 0x6 (0)
PROM_FAILURE "PROM Failure"
EOS Node ID in Prom and Slot ID do not match. The "prom edit"
command may fix this.
This error indicates the EOS Node ID value in the prom record does
not match the Slot ID read from the FPGA. The Node ID value
in prom should be programmed correctly with the prom edit command.
Resolution: A) Use "prom edit" to fix/verify the Node ID field.
Fatal error: Code 25, subcode 0x7 (devices)
PROM_FAILURE "PROM Failure"
CBIOS was unable to read the ethernet controller's PROM, likely
indicating a hardware failure, a bad PROM, or a board-level issue.
"devices" is a mask that indicates which eth devices
encountered the hardware error. It is therefore possible, though
unlikely, that multiple controllers were tested and failed at
the same time.
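The "devices" mask can be expanded into a list of device numbers as sketched below in Python. The guide does not say how bits map to devices, so the sketch assumes that bit n corresponds to eth device n. The function name and sample value are hypothetical.

```python
def failed_eth_devices(mask: int) -> list:
    """Return the indexes of all bits set in the devices mask.

    Assumption: bit n corresponds to eth device n.
    """
    return [n for n in range(mask.bit_length()) if mask & (1 << n)]

# Illustrative mask with bits 0 and 2 set.
print(failed_eth_devices(0x5))
```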
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard if the failed ethernet
device is on the node.
Diagnostic:
A) Rerun the test or confirm with the dump command all
reads fail: dump eth0:10
Fatal error: Code 25, subcode 0x8 (0)
PROM_FAILURE "PROM Failure"
Chimera node prom value is invalid. The "prom edit"
command may fix this.
This error indicates the Chimera node prom value in the prom
record is not in the proper range. The midplane type value
in prom should be programmed correctly with the prom edit command.
Resolution:
A) Use "prom edit" to fix/verify the midplane field.
Non-fatal error: Code 26, sub-code 0x1 (ethdev)
ETH_FAILURE "Ethernet Failure"
eth0 device self test:
FAIL All tests: xxxx (timeout)
During initialization, CBIOS has the ethernet controller perform
an internal test to verify correct operation. If the ethernet
controller does not respond within a reasonable amount of time,
this error is displayed.
"ethdev" indicates the PCI slot in which the failed ethernet
device is located. This is an ASCII value, so 0x30 indicates
PCI slot 0. If the ethernet device is located on the node
motherboard, then ethdev will have a value of 0x00.
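The ethdev encoding can be illustrated with a small Python helper. It follows the two cases stated above (0x00 for the node motherboard, an ASCII digit for a PCI slot) and treats any other value as unrecognized, which is an assumption since the guide does not list other values. The function name is hypothetical.

```python
def describe_ethdev(ethdev: int) -> str:
    """Translate the ethdev data value into a location description."""
    if ethdev == 0x00:
        return "node motherboard"
    if 0x30 <= ethdev <= 0x39:           # ASCII '0'..'9'
        return f"PCI slot {ethdev - 0x30}"
    return f"unrecognized ethdev value 0x{ethdev:02x}"

print(describe_ethdev(0x30))
print(describe_ethdev(0x00))
```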
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Verify the 82559ER shows up in a PCI scan.
Use the Whack "pci find 8086" command. It
should display the 82559ER Ethernet controller.
B) Use the Whack "eth test" command to repeat
the test. Make sure that CBIOS initialization
has passed the point of PCI scan. Use Whack
"loop ffff eth test" to repeat in a loop.
Non-fatal error: Code 26,
sub-code 0x2 (ethdev)
ETH_FAILURE "Ethernet Failure"
eth0 device self test:
FAILxxxx yyyy
If the ethernet controller fails its internal test, this
error will be displayed. Since this is an internal test,
it is likely the ethernet controller itself which has
failed.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "eth test" command to repeat
the test. Make sure that CBIOS initialization
has passed the point of PCI scan. Use Whack
"loop ffff eth test" to repeat in a loop.
Non-fatal error: Code 26,
sub-code 0x3 (0)
ETH_FAILURE "Ethernet Failure"
No ethernet devices available for loopback test
This error indicates that no ethernet devices could be found
or initialized on the node. This is possibly the result of a
hardware failure.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "eth test" command to make sure
that the low level test passes.
B) Try using "net dhcp" in an environment that has
a DHCP server to see if the node can send and
receive packets. If so, then this error is
likely caused by incorrect BIOS code.
Non-fatal error: Code 26,
sub-code 0x4 (0)
ETH_FAILURE "Ethernet Failure"
No loopback connections were found. An external loopback
plug is
required if this node has only one ethernet port. A crossover
cable is required if this node has more than a single
ethernet port.
Resolution: A) Make sure the ethernet loopback plug is in
the ethernet connector (you should see link
status lights illuminated). In the case of
a node having two ethernet ports, make sure
a crossover cable is connected between the
ethernet ports.
B) Cycle power on the node.
C) Replace the node motherboard.
Diagnostic: A) This problem is most likely caused by a bad
connector or bad connection to the loopback
plug. Make sure TX+ makes a circuit to RX+
and TX- makes a circuit to RX- on the PHY.
B) Try plugging into a normal ethernet network and use
"net dhcp" to see if it can talk to a DHCP server.
C) Try using "net loopback" to test the ethernet
port using the internal PHY loopback.
Non-fatal error: Code 26,
sub-code 0x5 (slotid)
ETH_FAILURE "Ethernet Failure"
eth2 loopback PHY internal:
FAIL
This error indicates that the internal loopback of the PHY
did
not correctly loop back packets. If the device being tested
is onboard the node (82559ER or 82551ER), then this is a
failure.
Some plug-in PCI boards (such as 82557) do not fully support
PHY loopback. Those devices will cause the following
warning:
eth2 loopback PHY internal:
Unavailable
No error stop will occur in the case of a PHY not supporting
internal loopback.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the "pci probe" command to match up the
ethernet devices with which one has failed.
B) Try using an external loopback to see if the
results are the same. If the same, then try
debugging using a scope. If the external
loopback works, then it may be that the PHY
loopback just does not work in this device.
Non-fatal error: Code 26,
sub-code 0x6 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 sends to eth1 but cannot receive from it
This is an unusual error in that one ethernet device is able
to reliably receive packets from the other, but the opposite
is not true.
Resolution: A) Run the test again. If the nodes are attached
to a hub, the failure may be due to another
ethernet node flooding the network.
B) Cycle power on the node.
C) Ensure that there is no switch between
the ethernet ports. A switch may prevent the
test from functioning properly if the MAC
address of an interface is in use elsewhere
or the switch is really an IP router.
Diagnostic: A) Test against a plug-in PCI ethernet card to
isolate which ethernet interface is not
functioning.
Non-fatal error: Code 26,
sub-code 0x7 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: FAIL - receive timeout (xx seconds)
This error indicates the ethernet device did not
successfully receive the loopback pattern sent to test the
ethernet device's transceiver. The failure to receive a
loopback pattern usually means the ethernet device has failed.
"ethdev" indicates the PCI slot in which the failed ethernet
device is located. This is an ASCII value, so 0x30 indicates
PCI slot 0. If the ethernet device is located on the node
motherboard, then ethdev will have a value of 0x00.
The following are normal test results that you would expect
to see. If this error occurs, then one of the following has
not happened:
eth0 loopback All zeros: PASS
eth0 loopback All ones: PASS
eth0 loopback Walking ones: PASS
eth0 loopback Walking zeros: PASS
eth0 loopback Random pattern: PASS
This error indicates that within 100 packets successfully
transmitted, there were no packets successfully received.
Resolution: A) Cycle power on the node.
B) Unplug the network cable and run the test again.
If the node is attached to a hub, the failure
may be due to another ethernet node flooding the
network. This is not very likely.
C) If the ethernet device is located in a PCI
slot, replace the card.
D) Replace the node motherboard.
Diagnostic: A) Test against a plug-in PCI ethernet card to
isolate which ethernet interface is not
functioning.
Non-fatal error: Code 26,
sub-code 0x8 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: Packet transmit failed
This error indicates that the ethernet device was not able
to successfully transmit packets. This is a serious failure,
since the ethernet code does not fail to transmit under any
condition unless the ethernet device failed to initialize.
Resolution: A) Use "eth reset" to reset the ethernet device.
B) Cycle power on the node.
C) Replace the node motherboard if the failed
ethernet device is on the node.
Non-fatal error: Code 26,
sub-code 0x9 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: FAIL - miscompare
stuck high=xxxx stuck low=yyyy toggle=zzzz
This error is displayed if one of the ethernet tests detects
a mismatch between the packet sent and the data received.
It also includes a diagnostic line which is useful to see in
what way the data is different.
Resolution: A) Use "eth reset" to reset the ethernet device.
B) Cycle power on the node.
C) Replace the node motherboard if the failed
ethernet device is on the node.
Diagnostic: A) You can get complete packet dumps if you wish
to manually compare how the data was corrupted.
In order to do this, use "net loopback vv"
(double verbose).
B) If it is a single bit that is failing (or a
small number), observe if the bits are pulled
high or low. This may assist you in debugging
where the hardware is failing, if it is
external to the ethernet IC.
Non-fatal error: Code 26,
sub-code 0xa (slotid)
ETH_FAILURE "Ethernet Failure"
ethxxx device registers:FAIL
Onboard ethernet device did not read valid config from
EEPROM.
A power cycle might clear this failure if this is a new node.
This error indicates the ethernet device failed to
initialize
properly, probably because it read invalid content from the
attached EEPROM device. If this is an onboard GigE on the 5000P
chipset (Tx00, Fx00, Vx00, Gx00), then it is likely this is
the first time the node has ever been powered on. Once the
BIOS
writes a configuration to the SPI EEPROM attached to the
GigE,
it is necessary for the board to be power cycled before the
GigE
device is usable. If the board is not new and you see this
failure,
then it's likely a component on the node motherboard has
failed.
Resolution: A) Power cycle the node.
B) Replace the node motherboard if the failed
ethernet device is on the node.
Non-fatal error: Code 27,
sub-code 0x0 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
Each node board has multiple temperature and voltage sensors
and fan RPM sensors which monitor the environment to ensure
the temperature, voltage, and fan RPM are within operating
tolerances. This directly results in increased reliability
of the product.
If a temperature or a voltage falls outside a programmed
tolerance level, CBIOS will alert the user to this
condition.
The sub-code displayed reflects the type of (the first)
error
detected. The data value is a count of the number of
temperature/voltage/fan problems detected.
A sub-code value of 0x0 indicates a fan RPM problem.
A sub-code value of 0x1 indicates a temperature problem.
A sub-code value of 0x2 indicates a voltage problem.
This particular sub-code (0x0) indicates a fan RPM reading is
outside its programmed limit.
Resolution: A) Cycle power on the node. If it is a
temperature
related problem, verify the system is getting
adequate ventilation.
B) Verify the limit settings are reasonable. Use
the Whack "i2c env" command. The Whack
"i2c env defaults" command resets all defaults.
C) Verify both power supply fans are spinning
freely and that the supply amber failure light
is not illuminated. If only a single supply is
installed, make sure the second slot either has
a fan or is covered.
D) Replace the power supply.
E) If it's CPU temperature, verify the heatsink
is conducting heat well.
F) If it's CPU voltage, try swapping out the CPU
voltage regulators.
G) Replace the node motherboard.
Diagnostic: A) Use a voltage probe at appropriate vias to
verify correct voltage levels.
B) Verify LM87 external temperature sensor line is
well connected to the CPU's thermal diode.
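The sub-code list in this entry can be expressed as a small lookup, shown here as a Python sketch. The names CODE27_SUBCODES and describe_code27 are hypothetical helpers for reading logs, not part of CBIOS; only the three sub-codes listed in this entry are included, and the data value is treated as the problem count as described above.

```python
# Sub-code meanings copied from the Code 27, sub-code 0x0 entry.
CODE27_SUBCODES = {
    0x0: "fan RPM problem",
    0x1: "temperature problem",
    0x2: "voltage problem",
}


def describe_code27(subcode: int, data: int) -> str:
    """Summarize a Code 27 event; 'data' is the count of problems detected."""
    kind = CODE27_SUBCODES.get(subcode, "sub-code not covered by this sketch")
    return f"Code 27, sub-code 0x{subcode:x}: {kind} (count={data})"


print(describe_code27(0x1, 2))
```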
Non-fatal error: Code 27,
sub-code 0x1 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates a programmed temperature limit
has been exceeded.
See Code 27, sub-code 0x0 for resolution information.
Non-fatal error: Code 27,
sub-code 0x2 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates a programmed voltage limit has been
exceeded.
See Code 27, sub-code 0x0 for resolution information.
Non-fatal error: Code 27,
sub-code 0x3 (0)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates a sensor interrupt test failed.
See Code 27, sub-code 0x0 for resolution information.
Fatal error: Code 27, subcode 0x4 (0)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates that a CPU has asserted its
THERMTRIP_N
signal. This could mean that it has reached its case
temperature,
that a VRM has failed, or there is a problem with the FPGA.
Resolution: A) Check the environmentals.
B) Replace the node.
Non-fatal error: Code 27,
sub-code 0x5 (Shutdown
Code =1 or =2)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Boot Pause"
For ShutdownCode = 1:
In a system-wide over-temperature condition, the OS will
shut down the system and reboot the nodes. The BIOS will
pause the boot in a low power state until the
over-temperature condition has been cleared for 30 minutes.
When in this state, BIOS samples critical temperature sensors
periodically and displays the current state of those on the
system console every few minutes. This delay can be cleared
early by a node power cycle.
This log entry indicates the start of the BIOS boot pause.
For ShutdownCode = 2:
This shutdown code indicates an over-temperature failure of
a single node. TPD will flag this failure and shut down that
node. The node will not complete the boot until the unit has
been repaired and any issues cleared.
To clear the boot halt, reboot the node and use the Whack
command "unset tshutdown" before the POST reaches step 35.
Non-fatal error: Code 27,
sub-code 0x6 (0)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Boot Resume"
This sub-code indicates that the critical temperature
sensors have been below their thresholds for at least 30
minutes and the BIOS is resuming the boot process. See
sub-code 5 for more temperature shutdown information.
Non-fatal error: Code 27,
sub-code 0x7 (0)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Override"
This sub-code indicates that BIOS is skipping a critical
temperature boot pause due to a node power cycle. See
sub-code 5 for more temperature shutdown information.
Non-fatal error: Code 27,
sub-code 0x8 (count)
TEMP_VOLTAGE_FAILURE "No Response"
This sub-code indicates that the I2C sensor defined failed
to respond on the I2C bus. 'Count' indicates the number of
I2C device failures.
Non-fatal error: Code 27,
sub-code 0x9 (index)
TEMP_VOLTAGE_FAILURE "High Limit Error"
This sub-code indicates that BIOS detected a mathematical
overflow of the 8-bit upper limit register, and measurements
on the sensor indicated by 'index' may be incorrect. The
voltage or temperature limit could not be converted to and
stored as an 8-bit value.
Contact Engineering for a HW fix.
Fatal error: Code 28, subcode 0x0 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Cluster Memory ASIC not found for CM Init.
The Eagle/Osprey/Harrier ASICs are the Cluster Managers
which are used for high speed communication between nodes
of a cluster. These devices are critical for the correct
operation of the node software, and hence for operation of
the whole cluster. The CM exists on all PCI buses in the
node. If the CM cannot be found on any of the required PCI
buses, this is a serious problem. Subcode 0x0 indicates the
PCI bus scan did not locate the Cluster Manager.
Resolution: A) Cycle power on the node.
B) Pull all PCI cards and cycle power on the node.
C) Replace the node motherboard.
Diagnostic: A) Use "pci find 1590" at the Whack prompt to
see if the CM can be located. Since the same
data structure is used, it should not show
up there either. Use "pci init" which will
scan the PCI bus again. If the CM appears
now (with "pci find 1590"), it may be a
transient problem.
B) Examine the output of "pci probe" to determine
if other onboard PCI devices are missing. This
may help to determine where the failure occurs.
For example, if the four PCI bridges do not
show, it may be the CIOB at fault.
Fatal error: Code 28, subcode 0x0 (1)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMMs did not compare identical DIMM and SPD comparison Failed
Not all required CMA DIMMS were found or are exact matches.
Or, example of slot 0 DIMM missing in Set2:
All slot 0 DIMMs are required installed
Set2 DC DIMM1.0.0 (J14007): not present or invalid SPD
Not all required CMA DIMMS were found or are exact matches
Or, example of 1 DIMM missing in 2 MC, 1 DIMM/MC configuration:
Number of DIMMs found 1 != 2
Example of 1 DIMM missing in 2 MC, 2 DIMM/MC configuration:
Number of DIMMs found 3 = 2
Example of 1 DIMM missing in 4 MC, 2 DIMM/MC configuration:
Number of DIMMs found 7 != 4 or 8
Not all required CMA DIMMS were found or are exact matches
The Harrier2 ASICs are the Cluster Managers which are
used for high speed communication between nodes of a
cluster. These devices are critical for the correct
operation of the node software, and hence for operation of
the whole cluster.
The compare SPD routine takes two DIMMs on the same memory
channel (if populated as such) and checks that the first 38
bytes of the SPD are identical, and that the DIMMs match
some specific DIMM requirements.
sub-code 0x1 indicates that the DIMMs on one of the Memory
Channels did not match in the first 38 bytes of the SPDs,
one or more slot 0 Memory Channel DIMMs were not present,
or the number of DIMMs found was incorrect.
Resolution: A) Install the appropriate DIMMs in the CM
DIMM slots of each
memory channel. Not all DIMMs have to be the same vendor,
but
they must be the same vendor on each memory channel.
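The 38-byte SPD comparison described in this entry can be sketched in Python as follows. This is only an illustration of the comparison rule, not the BIOS routine: the function name first_spd_mismatch is hypothetical, the SPD contents are assumed to be available as byte strings, and the additional "specific DIMM requirements" are not modeled.

```python
SPD_COMPARE_LEN = 38  # leading SPD bytes that must be identical, per the entry


def first_spd_mismatch(spd_a: bytes, spd_b: bytes):
    """Return the first differing offset within the compared range, or None."""
    if len(spd_a) < SPD_COMPARE_LEN or len(spd_b) < SPD_COMPARE_LEN:
        raise ValueError("SPD dump is shorter than the compared range")
    for offset in range(SPD_COMPARE_LEN):
        if spd_a[offset] != spd_b[offset]:
            return offset
    return None


# Made-up SPD contents, for illustration only.
reference = bytes(range(64))
other = bytearray(reference)
other[40] = 0xFF  # differs outside the compared range, so it is ignored
print(first_spd_mismatch(reference, bytes(other)))
```

Differences beyond offset 37 are ignored, which fits the note that DIMMs on different memory channels may come from different vendors.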
Fatal error: Code 28, subcode 0x0 (2)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Error: Harrier2 #0 not found for CM Init.
The Harrier2 ASICs are the Cluster Managers
which are used for high speed communication between nodes
of a cluster. These devices are critical for the correct
operation of the node software, and hence for operation of
the whole cluster.
The CM exists on all PCI buses in the node. If the CM
cannot be found on any of the required PCI buses, this is a
serious problem.
sub-code 0x2 indicates the PCI bus scan did not locate the
Cluster Manager.
Resolution:
A) Cycle power on the node.
B) Pull all PCI cards and cycle power on the node.
C) Replace the node motherboard.
Diagnostic:
A) Use "pci find 1590" at the Whack prompt
to see if the CM can be located. Since the same data
structure is used, it should not show up there either. Use
"pci init" which will scan the PCI bus again. If the CM
appears now (with "pci find 1590"), it may be a transient
problem.
B) Examine the output of "pci probe" to determine if other
onboard PCI devices are missing. This may help to determine
where the failure occurs. For example, if the four PCI
bridges do not show, it may be the CIOB at fault.
Fatal error: Code 28, subcode 0x0 (3)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Error: Harrier2 #1 not found for CM Init.
See Code 28, sub-code 0x0 (2) for resolution information.
Non-fatal error: Code 28,
sub-code 0x0 (xx04)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMM 0:
Unsupported Raw Card Type in SPD byte 62 = xx,
Using rdimm_control_words[0][].
Where xx, is the hex value that was read from DIMM0 SPD
Byte 62.
Byte 62 of the DIMM SPD indicates which JEDEC reference
design raw
card was used as the basis for the module assembly, if any.
Bits
4 ~ 0 describe the raw card and bits 6 ~ 5 describe the
revision
level of that raw card. Special reference raw card
indicator, 1F,
is used when no JEDEC standard raw card reference design
was used
as the basis for the module design. Preproduction modules
should
be encoded as revision 0 in bits 6 ~ 5.
The reference card is looked up in rdimm_control_words to
determine
the index into the rdimm_control_words table. If the value
in
Byte 62 is not found in the table, this error is reported.
Resolution: A) Replace DIMM with a supported Raw Card Type.
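The bit layout of SPD byte 62 described above can be decoded with a few mask operations. The Python sketch below is illustrative only: decode_raw_card_byte is a hypothetical helper, and it relies solely on the layout stated in this entry (bits 4 to 0 for the raw card, bits 6 to 5 for the revision, 0x1F meaning no JEDEC reference design).

```python
NO_REFERENCE_DESIGN = 0x1F  # special raw card indicator named in the entry


def decode_raw_card_byte(byte62: int) -> dict:
    """Split SPD byte 62 into raw card and revision fields."""
    raw_card = byte62 & 0x1F         # bits 4..0
    revision = (byte62 >> 5) & 0x03  # bits 6..5
    return {
        "raw_card": raw_card,
        "revision": revision,
        "no_reference_design": raw_card == NO_REFERENCE_DESIGN,
    }


print(decode_raw_card_byte(0x21))
```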
Non-fatal error: Code 28,
sub-code 0x0 (xx05)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMM 1:
Unsupported Raw Card Type in SPD byte 62 = xx,
Using rdimm_control_words[0][].
Where xx, is the hex value that was read from DIMM1 SPD
Byte 62.
Resolution: A) Replace DIMM with a supported Raw Card Type.
Non-fatal error: Code 28,
sub-code 0x0 (xxx06)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMM Requirement failure at Offset %d (xxx): yy != zz
Where:
%d is the failing offset in decimal
xxx, is the failing offset in hex
yy is the value from DIMM0.0.0
zz is the value from the DIMM being evaluated
The Harrier2 ASICs are the Cluster Managers
which are used for high speed communication between nodes
of a
cluster. These devices are critical for the correct
operation of
the node software, and hence for operation of the whole
cluster.
sub-code xxx06 indicates that one of the requirements of the
evaluated DIMM did not match DIMM0.0.0.
The current h2_dimm_check_list requirements are:
-Offset 2, DRAM Type must be DDR3 for all DIMMs (SPD Byte
2),
-Offset 3, DIMM Module type, must be RDIMM or LRDIMM for
all DIMMs (SPD Byte 3),
-Offset 256, DIMM size in MB (lo) must be the same for
all DIMMs,
-Offset 257, DIMM size in MB (hi) must be the same for
all DIMMs,
The DIMM size is calculated from various SPD bytes
and stored in Bytes 256/257 of the SPD structure. The
combined size is in MB.
Example of two different size DIMMs (16 GB and 8 GB)
installed: DIMM Requirement failure at Offset 257 (0x101):
40 != 20
Resolution: A) Install all CM DIMMs that adhere to the
above requirements
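The example values above are consistent with the DIMM size in MB being stored as a 16-bit value, with the low byte at offset 256 and the high byte at offset 257. That is an assumption inferred from the example, not a documented format. The Python sketch below uses a hypothetical helper, size_bytes, to reproduce the arithmetic.

```python
def size_bytes(size_mb: int) -> tuple[int, int]:
    """Split a DIMM size in MB into (lo, hi) bytes.

    Assumption: lo is stored at offset 256 and hi at offset 257.
    """
    return size_mb & 0xFF, (size_mb >> 8) & 0xFF


lo16, hi16 = size_bytes(16 * 1024)  # 16 GB DIMM
lo8, hi8 = size_bytes(8 * 1024)     # 8 GB DIMM
print(f"offset 257: {hi16:02x} != {hi8:02x}")
```

Under this assumption the low bytes of both sizes are zero, so only offset 257 differs, which matches the message in the example.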
Non-fatal error: Code 28,
sub-code 0x1 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Pairwwww DIMMxxxx: Bad checksum. Got yyyy, SPD said zzzz
The memory DIMMs located on the CM riser are called cluster
memory. This memory is used to store data destined for the
disks (dirty data) as well as data previously read from the
disks (cache data). It is also used for communication among
the nodes in the cluster. This memory is not required to
boot
the operating system, but is required for the node to
participate in the cluster. Even before the memory is
thoroughly tested for proper operation, it must be
configured
to appear in CM addressable space. Each memory DIMM has a
small embedded serial EEPROM which holds DIMM configuration
information such as the number of rows, columns, and banks,
as
well as memory timing. If this serial EEPROM becomes corrupt,
data stored in it regarding the DIMM configuration cannot be
trusted. So, this EEPROM also contains a checksum which the
BIOS verifies is correct before configuring the DIMM. If
this
checksum does not match the checksum the BIOS computes
across
the DIMM, this error will result. You should look at prior
output to determine if there were I2C errors. These errors
suggest a problem with riser installation.
The DIMM number is logged in the Data field of the Fatal
Error.
Resolution: A) Reseat Cluster Memory riser card(s).
B) Reseat Cluster Memory DIMMs.
C) Replace Cluster Memory DIMMs in pairs to ensure
replacement parts are matched.
P4-Eagle and PIII-Eagle DIMM Pairs are always
located four riser positions apart.
For example, if you number the slots from the top,
Pair 0 is at positions 3 and position 7 (top).
Pair 1 is at positions 0 (bottom) and position 4.
Pair 2 is at positions 2 and position 6.
Pair 3 is at positions 1 and position 5.
Ironman (Tclass) and Tinman (Fclass) sets are
always in sets of three. The DIMMs are set as
"DIMM C.S" as in Channel then set. There are two
riser cards, one for channel 0 and one for
channel 1 and 2.
Set 0 is DIMM 0.0, 1.0, 2.0
Set 1 is DIMM 0.1, 1.1, 2.1
Set 2 is DIMM 0.2, 1.2, 2.2
Titan and Atlas have 4 DIMM sets on the motherboard.
Set 0: DIMM 0.0 and 1.0
Set 1: DIMM 0.1 and 1.1
Set 2: DIMM 2.0 and 3.0
Set 3: DIMM 2.1 and 3.1
D) Replace the Cluster memory riser(s).
E) Replace the node motherboard.
Diagnostic: A) The Cluster Memory DIMMs appear on the I2C
bus
at 2.a0 through 2.ae. Use the Whack "d i2c"
command to display the DIMM serial EEPROM
contents to determine if there is a pattern.
Example (DIMM 5):
Whack> d i2c 2.aa.0
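The address range 2.a0 through 2.ae and the DIMM 5 example (2.aa) suggest that each DIMM's serial EEPROM sits two addresses above the previous one on bus 2. The Python sketch below encodes that inference; dimm_i2c_target is a hypothetical helper and the spacing rule is an assumption drawn from the example, not a documented formula.

```python
def dimm_i2c_target(dimm: int) -> str:
    """Build the bus.address string for a Cluster Memory DIMM's SPD EEPROM.

    Assumption: bus 2, base address 0xa0, two addresses per DIMM (DIMMs 0-7).
    """
    if not 0 <= dimm <= 7:
        raise ValueError("only DIMM numbers 0 through 7 fit in 2.a0 to 2.ae")
    return f"2.{0xA0 + 2 * dimm:02x}"


print(dimm_i2c_target(5))
```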
Fatal error: Code 28, subcode 0x2 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Pairww DIMMxx (yyyy): 'zzzz' read failed
Where zzzz is one of:
row address, column address, module rows, cas latency3,
refresh, banks, cas latency2, cas latency1, ras precharge,
act_to_rw, act_to_deact, ras cycle, write_to_deact,
density, frequency, DIMM type
This error indicates that a Cluster Memory DIMM was
detected but
that the Serial EEPROM present on the DIMM could not be
reliably
read.
The DIMM number is logged in the Data field of the Fatal
Error.
See Code 28, sub-code 0x1 for resolution information.
Non-fatal error: Code 28,
sub-code 0x3 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairqq DIMMtt (uuuu): vv != DIMMww (xxxx): yy zzzz
This error indicates the BIOS detected the SDRAM DIMMs in
the
cluster memory bank pair are of a different type.
One DIMM number of the mismatched pair will be logged in the
data field of the Fatal Error.
Resolution: A) Ensure both DIMMs in the pair are identical.
Note that two DIMMs may have the same capacity
but have different number of rows, columns, or
banks. The DIMM configuration must exactly
match. If the DIMMs have similar markings and
capacity, they are probably identical.
Diagnostic: A) The Serial EEPROM information in each pair of
DIMMs should be identical or nearly identical.
See Code 28, sub-code 0x1 for more resolution
information.
Fatal error: Code 28, subcode 0x4 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairww DIMM xx (yyyy): di_module_rows is not 1
or 2! zzzz
This error indicates the Cluster Memory DIMMs reported an
odd (and
unsupported) number of rows. Usually the number of rows
reported
by a DIMM corresponds to the number of sides of the DIMM
which
are populated by memory.
One DIMM number of the failing pair will be logged in the
Data field of the Fatal Error.
See Code 28, sub-code 0x3 for resolution information.
Fatal error: Code 28, subcode 0x5 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
No Cluster Memory Installed
This error indicates that no memory was found in the Cluster
memory riser. Since cluster memory is needed for proper
node
operation within the cluster, this is a condition which
must be resolved for proper operation. You should look at
prior
output to determine if there were I2C errors. These errors
suggest a problem with riser or DIMM installation.
See Code 28, sub-code 0x1 for resolution information.
Fatal error: Code 28, subcode 0x6 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairww DIMM xx (yyy): RAS cycle time > 10. We
got zzz/10
This error indicates the Serial EEPROM on the DIMM reports
a value which is outside tolerance for the memory
controller.
One DIMM number of the failing pair will be logged in the
Data field of the Fatal Error.
See Code 28, sub-code 0x1 for resolution information.
Fatal error: Code 28, subcode 0x7 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Cluster Memory not responding.
DIMM uuu (vvv): Expected = (xxxx) Actual = (yyyy) Addr
(zzzz)
*** Error: Cluster Memory FAILURE - too many mismatches.
Before ECC initialization of Cluster memory (scrub), a small
region must be tested and configured by the CPU to set up
the
ECC scrub of the remainder. If an error occurs during this
test
(such as memory read does not match the value just
written), then
this error will be reported. The DIMM number is logged in
the
Data field of the Fatal Error.
Diagnostic: A) Compare the expected and actual values for a
pattern such as a bit stuck high or stuck low.
Example (bit 31 stuck low):
Expected = (0xf1f1f1e5) Actual = (0x71f1f1e5)
Expected = (0x92929285) Actual = (0x12929285)
Expected = (0xb3b3b3a5) Actual = (0x33b3b3a5)
Expected = (0xd3d3d3c5) Actual = (0x53d3d3c5)
See Code 28, sub-code 0x1 for resolution information.
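The expected/actual comparison in the diagnostic above can be done mechanically with XOR. The Python sketch below uses the four value pairs from the example; stuck_bits is a hypothetical helper, and its rule (a bit is reported only if it mismatched in the same direction in every sample) is a deliberately strict heuristic, not the BIOS algorithm.

```python
def stuck_bits(samples):
    """Return (stuck_low, stuck_high) bit positions for 32-bit samples.

    samples: iterable of (expected, actual) pairs. A bit counts as stuck low
    if it was expected 1 and read 0 in every sample, and as stuck high if it
    was expected 0 and read 1 in every sample.
    """
    samples = list(samples)
    if not samples:
        return [], []
    low_mask = 0xFFFFFFFF
    high_mask = 0xFFFFFFFF
    for expected, actual in samples:
        diff = expected ^ actual
        low_mask &= diff & expected  # mismatching bits that should have been 1
        high_mask &= diff & actual   # mismatching bits that should have been 0
    stuck_low = [b for b in range(32) if (low_mask >> b) & 1]
    stuck_high = [b for b in range(32) if (high_mask >> b) & 1]
    return stuck_low, stuck_high


SAMPLES = [
    (0xF1F1F1E5, 0x71F1F1E5),
    (0x92929285, 0x12929285),
    (0xB3B3B3A5, 0x33B3B3A5),
    (0xD3D3D3C5, 0x53D3D3C5),
]
print(stuck_bits(SAMPLES))
```

For these pairs the only mismatching bit is bit 31, which agrees with the "bit 31 stuck low" label on the example.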
Fatal error: Code 28, subcode 0x8 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Found errors during scrub. Eagle Error Status:
xxxx
*** Error: Found errors during scrub. Osprey Error Status:
xxxx
During the ECC initialization of Cluster memory,
the Cluster Manager records any memory errors it encounters.
If any were recorded, this error will be displayed.
See Code 28, sub-code 0x1 for resolution information.
Fatal error: Code 28, subcode 0x9 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: CM DIMM programmed address > top of memory
For each Cluster memory DIMM, there is a register in the
Eagle /
Osprey memory controller which specifies where the DIMM
maps into
CM physical memory. These mapping registers are configured
during the Cluster memory probe and should not change under
normal circumstances. Since this is an internal CM
register,
it is unlikely that reseating memory will correct this
problem.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Replace the node motherboard.
Diagnostic: A) The memory controller registers are part of
the CM
register set which is mapped into CPU memory for
access. Use the Whack "pci find 1590" command to
find the CM on the PCI bus. The base address in PCI
space for the configuration and status registers
(CSRs) is Window 0. Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:90200000 00:00000400 3PAR (ASIC) LPC#
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x90200000 above).
This is the base address of the CM Memory Control
Register Block. Refer to the Scaffold System
Architecture Reference for information on register
programming.
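The address arithmetic in this diagnostic can be checked with a short Python sketch. The helper names are hypothetical, and the parsing assumes the "hh:llllllll" Baseaddr notation means upper bits followed by the lower 32 bits, which is an inference from the sample output rather than a documented format.

```python
CM_MEM_CTRL_OFFSET = 0xC0  # offset given in the diagnostic above


def parse_window_address(text: str) -> int:
    """Convert a 'hh:llllllll' Baseaddr string to an integer.

    Assumption: the part before the colon holds the bits above bit 31.
    """
    upper, lower = text.split(":")
    return (int(upper, 16) << 32) | int(lower, 16)


def cm_mem_ctrl_base(window0_baseaddr: str) -> int:
    """Base address of the CM Memory Control Register Block."""
    return parse_window_address(window0_baseaddr) + CM_MEM_CTRL_OFFSET


print(hex(cm_mem_ctrl_base("00:90200000")))
```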
Fatal error: Code 28, subcode 0xa (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)
*** Error: Uncorrectable ECC
The Cluster memory controller detected an Uncorrectable ECC
error. Eagle / Osprey identifies the failing bank and
address with the error as well as the error syndrome. The
BIOS will convert the information into the failing DIMM and
Riser Slot numbers. There may be multiple Uncorrectable
errors. In this case, the CM will save the address/syndrome
for the most recent error.
The DIMM number is logged in the Data field of the Fatal
Error.
Eagle nodes (S-Series and E-Series):
There are 8 DIMMs maximum on the S-Series Cluster Memory
Riser Card. If the DIMM number is not between 0-7
(inclusive), then the failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series):
There are 6 DIMMs on T-Series and 3 DIMMs on F-Series.
The data field encodes the DIMM number in the lower 4 bits
of the field and the channel number in the upper 4 bits.
So a data value of 0x12 indicates that channel 1, DIMM 2
is at fault.
Harrier nodes (V-Series, Atlas, Minime1 & 2):
There are 8 DIMMs on V-Series, split between two Harrier
ASICs; each ASIC has two memory controllers with 2 DIMMs
each. The data field encodes which memory channel
encountered the uncorrectable error. A data value of 0x10
means channel one is at fault, a value of 0 means channel
zero is at fault.
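The Data field layout described for Osprey and Harrier nodes can be decoded as in the sketch below. It assumes the example values 12 and 10 in the text are hexadecimal and that Harrier places the channel in the upper 4 bits as Osprey does; the function names are illustrative only.

```python
from typing import Tuple


def decode_osprey_data(data: int) -> Tuple[int, int]:
    """Split the Data field into (channel, dimm) for Osprey nodes.

    Per the text: channel number in the upper 4 bits, DIMM number in
    the lower 4 bits.
    """
    channel = (data >> 4) & 0xF
    dimm = data & 0xF
    return channel, dimm


def decode_harrier_channel(data: int) -> int:
    """Return the failing memory channel for Harrier nodes.

    Assumption: the channel sits in the upper 4 bits, so 0x10 is
    channel 1 and 0x00 is channel 0.
    """
    return (data >> 4) & 0xF


print(decode_osprey_data(0x12))      # (1, 2)
print(decode_harrier_channel(0x10))  # 1
```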
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM(s).
D) Replace the failing Cluster Memory DIMM(s).
E) Replace the node motherboard.
Diagnostic: A) The memory controller registers are part of
the CM register set which is mapped into CPU memory
for access. Use the Whack "pci find 1590" command
to find the CM on the PCI bus. The base address in
PCI space for the configuration and status
registers (CSRs) is Window 0. Example:
Whack> pci find 1590
... Win Baseaddr Basesize Identity
... [0] 00:60200000 00:00000400 3PAR Eagle
... [1] 00:20000000 00:20000000
... [2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above).
This is the base address of the CM Memory Control
Register Block. Refer to the Scaffold System
Architecture Reference for information on register
programming.
Window 1 is the small cluster memory offset. If
the error address is in the first 512 MB of Cluster
memory, use whack to read/write this location and
confirm the error. The CM Central Error register
must be reset prior to error reproduction.
If the error address is greater than 512 MB, then
XCBs may be used to reproduce the error. Type
"xcb help" to get more information on using XCBs.
Fatal error: Code 28, subcode 0xb (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)
*** Error: Correctable ECC
The Cluster memory controller detected a correctable ECC
error. The CM identifies the failing bank and address with
the error as well as the error syndrome. The BIOS will
convert the information into the failing DIMM and Riser
Slot numbers.
The DIMM number is logged in the Data field of the Fatal
Error.
Eagle nodes (S-Series and E-Series):
There are 8 DIMMs maximum on the Cluster Memory Riser Card.
If the DIMM number is not between 0-7 (inclusive), then the
failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series):
There are 6 DIMMs on T-Series and 3 DIMMs on F-Series.
The data field encodes the DIMM number in the lower 4 bits
of the field and the channel number in the upper 4 bits.
So a data value of 0x12 indicates that channel 1, DIMM 2
is at fault.
Harrier nodes (V-Series, Atlas, Minime1 & 2):
This should not occur on Harrier.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM.
D) Replace the failing Cluster Memory DIMM.
E) Replace the node motherboard.
Diagnostic: A) The memory controller registers are part of
the CM register set which is mapped into CPU memory
for access. Use the Whack "pci find 1590" command
to find the CM on the PCI bus. The base address in
PCI space for the configuration and status registers
(CSRs) is Window 0. Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:60200000 00:00000400 3PAR Eagle
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above).
This is the base address of the CM Memory Control
Register Block. Refer to the Scaffold System
Architecture Reference for information on register
programming.
Window 1 is the small cluster memory offset. If
the error address is in the first 512 MB of Cluster
memory, use whack to read/write this location and
confirm the error. The CM Central Error register
must be reset prior to error reproduction.
If the error address is greater than 512 MB, then
XCBs may be used to reproduce the error. Type
"xcb help" to get more information on using XCBs.
Fatal error: Code 28, subcode 0xc (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Addr (zzzzzzzz) Wrote (wwwwwwww) Read (yyyyyyyy)
or
*** Error: Data Miscompare in Final Block offset zzzzzzzz
*** Error: Expected (wwwwwwww) Actual (yyyyyyyy)
or
*** Error: CM DIMM5 (Jxxxx): Address (uu:uuuuuuuu)
CM DECODE TEST miscompare at (1) (vvvvvvvvvvvvvvvv)
Expected: (wwwwwwww)
Actual: (yyyyyyyy)
Offset: (zzzzzzzz)
or
similar to above
The CBIOS runs Cluster Memory Tests as part of POST in both
normal operation and manufacturing test. If any test fails
due to a data miscompare, the test will generate this fatal
error code with sub-code '0xc'. CBIOS runs the following
tests:
Walking 1/0 across data
Walking 1/0 across address (512 MB Small Memory Window)
Walking 1/0 using XCB (64 bytes) across segment boundaries
Any test failure will result in a fatal error.
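A generic walking 1/0 data-line test of the kind listed above can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the CBIOS code: the 32-bit width, the toy memory model with one stuck data line, and all names are hypothetical.

```python
from typing import Callable, List, Tuple


def walking_patterns(width: int, walking_one: bool = True) -> List[int]:
    """Return the walking-1 (or walking-0) patterns for a bus of `width` bits."""
    all_ones = (1 << width) - 1
    ones = [1 << bit for bit in range(width)]
    return ones if walking_one else [all_ones ^ p for p in ones]


def walking_data_test(write: Callable[[int, int], None],
                      read: Callable[[int], int],
                      addr: int, width: int = 32,
                      walking_one: bool = True) -> List[Tuple[int, int, int]]:
    """Write each pattern to `addr`, read it back, and collect
    (addr, wrote, read) for every miscompare."""
    failures = []
    for pattern in walking_patterns(width, walking_one):
        write(addr, pattern)
        got = read(addr)
        if got != pattern:
            failures.append((addr, pattern, got))
    return failures


class StuckLowMemory:
    """Toy memory model with one data line stuck at 0 (hypothetical fault)."""

    def __init__(self, stuck_bit: int) -> None:
        self.cells = {}
        self.keep = ~(1 << stuck_bit)  # mask that clears the stuck bit

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & self.keep

    def read(self, addr: int) -> int:
        return self.cells.get(addr, 0)


mem = StuckLowMemory(stuck_bit=3)
for addr, wrote, got in walking_data_test(mem.write, mem.read, addr=0):
    print(f"Addr ({addr:08x}) Wrote ({wrote:08x}) Read ({got:08x})")
```

With a single data line stuck low, only the walking-1 pattern that drives that line miscompares, which is what makes the test useful for isolating one bad bit.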
The DIMM number is logged in the Data field of the Fatal
Error.
Eagle nodes (S-Series and E-Series):
There are 8 DIMMs maximum on the Cluster Memory Riser Card.
If the DIMM number is not between 0-7 (inclusive), then the
failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series):
There are 6 DIMMs on T-Series and 3 DIMMs on F-Series.
The data field encodes the DIMM number in the lower 4 bits
of the field and the channel number in the upper 4 bits.
So a data value of 0x12 indicates that channel 1, DIMM 2
is at fault.
Harrier nodes (V-Series, Atlas, Minime1 & 2):
This should not occur in Harrier.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM.
D) Replace the failing Cluster Memory DIMM.
E) Replace the node motherboard.
Diagnostic: A) The memory controller registers are part of
the CM register set which is mapped into CPU memory
for access. Use the Whack "pci find 1590" command
to find the CM on the PCI bus. The base address in
PCI space for the configuration and status registers
(CSRs) is Window 0. Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:60200000 00:00000400 3PAR Eagle
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above).
This is the base address of the CM Memory Control
Register Block. Refer to the Scaffold System
Architecture Reference for information on register
programming.
Window 1 is the small cluster memory offset. If
the error address is in the first 512 MB of Cluster
memory, use whack to read/write this location and
confirm the error. The CM Central Error register
must be reset prior to error reproduction.
If the error address is greater than 512 MB, then
XCBs may be used to reproduce the error. Type
"xcb help" to get more information on using XCBs.
Fatal error: Code 28, subcode 0xd (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Pairwwww DIMMxxxx: Illegal SPD value <name of value> <value>
This error indicates that a Cluster Memory DIMM was
detected but that the Serial EEPROM present on the DIMM
reported an illegal or unsupported value for our memory
controller.
The DIMM number is logged in the Data field of the Fatal
Error.
Example:
Density (SPD byte 31) has more than 1 bit set (e.g., 0x30)
which indicates a non-standard part.
See Code 28, sub-code 0x1 for resolution information. Most
likely, the DIMM is not qualified for use in our Node Board.
Fatal error: Code 28, subcode 0xe (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
If there was a problem mapping the CM Small Cluster memory
window into CPU 32-bit space, this error may result when
attempting to initialize Cluster memory. The initialization
problem could be due either to a hardware failure or to
a special NVRAM variable setting that eliminates the
address space normally reserved for CM memory windows. An
example is setting "mem_max" to a value above 2496. Another
example would be setting "pci_base" above 0xa0000000.
Resolution: Contact 3PAR technical support.
Fatal error: Code 28, subcode 0xf (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Bank (xx) CM DIMMyy (Jzzzz)
*** Error: CM DIMMs with ECC errors
The Cluster memory controller detected a memory error in a
specific DIMM bank. The CM memory error status register is
logged in the Data field of the Fatal Error.
See Code 28, sub-code 0xb for resolution information.
Fatal error: Code 28, subcode 0x10 (mm)
CM_MEMORY_FAILURE "CMA Failure"
H1 LPC0 HW ERR ST [00000004]: dataq_parity
H1 LPC0 ERR Stat [00000006]: EP-Error-Rpt Fatal-Error
H1 LPC0 ERR ID [80000000]: HW-Err
The Cluster memory controller detected a hardware error.
This error is printed as shown above. The data value mm is
decoded as follows: bits 31-28 represent the LPC number and
bits 27-0 are the error bits as set in the hardware error
status register. The hardware error means that the Harrier
ASIC is non-functional.
Resolution: A) Cycle power on the node.
B) Replace the node.
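The bit layout of the data value for sub-code 0x10 (LPC number in bits 31-28, error bits in bits 27-0) can be unpacked as in the sketch below. The function name and the sample value are made up for illustration; the meaning of each individual error bit is not given here and is not assumed.

```python
from typing import List, Tuple


def decode_hw_error_data(mm: int) -> Tuple[int, int, List[int]]:
    """Return (lpc_number, error_bits, positions_of_set_error_bits)."""
    lpc = (mm >> 28) & 0xF          # bits 31-28: LPC number
    error_bits = mm & 0x0FFFFFFF    # bits 27-0: hardware error status bits
    positions = [bit for bit in range(28) if error_bits & (1 << bit)]
    return lpc, error_bits, positions


# Hypothetical sample value: LPC 1 with error bit 2 set.
lpc, bits, positions = decode_hw_error_data(0x10000004)
print(f"LPC {lpc}, error bits {bits:#010x}, set bit positions {positions}")
```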
Fatal error: Code 28, subcode 0x20 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM data lines with walking 1
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may
directly access CM cluster memory by performing a walking
1's test on all data lines. If any line fails, this error
will result.
The data value (mm) could be in the form 0x00XXYYZZ, where
XX is the DIMM number (0-11), YY is the return code
(RC_??), and ZZ is the number of errors found.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic: A) Use the Whack command line to attempt to
access CM memory manually to determine if data line bits
are stuck.
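When the data value for sub-code 0x20 does follow the 0x00XXYYZZ form described in this entry, it can be split into its three fields as sketched below. The text says only that the value "could be" in this form, so treat the result as a hint; the type and function names are illustrative.

```python
from typing import NamedTuple


class WalkingTestData(NamedTuple):
    dimm: int         # XX: DIMM number (0-11 per the text)
    return_code: int  # YY: return code (RC_??), meaning not documented here
    error_count: int  # ZZ: number of errors found


def decode_walking_test_data(mm: int) -> WalkingTestData:
    """Split a 0x00XXYYZZ data value into DIMM, return code, and error count."""
    return WalkingTestData((mm >> 16) & 0xFF, (mm >> 8) & 0xFF, mm & 0xFF)


print(decode_walking_test_data(0x00030105))
```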
Fatal error: Code 28, subcode 0x21 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM data lines with walking 0
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor may
directly access cluster memory by performing a walking 0's
test on all data lines. If any fails, this error will
result.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x22 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
ZERO CM problem at addr xxxx
Between PCI bus tests, a small portion of cluster memory
is cleared. If errors in clearing the memory are detected,
this error will result.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x23 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM address lines with walking 1 (first 512 MB only)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the
processor may directly access cluster memory by performing
a walking 1's test on all address lines. If any fails, this
error will result.
See Code 28, sub-code 0x20 for resolution information.
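A generic walking-1 address-line test can be sketched as below. This illustrates the general technique only and is not the CBIOS implementation: the toy memory model, the marker value, and all names are assumptions, and addresses are treated as abstract word indices rather than byte addresses.

```python
from typing import Dict, List, Optional


class AliasedMemory:
    """Toy memory in which one address line can be stuck at 0 (hypothetical)."""

    def __init__(self, stuck_low_bit: Optional[int] = None) -> None:
        self.cells: Dict[int, int] = {}
        # With no fault the mask keeps every address bit.
        self.mask = ~0 if stuck_low_bit is None else ~(1 << stuck_low_bit)

    def write(self, addr: int, value: int) -> None:
        self.cells[addr & self.mask] = value

    def read(self, addr: int) -> int:
        return self.cells.get(addr & self.mask, 0)


def walking_one_address_test(mem: AliasedMemory, address_bits: int) -> List[str]:
    """Write a marker at address 0 and a unique value at each address that
    has exactly one address bit set, then verify that nothing aliased."""
    marker = 0xA5A5A5A5
    offsets = [1 << bit for bit in range(address_bits)]
    mem.write(0, marker)
    for bit, offset in enumerate(offsets):
        mem.write(offset, bit + 1)
    failures = []
    got = mem.read(0)
    if got != marker:
        # A write to some other address landed on address 0.
        failures.append(f"address 0 overwritten: read {got:#x}")
    for bit, offset in enumerate(offsets):
        got = mem.read(offset)
        if got != bit + 1:
            failures.append(f"addr {offset:#x}: wrote {bit + 1:#x}, read {got:#x}")
    return failures


print(walking_one_address_test(AliasedMemory(), 8))
print(walking_one_address_test(AliasedMemory(stuck_low_bit=3), 8))
```

A stuck-low address line makes the write to that line's address land on address 0, which is the same symptom the "Data same as at Addr 0" message describes.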
Fatal error: Code 28, subcode 0x24 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM address lines with walking 0 (first 512 MB only)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the
processor may directly access cluster memory by performing
a walking 0's test on all address lines. If any fails, this
error will result.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x25 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM segment decode boundaries
This test verifies that memory decoding at all CM DIMM
pairs is working correctly. It does so by writing a unique
128 bytes at each memory decode boundary location. It then
verifies the values were written correctly and looks for
corruption of other addresses.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x26 (eecd)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM with random XOR (all Cluster Memory)
ee = number of XOR errors.
c = Channel Number where the error took place.
d = DIMM number where the error took place.
HW error during XCB transfer CM -> CM
or
HW error during XCB transfer CM -> PCI 1 (xxxx)
or
*** Error: Data Miscompare in Final Block offset xxxx
Expected (yyyy) Actual (zzzz)
This function performs a random data test on all cluster
memory attached to the CM to verify memory under stress
with random patterns. This test also exercises the CM XOR
engine as several sources are used simultaneously
throughout the cluster memory test.
See Code 28, sub-code 0x20 for resolution information.
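The (eecd) data value for sub-code 0x26 can be split into its fields as sketched below. This assumes each letter stands for one hexadecimal digit, so ee occupies bits 15-8, c bits 7-4, and d bits 3-0; that layout is inferred from the notation and is not confirmed by the text, and the function name is illustrative.

```python
from typing import Tuple


def decode_xor_test_data(data: int) -> Tuple[int, int, int]:
    """Return (error_count, channel, dimm) from an 'eecd' data value.

    Assumption: one hex digit per letter (ee = 8 bits, c = 4 bits, d = 4 bits).
    """
    error_count = (data >> 8) & 0xFF
    channel = (data >> 4) & 0xF
    dimm = data & 0xF
    return error_count, channel, dimm


print(decode_xor_test_data(0x0312))  # (3, 1, 2)
```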
Fatal error: Code 28, subcode 0x27 (0)
CM_MEMORY_FAILURE (<DIMM>) "DQS Training Failed"
This error occurs when the DQS training fails to find
working values for the DQS enable, DQS out skew, and DQS
in skew.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x30 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM ECC lines with walking 1
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may
directly access CM cluster memory by performing a walking
1's test on all ECC lines. If any fails, this error will
result.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic: A) Use the Whack command line to attempt to
access CM memory manually to determine if data line bits
are stuck.
Fatal error: Code 28, subcode 0x31 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM ECC lines with walking 0
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor may
directly access cluster memory by performing a walking 0's
test on all ECC lines. If any fails, this error will result.
See Code 28, sub-code 0x30 for resolution information.
Fatal error: Code 28, subcode 0x32 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM Op Codes
The CM Op Code test verifies that the processor may execute
one of the available operations for this cluster manager
ASIC. This error means that a particular opcode is not
supported.
If any op code fails, this error will result.
Resolution: A) Replace the node motherboard.
Fatal error: Code 28, subcode 0x33 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM Source Interrupts
The CM Source Interrupts test will test that an interrupt is
generated for each CMA data path, from processor, CMA, or
companion CMA to either processor memory or local CMA. On
systems with only one CMA, the companion tests are not done.
Resolution: A) Replace the node motherboard.
Fatal error: Code 28, subcode 0x34 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM I2C communication test
The CM I2C communication test will read and write to various
safe CMA registers or CMA memory and verify that the
expected values are read. A failure means either a bad DIMM
or a bad CMA.
See Code 28, sub-code 0x30 for resolution information.
Fatal error: Code 28, subcode 0x35 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Stopped on an Uncorrectable Error
The scan for errors found an uncorrectable error in one
of the CMAs. The system stopped during a BIOS test when
this error was discovered.
See Code 28, sub-code 0x30 for resolution information.
Fatal error: Code 28, subcode 0x36 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Stopped on a Correctable Error
The scan for errors found a correctable error in one
of the CMAs. The system stopped during a BIOS test when
this error was discovered.
See Code 28, sub-code 0x30 for resolution information.
Fatal error: Code 28, subcode 0x40 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM MMW data lines with walking 1
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may
directly access CM cluster memory by performing a walking
1's test on all data lines. This test uses the Medium
Memory Window (MMW). If any fails, this error will result.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic: A) Use the Whack command line to attempt to
access CM memory manually to determine if data line bits
are stuck.
Fatal error: Code 28, subcode 0x41 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM MMW data lines with walking 0
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor may
directly access cluster memory by performing a walking 0's
test on all data lines. This test uses the Medium Memory
Window (MMW). If any fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x42 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
ZERO CM problem at addr xxxx
Between PCI bus MMW tests, a small portion of cluster memory
is cleared. If errors in clearing the memory are detected,
this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x43 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 1 (MMW)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the
processor may directly access cluster memory by performing
a walking 1's test on all address lines using the medium
memory window. If any fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x44 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 0 (MMW)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the
processor may directly access cluster memory by performing
a walking 0's test on all address lines using the medium
memory window. If any fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x45 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 1 (RMW)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the
processor may directly access cluster memory by performing
a walking 1's test on all address lines using the remote
memory window. If any fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x46 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 0 (RMW)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the
processor may directly access cluster memory by performing
a walking 0's test on all address lines using the remote
memory window. If any fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x47 (wwxxyyzz)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: MTB Granularity error!
SPD Byte 10, expected: ww, actual: yy
SPD Byte 11, expected: xx, actual: zz
All the MTB (Medium TimeBase) calculations in the software
leveling code are based on an MTB granularity of 0.125 ns
(SPD Byte 10=0x01 and Byte 11=0x08). These bytes define a
value in nanoseconds that represents the fundamental
timebase for medium grain timing calculations. This value
is typically the greatest common divisor for the range of
clock frequencies (clock periods) supported by a particular
SDRAM. This value is used as a multiplier for formulating
subsequent timing parameters. The medium timebase (MTB) is
defined as the medium timebase dividend (byte 10) divided
by the medium timebase divisor (byte 11).
Resolution: A) Replace CM DIMM.
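The MTB arithmetic and the (wwxxyyzz) data layout for sub-code 0x47 can be worked through as in the sketch below. It assumes ww, xx, yy, and zz are each one byte in the order printed; the function names and the sample data value are hypothetical.

```python
from fractions import Fraction
from typing import Optional, Tuple

EXPECTED_BYTE10 = 0x01  # MTB dividend expected by the leveling code (from the text)
EXPECTED_BYTE11 = 0x08  # MTB divisor expected by the leveling code (from the text)


def mtb_ns(dividend: int, divisor: int) -> Optional[Fraction]:
    """Medium timebase in nanoseconds: SPD byte 10 divided by SPD byte 11."""
    if divisor == 0:
        return None  # an SPD divisor of 0 has no valid timebase
    return Fraction(dividend, divisor)


def decode_mtb_data(data: int) -> Tuple[int, int, int, int]:
    """Split wwxxyyzz into (expected byte 10, expected byte 11,
    actual byte 10, actual byte 11)."""
    return ((data >> 24) & 0xFF, (data >> 16) & 0xFF,
            (data >> 8) & 0xFF, data & 0xFF)


# Hypothetical data value: expected 0x01/0x08, actual 0x01/0x10.
exp10, exp11, act10, act11 = decode_mtb_data(0x01080110)
print(float(mtb_ns(exp10, exp11)))  # 0.125
print(float(mtb_ns(act10, act11)))  # 0.0625
```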
Fatal error: Code 29, subcode 0x0 (data)
CM_LINK_FAILURE "Cluster Link Failure"
Link 0 did not come up (0xac000000) error = (0x002022ff)
(data = link number)
CM Links are high speed connections between all of the
node boards in a cluster via the center panel.
During Manufacturing test, nodes are connected to a
special Manufacturing Center panel that connects the link
transmitter to its own receivers (external loopback).
When the node senses that it is in this special Center
Panel, it will initialize all of the links and perform
loopback tests. If any link fails to initialize, this
sub-code will be reported.
Resolution: A) Cycle power on the node.
B) Verify that the node is securely mated with
the Center Panel.
C) Turn off power, re-seat the node into the
center panel, and turn power back on.
D) Replace the node motherboard.
Diagnostic: A) Use the Whack "eagle link" commands to run
more diagnostic tests on the links. The CM requires
that the PCI scan has completed and that Cluster
Memory is present and initialized.
Fatal error: Code 29, subcode 0x1 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM Link Initialization failed
(data = LLRR) where
LL is the link bit pattern. 01 is link 0, 02 is link 1, 04
is link 2, and 08 is link 3.
RR is the failure reason. E4 is Hardware error, F0 is user
abort.
CM Links are high speed connections between all of the node
boards via the center panel. During Manufacturing test,
nodes are connected to a special Manufacturing Center panel
that connects each link's transmitter to its own receiver
(external loopback). When the node senses that it is in
this special Center Panel, it will initialize the links and
run a special test to verify the operation of the
transmitter/receivers of each link. If any link fails, the
test will report this sub-code.
See Code 29, sub-code 0x0 for resolution information.
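The LLRR data value for sub-code 0x1 can be decoded as in the sketch below. It assumes LL and RR are one byte each; only the two reason codes named in this entry are mapped, and the function name is illustrative.

```python
from typing import List, Tuple

# Failure reasons listed in the text; any other value is reported as unknown.
FAILURE_REASONS = {0xE4: "Hardware error", 0xF0: "User abort"}


def decode_link_init_data(data: int) -> Tuple[List[int], str]:
    """Return (failing link numbers, failure reason) from an LLRR value."""
    link_mask = (data >> 8) & 0xFF  # LL: one bit per link, bit 0 = link 0
    reason = data & 0xFF            # RR: failure reason code
    links = [n for n in range(4) if link_mask & (1 << n)]
    return links, FAILURE_REASONS.get(reason, f"unknown reason 0x{reason:02X}")


print(decode_link_init_data(0x05E4))  # ([0, 2], 'Hardware error')
```

The same bit-per-link mask decoding applies to the link bit pattern reported by sub-codes 0x2 and 0x3.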
Fatal error: Code 29, subcode 0x2 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM# Link XOR test: Link [0]..[FAIL] (1)
(data = the link bit pattern. bit 0 is link 0, bit 1 is
link 1, bit 2 is link 2, and bit 3 is link 3.)
CM Links are high speed connections between all of the node
boards via the center panel. During Manufacturing test,
nodes are connected to a special Manufacturing Center panel
that connects each link's transmitter to its own receiver
(external loopback). When the node senses that it is in
this special Center Panel, it will initialize the links and
run a special test to verify the operation of the
transmitter/receivers of each link. If any link fails, the
test will report this sub-code.
See Code 29, sub-code 0x0 for resolution information.
Fatal error: Code 29, subcode 0x3 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM# Link INT??? test: Link [0]..[FAIL] (1)
(data = the link bit pattern. bit 0 is link 0, bit 1 is
link 1, bit 2 is link 2, and bit 3 is link 3.)
The CM Link INT test verifies that setting either of the two
interrupt flags (DEST, SRC) in the XCB does actually
generate an interrupt to the processor.
See Code 29, sub-code 0x0 for resolution information.
Fatal error: Code 29, subcode 0x4 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT Link 1 XCB ASync failed (Send)
(data = link number)
The CM Link Round Trip Test failed due to an XCB failure.
CM XCB failed during link DMA. Use the "eagle status" command
for more information on the type of error. This test checks
the CM link status at multiple times during the test.
The "(Send)" part of the message indicates which stage
failed. Another possible value is "(Receive)".
See Code 29, sub-code 0x0 for resolution information.
Fatal error: Code 29, subcode 0x5 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT (Receive) Link 1 Length = 0
or
*** Error RTT Offset = xxxxx Expected = yyyyy
Returned = zzzzz
or
*** Error RTT (Return) Link 1 Length mismatch
or
*** Error RTT (Return) Link 1 Timestamp mismatch.
(data = link number)
The CM Link Round Trip Test failed due to data miscompare.
All packets have a length check and timestamp check.
Payload compare is optional. Use the "eagle status" command
to check for Uncorrectable ECC errors.
See Code 29, sub-code 0x0 for resolution information.
Fatal error: Code 29, subcode 0x6 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT (Return) Timeout waiting for packet Link 1
(data = link number)
The CM Link Round Trip Test failed due to packet timeout.
A packet was sent and not received in a reasonable timeout
period. The Round Trip Test may not have been started on
a remote node. Use the "eagle status" command to check for
Uncorrectable ECC errors.
Resolution: A) Start CM Link Round Trip Test on remote node.
B) Cycle power on the node.
C) Verify that the node is securely mated with
the Center Panel.
D) Turn off power, re-seat the node into the
center panel, and turn power back on.
E) Replace the node motherboard.
Diagnostic: A) Use the Whack "eagle link" commands to run
more diagnostic tests on the links. The CM requires
that the PCI scan has completed and that Cluster
Memory is present and initialized.
Fatal error: Code 29, subcode 0x10 (0)
CM_LINK_FAILURE "Cluster Link Failure"
REC_EN went low. Test failed for link [x](yyyyyyyy)
The "cma link init" command is used to initialize and bring
up the CM links to nodes which indicate a "Power Ok" state.
If this error occurs, it is possible the remote node was
transmitting BIST, but then later stopped (such as from a
reset or power off).
Resolution: A) Perform the same test again.
B) Replace the node motherboard.
Diagnostic: A) Verify CM link may be brought up manually
using the "eagle link set" command.
Fatal error: Code 29, subcode 0x11 (0)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error CM linkxx producer / consumer mismatch
The CM has XCB engines which transfer data. Software
manages the producer register and the CM hardware follows
with the consumer register. If these two do not agree and
the CM should be idle, then it's possible the CM has halted
due to failure of some operation. This problem is likely
caused by a cluster memory or link failure.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
C) Replace the link partner node.
Diagnostic: A) Replace Eagle/Osprey/Harrier ASIC.
Fatal error: Code 30, subcode 0x0 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
*** Error: No Oxford serial chip xx found
or
*** Error: No Exar serial chip found
The Exar and Oxford serial chips are used for a secondary
low speed link which directly connects all nodes in the
cluster. They are used primarily in the event of a link
failure to verify whether another node in the cluster has
actually gone down. Since the part is integrated onto the
motherboard and is on a PCI bus, a failure to locate the
internal serial chips may indicate other PCI problems as
well.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "pci probe" command to show all
devices on the PCI bus. Look for the two
Oxford device entries, or a single Exar device
entry (Pentium 4 node). If they are not there,
verify other board level components are present
in the list in order to isolate the component
failure on the board.
B) Note that a failure of a single Oxford chip
may be the cause of this behavior as one
bridges to the PCI bus for both.
Fatal error: Code 30, subcode 0x1 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
*** Error: Serial Port Mfg Test failed
Port (3) [FAIL]
When the Node board is inserted into a Manufacturing
Test Centerpanel, the internal Serial Port Manufacturing
test will automatically run. This error indicates failures
on all ports tested.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "pci probe" command to show all
devices on the PCI bus. Look for the two
Oxford device entries or a single Exar device
entry (Pentium 4 node). If they are not there,
verify other board level components are present
in the list in order to isolate the component
failure on the board.
B) Note that a failure of a single Oxford chip
may be the cause of this behavior as one
bridges to the PCI bus for both.
C) Whack provides internal serial Serial Port
commands for further analysis.
Fatal error: Code 30, subcode 0x2 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
Port (4):Processed 109 bytes[FAIL]
All cluster internal serial ports go through a quick internal
loopback test immediately after initialization to do a short
test of proper operation. This test will run regardless of
the type of centerplane to which the node is connected. This
error indicates failures on all ports tested.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "pci probe" command to show all
devices on the PCI bus. Look for the two
Oxford device entries or a single Exar device
entry (Pentium 4 node). If they are not there,
verify other board level components are present
in the list in order to isolate the component
failure on the board.
B) Note that a failure of a single Oxford chip
may be the cause of this behavior as one
bridges to the PCI bus for both.
C) Whack provides internal Serial Port
commands for further analysis.
Fatal error: Code 30, subcode 0x3 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
Internal UART is not functioning properly.
Most likely this is due to a hardware failure related to
the SuperIO.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Non-fatal error: Code 31,
sub-code 0x0 (0)
GPIO_TEST_FAILURE "GPIO Failure"
FAIL (high)
Port (6) Bit (4) wrote 0(0x1)
Port (7) Bit (4) read 1, expected 0(0x3)
The Vitesse VSC055 2 Wire Backplane Controller chip controls
interfaces to the Centerplane, LEDs, Power Supplies, Nickel
battery, and PCI slots. It is connected to the I2C bus.
In normal 2, 4, or 8 node centerplanes, the chip will get
its ports initialized as inputs or outputs and start
monitoring peripheral systems. No tests are available in
this configuration.
When connected to a Manufacturing Centerplane, it will have
selected pins routed to other pins for loopback testing.
See the Manufacturing Centerplane Specification for details.
During this test, proper VSC operation will be confirmed.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Whack "i2c vsc" commands can be used to peek and
poke the VSC055 chip when in a Manufacturing
Centerplane. In normal Centerplanes, these
pins will be connected to other components and
should not be modified.
Non-fatal error: Code 31,
sub-code 0x1 (0)
GPIO_TEST_FAILURE "GPIO Failure"
Failed I2C VSC055 1.ce.yy write zzzz
During initialization, the VSC055 registers are programmed for
proper system operation. This is done over the I2C bus. If
an I2C operation fails during VSC055 initialization, this error
will result.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Whack "i2c vsc" commands can be used to peek and
poke the VSC055 chip. One failure seen in the
past with the VSC055 is that a specific chip
sometimes could not handle the first write access
to the command register, which causes a soft reset.
It was determined that the part violated the I2C
protocol by ACKing the transaction before the
I2C write operation completed.
Fatal error: Code 31, subcode 0x2 (0)
GPIO_TEST_FAILURE "GPIO Failure"
The FPGA Scratchpad register test failed, which indicates bad
FPGA hardware.
Resolution: A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31,
sub-code 0x3 (0)
GPIO_TEST_FAILURE "GPIO Failure"
FPGA Interrupt Test failed.
Resolution: A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31,
sub-code 0x4 (0)
GPIO_TEST_FAILURE "GPIO Failure"
NEMOE Loopback Test failed.
Resolution: A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31,
sub-code 0x5 (0)
GPIO_TEST_FAILURE "GPIO Failure"
During the "Board GPIO Test", the FPGA ID did not match the
expected value.
Resolution: A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31,
sub-code 0x6 (0)
GPIO_TEST_FAILURE "GPIO Failure"
During the "Board GPIO Test", the FPGA Revision did not match
the expected value.
Resolution: A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31,
sub-code 0x7 (0)
GPIO_TEST_FAILURE "GPIO Failure"
Titan specific. During the "Manufacturing Centerpanel GPIO Test",
one or more tests have failed depending upon the output.
A) If the failure occurs during
'Testing Expanders (o/p) <--> FPGA (i/p) connections:'
For example,
FAIL (low)
Port (76) Bit (1) wrote(0x00)
Port (302) Bit (4) read 0xff, expected(0xef)
1) Program the I2C expander with the following command:
Whack> cb i2c 9.76.3 0
Here "76" is the reported port number.
"3" is the config register offset for the expander.
"0" configures all expander bits as outputs.
2) Set the bit in the I2C expander.
Whack> cb i2c 9.76.1 2
Here "1" is the rdwr register offset for the expander.
"2" is the reported bit 1 (1 << "1") in the expander.
3) Read a byte from the FPGA offset.
Whack> db fpga 302 1
Here "302" is the reported FPGA offset.
Confirm that bit "4" in the read value is set.
Repeat steps 2) and 3), writing 0 to I2C expander
9.76.1 and checking that bit "4" in FPGA offset
0x302 is cleared.
B) If the failure occurs during
'Testing FPGA (o/p) <--> Expanders (i/p) connections:'
For example,
FAIL (low)
Port (305) Bit (4) wrote(0x00)
Port (7e) Bit (7) read 0x86, expected(0x06)
1) Program the I2C expander with the following command:
Whack> cb i2c 9.7e.3 ff
Here "7e" is the reported port number.
"3" is the config register offset for the expander.
"ff" configures all expander bits as inputs.
2) Write a byte to the FPGA offset.
Whack> db fpga 305 10
Here "305" is the reported FPGA offset.
Writing 0x10 will set bit "4" in that offset.
3) Read a byte from the I2C expander.
Whack> db i2c 9.7e.0 1
Here "7e" is the reported port number.
"0" is the read register offset for the expander.
Confirm that bit "7" in the read value is set.
Repeat steps 2) and 3), writing 0 to the FPGA offset
and checking that bit "7" in I2C Expander 9.7e.0 is
cleared.
C) For all other failure cases, refer to Section 18.2,
"Manufacturing Centerplane GPIO Test Diagnostics", of the
CBIOS user guide at
http://engweb/twiki/bin/view/Main/TitanMfgCpFpgaGpioTestDiag
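The values in the Code 31, sub-code 0x7 examples appear to follow from single-bit arithmetic: the value written to an expander is 1 << bit, and the "expected" value is the read value with the tested bit cleared. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical and are not part of Whack or the BIOS.

```python
def bit_mask(bit: int) -> int:
    """Return the byte mask for a reported bit number (0-7)."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0-7")
    return 1 << bit


def expected_when_low(read_value: int, bit: int) -> int:
    """Byte value expected if only the checked bit had read back as 0."""
    return read_value & ~bit_mask(bit) & 0xFF


def bit_is_set(read_value: int, bit: int) -> bool:
    """True if the given bit is set in a byte read back from the device."""
    return bool(read_value & bit_mask(bit))


if __name__ == "__main__":
    # Case A: bit 1 is written as 0x2, and bit 4 is checked in the read byte.
    print(hex(bit_mask(1)))                  # 0x2
    print(hex(expected_when_low(0xFF, 4)))   # 0xef
    # Case B: bit 4 is written as 0x10, and bit 7 is checked in the read byte.
    print(hex(bit_mask(4)))                  # 0x10
    print(hex(expected_when_low(0x86, 7)))   # 0x6
```

The helper only reproduces the numbers shown in the examples; it does not talk to hardware.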
Fatal error: Code 32, subcode 0x1 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
Xor Engine Status: P0_XERR
Error Status : XOR_ERR
PCI0 Error Status:
PCI1 Error Status:
The Eagle, Osprey, and Harrier ASICs contain a DMA engine
capable of XOR operations. This DMA engine is commonly referred
to as the XCB engine. The XCB engine can DMA data between 14
different modules within the ASIC, each module capable of
sinking or sourcing data. The XCB engine will stop all DMA
if it encounters an error while transferring data. The XCB
error status indicates the module that produced the error.
Further details of the error can be gathered by inspecting
the error registers of that module. Use the Whack
command "cma status all" to get further diagnostic
information.
If the user continues past this error, software will attempt
to reset the error and continue.
Sub-code 0x1 is specific to Osprey and indicates an uncorrectable
ECC error following an attempt to zero all of cluster memory.
The "chunk" value indicates the chunk where the ECC error
occurred.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Whack "cma status all" command displays the status
registers for each CM module. Refer to the module
that produced the error for further information and
diagnostic procedure.
Fatal error: Code 32, subcode 0x2 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Osprey and indicates an uncorrectable
ECC error following an attempt to ECC scrub all of cluster memory.
The "chunk" value indicates the chunk where the scrub error
occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic
information.
Fatal error: Code 32, subcode 0x6 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates an uncorrectable
ECC error following an attempt to zero all of cluster memory.
The "chunk" value indicates the chunk where the ECC error
occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic
information.
Fatal error: Code 32, subcode 0x7 (err_last)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates a general
Harrier DMA error following an attempt to zero all of cluster
memory.
The "err_last" value represents the normalized content of the
Harrier mem_common->mem_err_status register.
See Code 32, sub-code 0x1 for Resolution and Diagnostic
information.
Fatal error: Code 32, subcode 0x8 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates an uncorrectable
ECC error following an attempt to ECC scrub all of cluster memory.
The "chunk" value indicates the chunk where the scrub error
occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic
information.
Fatal error: Code 32, subcode 0x9 (err_last)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates a general
Harrier DMA error following an attempt to ECC scrub all of
cluster memory.
The "err_last" value represents the normalized content of the
Harrier mem_common->mem_err_status register.
See Code 32, sub-code 0x1 for Resolution and Diagnostic
information.
Fatal error: Code 33, subcode 0x0 (0)
SDRAM_I2C_BAD_READ "Memory I2C Bad Read"
*** Error: Unable to read from SDRAM at I2C ww.xx.yy.zz
This error indicates that an SDRAM DIMM for which information
was requested is no longer available. This may be due to an
intermittent I2C bus, or a hardware failure.
Resolution: A) Cycle power on the node.
B) Replace the DIMM pair that contains the failing DIMM.
C) Replace the node motherboard.
Fatal error: Code 34, subcode 0x1 (0xff)
PCI_BUS_ERROR "PCI Bus Failure"
This error indicates an uncorrectable error occurred on the
PCI bus. In the future, the data field may indicate the PCI
slot number for the device which failed. In order to determine
the cause of this error, it may be useful to review either
console messages or the IDE disk log. Typical messages
preceding this error are likely difficult to read, but may
indicate the exact cause. Example:
--- SMI: smm_inb(0x3a) == 0x86
GPE 9 triggered
Error in PCI device 02.02.00 (PCI/PCI Bridge #0 (controls slot 1)):
PCI status register (0x06) [62b0]: Signaled system error (SERR#), Received master abort
Secondary PCI status register (0x1e) [0aa0]: Signaled target abort
Bridge P_SERR (0x6a) [80]: Delayed transaction master initiator timeout
Error in PCI device 03.01.00 (PCI Slot 1):
PCI status register (0x06) [1290]: Received target abort
Secondary PCI status register (0x1e) [0a80]: Signaled target abort
Error in PCI device 04.06.00 (inside PCI Slot 1):
PCI status register (0x06) [1230]: Received target abort
Error in PCI device 04.06.01 (inside PCI Slot 1):
PCI status register (0x06) [1230]: Received target abort
(PCI errors not cleared)
In the above case, a card in PCI Slot 1 was transferring data up
to a device, likely the cluster manager, when it didn't get a
response. The bridge above the card received a master abort,
which it then relayed to its secondary side as a signaled target
abort. The bridge on the card in PCI Slot 1 then received the
target abort and signaled a target abort on its secondary side.
Both PCI devices then indicated they received target aborts.
Resolution: A) Cycle power on the node.
B) Reseat all PCI cards.
C) Replace the suspected PCI card.
D) Remove PCI cards one at a time.
E) Replace the node motherboard.
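The bracketed hexadecimal values in the Code 34 example output are raw register contents, and the text after the colon names the error bits that are set. The following is a minimal Python sketch that decodes the error bits of the PCI status register (offset 0x06), using the bit positions defined by the PCI Local Bus Specification. The function and table names are hypothetical and are not part of the BIOS or Whack. In the example output, the secondary status register (0x1e) reports "Signaled target abort" at the same bit position (bit 11).

```python
from typing import List

# Error bits of the 16-bit PCI status register (config offset 0x06).
PCI_STATUS_ERROR_BITS = {
    15: "Detected parity error",
    14: "Signaled system error (SERR#)",
    13: "Received master abort",
    12: "Received target abort",
    11: "Signaled target abort",
    8: "Master data parity error",
}


def decode_pci_status(value: int) -> List[str]:
    """Return the names of the error bits set in a PCI status value."""
    return [
        name
        for bit, name in sorted(PCI_STATUS_ERROR_BITS.items(), reverse=True)
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # Values taken from the example output for Code 34, sub-code 0x1.
    for raw in (0x62B0, 0x1290, 0x1230):
        print(f"{raw:04x}: {', '.join(decode_pci_status(raw))}")
```

The remaining set bits in these values (for example bits 4, 5, 7, and 9) are capability and timing fields rather than error flags, so the sketch ignores them.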
Fatal error: Code 35, subcode 0x0 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
One or both DIMMs in a DIMM pair have failed. Bits 4-7
of the data value indicate the DIMM pair.
If data is 0, then DIMM pair 0 has failed.
If data is 10, then DIMM pair 1 has failed.
Example:
--- SMI: TEMPCAUT (SMALERT): 0x01 (bits reset)
Uncorrectable ECC error 0x9279a103 recorded in reg 0x98
Pair1, either DIMM1 or DIMM3 contains the error
Error in locations [0x382cd818 .. 0x382cd81f]
Uncorrectable ECC error 0x9279a101 recorded in reg 0x94
Syndrome/bit number information might not be accurate,
as more than 1 error happened
Pair1, either DIMM1 or DIMM3 contains the error
Error in locations [0x382cd808 .. 0x382cd80f]
(Clearing cache line at 0x382cd800)
(Clearing cache line at 0x382cd800)
ESR == 0x0003 (expected low bit == 0)
Fatal error: Code 35, subcode 0x0 (10)
Resolution: A) Cycle power on the node.
B) Clear dust and debris from the node.
C) Remove and reseat the specified CPU DIMM pair.
D) Replace the failed CPU DIMM pair.
E) Replace the node motherboard.
Diagnostic: A) Verify North Bridge heatsink attachment.
B) Check DIMM clock buffers (X6200 on P4-Eagle).
C) Check DIMM termination (R5836, etc on P4-Eagle nodes).
Fatal error: Code 35, subcode 0x1 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
A single DIMM of a DIMM pair has failed. The data value
indicates which DIMM. Bits 4-7 of the data value indicate
which DIMM pair. Bits 0-3 of the data value indicate which
DIMM within that pair.
If data is 0, then DIMM 0 of pair 0 has failed.
If data is 1, then DIMM 1 of pair 0 has failed.
If data is 10, then DIMM 0 of pair 1 has failed.
If data is 11, then DIMM 1 of pair 1 has failed.
Resolution: A) Cycle power on the node.
B) Clear dust and debris from the node.
C) Remove and reseat the specified CPU DIMM.
D) Replace the failed CPU DIMM.
E) Replace the node motherboard.
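The data value for Code 35, sub-codes 0x0 and 0x1 packs the DIMM pair into bits 4-7 and, for sub-code 0x1, the DIMM within the pair into bits 0-3. The following is a minimal Python sketch that parses a fatal error line and extracts those fields. It assumes the value in parentheses is hexadecimal, which is consistent with "data is 10" meaning pair 1. The regular expression and function name are hypothetical and are not part of the BIOS.

```python
import re

# Matches lines such as "Fatal error: Code 35, subcode 0x0 (10)".
FATAL_RE = re.compile(
    r"Fatal error: Code (\d+), sub-?code 0x([0-9a-fA-F]+) "
    r"\((?:0x)?([0-9a-fA-F]+)\)"
)


def decode_code35(line: str) -> dict:
    """Decode the DIMM pair (and DIMM) from a Code 35 fatal error line."""
    match = FATAL_RE.search(line)
    if match is None:
        raise ValueError("not a fatal error line")
    code = int(match.group(1))
    subcode = int(match.group(2), 16)
    data = int(match.group(3), 16)  # assumption: data is printed in hex
    if code != 35:
        raise ValueError("not a Code 35 error")
    result = {"subcode": subcode}
    if subcode in (0x0, 0x1):
        result["dimm_pair"] = (data >> 4) & 0xF   # bits 4-7
    if subcode == 0x1:
        result["dimm"] = data & 0xF               # bits 0-3
    return result


if __name__ == "__main__":
    print(decode_code35("Fatal error: Code 35, subcode 0x0 (10)"))
    print(decode_code35("Fatal error: Code 35, subcode 0x1 (11)"))
```

For the example line "Fatal error: Code 35, subcode 0x0 (10)", the sketch reports DIMM pair 1, which agrees with the "Pair1" messages in the example output.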
Fatal error: Code 35, subcode 0x2 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
This code means an ECC error was detected, but the BIOS did
not completely decode the error.
See Code 35, sub-code 0x0 for resolution information.
Fatal error: Code 36, subcode 0x0 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI: SERR# input went low
In the event of a hardware failure, it is normal to trigger
a processor System Management Interrupt (SMI). If the SMI
gets cleared before the BIOS has a chance to observe it
(which should not happen), then this error will result.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Fatal error: Code 36, subcode 0x1 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI: Write made to ACPI PM register
In normal operation the operating system should not write
to the ACPI PM register. If the BIOS detects a write took
place, it will flag this as an error caused by a failing
operating system or other node hardware.
Resolution: A) Cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.
Fatal error: Code 36, subcode 0x2 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI not fully handled.
The BIOS was not able to determine the actual cause of the
triggered SMI.
Resolution: A) Cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.
Fatal error: Code 36, subcode 0x3 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
--- SMI: No known cause (# 4097)
GPE status: 0x400000, GPE input: 0x0xfff7ff
*** Error: SMI: No known cause is too frequent
This error may result if there is an unknown hardware device
triggering SMIs in the system and those SMIs are happening
too frequently. Most likely the device continues to trigger
an SMI because its problem has not been serviced, and no
real work is possible at this point because immediately
after returning from the SMI, another is triggered. The
BIOS attempts to recognize this condition and stop with a
fatal error rather than just continuing to display errors.
Resolution: A) Remote reset or cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.
Fatal error: Code 36, subcode 0x4 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Warning: SMI cause is too frequent; disabling SMI handling
*** Error: SMI cause could not be masked
This error may result if a known SMI cause is happening too
frequently. In a normally functioning node, SMIs should occur
infrequently, as there is a performance impact associated with
handling each SMI. The BIOS will first attempt to disable known
SMIs in order to mask this problem. If that is insufficient,
the BIOS will stop with this fatal error.
Resolution: A) Check for CPU memory DIMM correctable errors in the
event log. Replace DIMMs if they are suspect.
B) Check for hardware oscillating events in the event
log (such as PS status). On some node types, board
GPIO changes are reported through SMI. You may need
to replace power supplies or another FRU.
C) Replace the node motherboard.
Diagnostic: A) Set "fatal_no_reboot" at Whack and then enter Whack
at the Fatal Error. You should be able to inspect
the state of the machine prior to SMI handling to
see what status is asserted. Output from the following
Whack commands may be helpful:
1) eagle status
2) vsc status
3) pci status
4) mem bridge
Fatal error: Code 36, subcode 0x5 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: In SMI on CPU ww [xx], CR2 was 0xyyyy,
but got changed to 0xzzzz
This error will result if the BIOS inadvertently changes the
contents of CR2 while processing an SMI. This should not happen
in normal operation, but might happen as the result of a Whack
command. As returning from this SMI could easily cause
corruption of the OS or of a user-level program, this fatal
error is flagged instead.
Resolution: A) Cycle power on the node.
Fatal error: Code 37, subcode zz (0)
GEVENT_TRIGGERED "GEvent Triggered"
Code 37 sub-codes are a bitmask of error values.
This means a single error may simultaneously trigger
multiple GEVENTs. This event is probably one of the
hardest to interpret, as it will often indicate that multiple
board devices have detected a fatal error condition. In general,
it's much more convenient to look up the decoded error in the
BIOS output of the idelog rather than manually decoding this
event back to indicators.
Resolution: Look up each of the individual documented sub-codes
below which, when OR'd together, form the sub-code observed.
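Because a Code 37 sub-code is a bitmask, an observed value can be split into the single-bit sub-codes to look up individually. The following is a minimal Python sketch of that split. The function name is hypothetical and is not part of the BIOS or Whack.

```python
from typing import List


def split_code37_subcode(subcode: int) -> List[int]:
    """Return the single-bit sub-codes that OR together to form subcode."""
    if subcode <= 0:
        raise ValueError("sub-code must be a positive bitmask")
    return [
        1 << bit
        for bit in range(subcode.bit_length())
        if subcode & (1 << bit)
    ]


if __name__ == "__main__":
    # An observed sub-code of 0x42 combines sub-codes 0x2 and 0x40.
    print([hex(s) for s in split_code37_subcode(0x42)])  # ['0x2', '0x40']
```

In the entries documented here, the bit position generally matches the GPE number shown in the BIOS output. For example, sub-code 0x4 is reported with GPE 2 and sub-code 0x200 with GPE 9.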
Fatal error: Code 37, subcode 0x1 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x01
CMIC_FATAL (GEVENT0)
This error indicates the CMIC (North Bridge) had a fatal error.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[0]: PCI2_PERR_L
This error indicates either the PLX #2 PCIe-PCIX bridge or the
Intel 31154 PCIX-PCIX bridge #2 detected a parity error. These
components manage PCI slots 0 and 1 on T-Series and Slot 0
on F-Series.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[0]: PEX2_FATAL_ERROR
This error indicates that the PLX #2 PCIe-PCIe bridge detected
a fatal error. These components manage PCI slots 0, 1, and 2;
Harrier 1 and 2 LPC0.
Resolution: A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.
Fatal error: Code 37, subcode 0x2 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x02
ALERT (GEVENT1)
Error in PCI device 00.00.00 (CMIC-LE Memory Controller/
Thin IMB):
ESR (0x4c) [0004]: IMBus error
(PCI errors not cleared)
The output above can be considered "typical" but really may
contain any of the possible CMIC (North Bridge) Memory Controller
or other PCI bus errors. An IMBus error indicates a communication
problem between the North Bridge and either the South Bridge or
the CIOBX2. This would likely indicate a node motherboard failure.
It has been observed in the field that a flaky or bad PCI socket
may also cause this.
Resolution: A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[1]: MCH Fatal Error
This error indicates the MCH (North Bridge) has detected a fatal
condition. Most likely there are other error messages present
in the idelog to help pinpoint the issue. Since the MCH is the
top of the root complex, it's very common to see the MCH indicating
a Fatal error on nearly all failures.
Resolution: A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.
Fatal error: Code 37, subcode 0x4 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
--- SMI: smm_inb(0x39) == 0x04
GPE 2 triggered
THERMT_L0_OSB (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt.
It is a fatal condition on Pentium III nodes, and the node
will be immediately taken out of the cluster with this
fatal error.
Resolution: A) Cycle power on the node. If it is a temperature
related problem, verify the system is getting
adequate ventilation.
B) Replace the node motherboard.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x04
GPE 2 triggered
P0_PROC_HOT (GEVENT2)
The Pentium 4 CPU supports clock modulation which reduces the
core frequency when the core temperature is too high. The BIOS
enables this support when starting the OS, so after the node
has joined the cluster, the BIOS will asynchronously notify the
OS if this event occurs but not take it out of the cluster. At
the same time, the Pentium 4 processor will automatically
reduce its clock speed so as to generate less heat and not
reach a shutdown temperature. This message is therefore not
fatal on P4 CPUs.
Resolution: A) Cycle power on the node. If it is a temperature
related problem, verify the system is getting
adequate ventilation.
B) Replace the node motherboard.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[2]: PCI0_PERR_L
This error indicates either the PLX #0 PCIe-PCIX bridge or the
Intel 31154 PCIX-PCIX bridge #0 detected a parity error. These
components manage PCI slots 4 and 5 on T-Series and Slot 2
on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[2]: PEX0_FATAL_ERROR
This error indicates that the PLX #0 PCIe-PCIe bridge detected
a fatal error. These components manage PCI slots 6, 7, and 8;
Harriers 1 and 2 LPC2.
See Code 37, sub-code 0x1 for resolution information.
Chimera nodes:
*** Error: GPE[2]: PEX0_FATAL_ERROR
This error indicates that the PLX 8796 #0 or #1 PCIe-PCIe
bridge detected a fatal error. These components manage PCI
slots 0, 1, 5, and 6; Harrier 0, LPC0 and LPC2; and Harrier 1,
LPC0 and LPC2.
See Code 37, sub-code 0x1 for resolution information.
Eos, Tornado, and Orion nodes:
*** Error: GPE[2]: PEX_FATAL_ERROR
This error indicates that the PLX PCIe-PCIe bridge detected
a fatal error. These components manage PCI slots 0, 1, and 2;
Harrier LPC0 and LPC2.
See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x8 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
--- SMI: smm_inb(0x39) == 0x08
GPE 3 triggered
THERMT_L1_OSB (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt.
See Code 37, sub-code 0x2 for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x08
GPE 3 triggered
P1_PROC_HOT (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt.
See Code 37, sub-code 0x2 for resolution information.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[3]: PCI0_SERR_L
This error indicates either the PLX #0 PCIe-PCIX bridge or the
Intel 31154 PCIX-PCIX bridge #0 detected a fatal error (SERR).
These components manage PCI slots 4 and 5 on T-Series and Slot 2
on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[3]: PEX1_FATAL_ERROR
This error indicates that the PLX #1 PCIe-PCIe bridge detected
a fatal error. These components manage PCI slots 3, 4, and 5;
Harrier 1 and 2 LPC1.
See Code 37, sub-code 0x1 for resolution information.
Chimera nodes:
*** Error: GPE[3]: PEX1_FATAL_ERROR
This error indicates that the PLX 8750 PCIe-PCIe bridge detected
a fatal error. This component manages PCI slots 2, 3, and 4;
Harrier 0, LPC1; and Harrier 1, LPC1.
See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x10 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
GPE 4 triggered
MIRQ (GEVENT4)
This error indicates the memory controller (CNB20HE) triggered
an interrupt. The CNB20HE documentation lists possible sources
as correctable ECC errors on the Memory data bus and Processor
data bus.
See below (P4) for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x10
GPE 4 triggered
P0_IERR (GEVENT4)
This error indicates that P4 CPU 0 has asserted IERR#, which is
used to indicate a processor internal error event occurred.
The Intel documentation indicates one cause of this error is a
machine check exception when exceptions have not yet been
enabled. From our experience in the field, the problem is
possibly a CPU or node motherboard failure.
Resolution: A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.
Diagnostic: A) Replace CPUs.
B) Replace CPU VRMs.
C) Check DIMM termination (R5836 etc on P4-Eagle nodes).
T-Series and F-Series (5000P) nodes:
*** Error: GPE[4]: PCI1_PERR_L
This error indicates either the PLX #1 PCIe-PCIX bridge or the
Intel 31154 PCIX-PCIX bridge #1 detected a parity error. These
components manage PCI slots 2 and 3 on T-Series and Slot 1
on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[4]: FPGA_LPC_IRQ0_L
This error indicates an internal error. This should not occur
in a V-Series system.
See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x20 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x20
GPE 5 triggered
P1_IERR (GEVENT5)
This error indicates that P4 CPU 1 has asserted IERR#.
See Code 37, sub-code 0x10 (P4) for resolution information.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[5]: PCI1_SERR_L
This error indicates either the PLX #1 PCIe-PCIX bridge or the
Intel 31154 PCIX-PCIX bridge #1 detected a fatal error (SERR).
These components manage PCI slots 2 and 3 on T-Series and Slot 1
on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P), Eos, Tornado and Chimera nodes:
*** Error: GPE[4]: FPGA_LPC_IRQ1_L
This error indicates that NEMOE raised the FPGA SMI interrupt
and it was not handled properly.
See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x40 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x40
GPE 6 triggered
P_SERR (GEVENT6)
This error indicates one or more of the system's chipset devices
is asserting P_SERR (primary side system error). Output is
usually followed by outstanding PCI errors as indicated by
chipset devices.
Resolution: A) Identify and replace failing PCI card based on
error output. It may be necessary to contact
hardware engineering with BIOS output to determine
which PCI slot is at fault.
B) Remove all PCI cards.
C) Replace the node motherboard.
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[6]: MCH Uncorrectable Error
This error indicates the MCH (North Bridge) has detected an
uncorrectable error. Most likely there are other error messages
present in the idelog to help pinpoint the issue. Since the MCH
is the top of the root complex, it's very common to see the MCH
indicating an Uncorrectable error on nearly all failures.
Resolution: A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.
Fatal error: Code 37, subcode 0x80 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x80
GPE 7 triggered
P_PERR (GEVENT7)
This error indicates one or more of the system's chipset devices
is asserting P_PERR (primary side parity error).
See Code 37, sub-code 0x40 for resolution information.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[7]: PCI2_SERR_L
This error indicates either the PLX #2 PCIe-PCIX bridge or the
Intel 31154 PCIX-PCIX bridge #2 detected a fatal error (SERR).
These components manage PCI slots 0 and 1 on T-Series and Slot 0
on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[7]: Not connected
This error indicates an internal error. This should not occur
in a V-Series system.
See Code 37, sub-code 0x1 for resolution information.
Eos, Tornado, and Chimera nodes:
*** Error: GPE[7]: MCH Fatal Error
This error indicates the MCH (North Bridge) has detected a fatal
condition. Most likely there are other error messages present
in the idelog to help pinpoint the issue. Since the MCH is the
top of the root complex, it's very common to see the MCH indicating
a Fatal error on nearly all failures.
Resolution: A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.
Fatal error: Code 37, subcode 0x100 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
--- SMI: smm_inb(0x3a) == 0x01
GPE 8 triggered
CPU_TEMP_INTR (GEVENT8)
This indicates a CPU temperature event triggered a GPIO
interrupt.
See Code 37, sub-code 0x2 for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x3a) == 0x01
GPE 8 triggered
S_SERR (GEVENT8)
This error indicates one or more of the system's chipsets is
asserting S_SERR (secondary side system error).
See Code 37, sub-code 0x40 for resolution information.
T-Series and F-Series (5000P) nodes:
--- SMI request via EXT_SMI
This error indicates another node in the cluster has forced
this node to handle an SMI. Most likely the other node is
attempting to force a panic dump because the local node has
stopped responding.
Resolution: A) Inspect the core dump to determine if the
cause was a software or hardware failure.
B) Replace the node motherboard if the issue
recurs and cannot be identified as a software
failure.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[7]: Not connected
This error indicates an internal error. This should not occur
in a V-Series system.
See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x200 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
This error indicates one or more of the system's chipsets is
asserting SERR (system error). Output is followed by the
PCI scan results, which display outstanding PCI errors of
all PCI bus devices.
See below (P4) for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x3a) == 0x02
GPE 9 triggered
S_PERR (GEVENT8)
This error indicates one or more of the system's chipsets is
asserting S_PERR (secondary side parity error).
Resolution: A) Identify and replace the failing PCI card based on
error output. It may be necessary to contact
hardware engineering with BIOS output to determine
which PCI slot is at fault.
B) Remove all PCI cards.
C) Replace the node motherboard.
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[9]: CPU0 IERR_L
This error indicates that CPU 0 has asserted IERR#, which is
used to indicate a processor internal error event occurred.
The Intel documentation indicates one cause of this error is a
machine check exception when exceptions have not yet been
enabled. From our experience in the field, the problem is
possibly a CPU or node motherboard failure.
Resolution: A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove all PCI cards.
D) Replace the node motherboard.
Diagnostic: A) Replace CPUs.
B) Replace CPU VRMs.
Fatal error: Code 37, subcode 0x400 (0)
GEVENT_TRIGGERED "GEvent Triggered"
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[10]: CPU1 IERR_L
This error indicates that CPU 1 has asserted IERR#, which is
used to indicate a processor internal error event occurred.
See Code 37, sub-code 0x200 for resolution information.
Chimera nodes:
*** Error: GPE[10]: CPU1_THERMTRIP_L
This indicates a thermal event on CPU1 triggered a GPIO
interrupt.
It is a fatal condition and the node will be immediately
taken out of the cluster with this fatal error.
Resolution: A) Cycle power on the node. If it is a
temperature-related problem, verify the system is getting
adequate ventilation.
B) Replace the node motherboard.
Fatal error: Code 37, subcode 0x800 (0)
GEVENT_TRIGGERED "GEvent Triggered"
Chimera nodes:
*** Error: GPE[11]: CPU0_THERMTRIP_L
This indicates a thermal event on CPU0 triggered a GPIO
interrupt.
It is a fatal condition and the node will be immediately
taken out of the cluster with this fatal error.
Resolution: A) Cycle power on the node. If it is a
temperature-related problem, verify the system is getting
adequate ventilation.
B) Replace the node motherboard.
Eos and Tornado nodes:
*** Error: GPE[11]: THERMTRIP_L
This indicates a thermal event on the CPU triggered a GPIO
interrupt.
See the above information regarding Chimera nodes for
resolution.
Fatal error: Code 37, subcode 0x2000 (0)
GEVENT_TRIGGERED "GEvent Triggered"
Eos, Tornado and Chimera nodes:
*** Error: GPE[13]: CAT_ERR_L
This error indicates that a CPU has asserted IERR#, which is
used to indicate a processor internal error event occurred.
The Intel documentation indicates one cause of this error is a
machine check exception when exceptions have not yet been
enabled. From our experience in the field, the problem is
possibly a CPU or node motherboard failure.
Resolution: A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove all PCI cards.
D) Replace the node motherboard.
Diagnostic: A) Replace CPUs.
B) Replace CPU VRMs.
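The Code 37 subcodes listed in this table appear to be single-bit masks whose bit position matches the GPE number in the BIOS output (0x80 with GPE 7, 0x200 with GPE 9, 0x2000 with GPE 13). The following Python sketch assumes that relationship holds for every Code 37 subcode; the helper name gpe_index is hypothetical and is not part of any HPE tool.

```python
def gpe_index(subcode: int) -> int:
    """Return the GPE number for a Code 37 subcode with exactly one bit set.

    Assumption: subcode == 1 << GPE number, as the examples in this guide suggest.
    """
    if subcode <= 0 or subcode & (subcode - 1):
        raise ValueError(f"expected a single-bit subcode, got {subcode:#x}")
    return subcode.bit_length() - 1


if __name__ == "__main__":
    # Subcodes taken from the Code 37 entries in this table.
    for subcode in (0x80, 0x100, 0x200, 0x400, 0x800, 0x2000):
        print(f"subcode {subcode:#x} -> GPE[{gpe_index(subcode)}]")
```

A subcode with more than one bit set is rejected rather than guessed at, because this guide only documents single-bit values.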
Non-fatal error: Code 38, sub-code 0x0 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power Supply xx indicates invalid battery configuration: y batteries
Verify battery connection and individual battery units.
The maximum count of batteries in a string which are supported
by software is 3. Any greater number will result in this
non-fatal error.
The data value may be decoded to determine which power supply
and the battery count. The high 8 bits are a bitmask of the
power supply. The lower 16 bits are the number of batteries
counted. Thus, a data value of 100000c (hex) indicates PS1 had a
battery count of 12. A data value of 4 indicates PS0 had a
battery count of 4.
Resolution: A) Verify no more than 3 batteries in a string
are connected to any one power supply.
B) Cycle power on the node.
C) Remove batteries one at a time to determine
if there is a faulty connection or battery.
Replace the faulty cable or battery.
D) Replace the power supply.
E) Replace the node motherboard.
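The data-value decoding for Code 38, sub-code 0x0 can be sketched in Python as follows. This is an illustration only: it assumes the data value is a 32-bit quantity, so that "the high 8 bits" are bits 24-31, and it treats that field as the power supply number, which is what the two worked examples (100000c and 4) imply. The function name is hypothetical.

```python
def decode_battery_config(data: int):
    """Split a Code 38, sub-code 0x0 data value into (power supply, battery count).

    Assumptions: 32-bit data value; bits 24-31 identify the power supply;
    bits 0-15 hold the number of batteries counted.
    """
    power_supply = (data >> 24) & 0xFF
    battery_count = data & 0xFFFF
    return power_supply, battery_count


if __name__ == "__main__":
    for value in (0x100000C, 0x4):
        ps, count = decode_battery_config(value)
        print(f"data {value:x}: PS{ps} reported {count} batteries")
```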
Non-fatal error: Code 38, sub-code 0x1 (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"
RTC / NVRAM Battery Failure - Replace battery.
The RTC / NVRAM battery was found to have a low voltage by the
built-in monitoring circuit of the RTC (TOD clock).
Resolution: A) Replace the lithium-ion cell battery on the
node.
B) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x3 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
No batteries present on power supply xx
This error indicates no batteries were found on a node
power supply.
This warning may be enabled by setting "warn_nobat" in
NVRAM.
The data value may be decoded to determine which power supply
triggered this error. The high 8 bits are a bitmask of the
power supply. Thus, a data value of 0 indicates PS0 is not
present. A data value of 1000000 indicates PS1 is not present.
Resolution: A) Verify there is at least one battery
connected.
B) Cycle power on the node.
C) Exchange cables and batteries.
D) Replace the power supply.
E) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x4 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply missing: node power configuration is not redundant
This error indicates one of the two power supplies for
a node is not present. This warning may be enabled by
setting "warn_ps" in NVRAM.
The data value may be decoded to determine which power supply
triggered this error. The high 8 bits are a bitmask of the
power supply. Thus, a data value of 0 indicates PS0 is not
present. A data value of 1000000 indicates PS1 is not present.
Resolution: A) Verify both power supplies are present and
powered on.
B) Power off the missing supply, remove it, and
re-insert it in the chassis.
C) Replace the power supply.
D) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x5 (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Battery failure on Power Supply
This error indicates that a battery on the power supply
has reported a hardware error. The status light on the back
of the failed battery will be amber.
Resolution: A) Verify both power supplies are present and
powered on. Verify batteries are present and
powered on.
B) Power off the failed battery, remove the cable, and
re-insert it in the Power Supply. Turn it back on.
If that does not reset the FAILED condition,
replace the battery.
C) Replace the power supply.
D) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x6 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Powering off PSxx because it is on battery power.
This will shut down the node until AC is restored.
This message indicates that a power supply lost input AC power and
that the BIOS powered down the node to avoid draining the battery.
The data value may be decoded to determine which power supply
triggered this error. The low 2 bits are a bitmask of the DC power
status. Bit 0 represents power supply 0 and Bit 1 represents power
supply 1. If a bit is 1, then the DC output from that power
supply was good when the system shut down.
Resolution: A) Apply AC power to the node.
B) Replace the power supply.
Non-fatal error: Code 38, sub-code 0x7 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: Fan Bad
or
Power supply xx failure: Fan 0 Bad
or
Power supply xx failure: Fan 1 Bad
This error indicates there is a hardware problem in one of the
node power supplies. One or more of the fans may have failed.
The data value may be decoded to determine which power supply
(and fan) triggered this error. The low 2 bits are a bitmask of
the fan status for Power Supply 0. The next 2 bits are a bitmask
of the fan status for Power Supply 1. Thus:
1: PS0 had a Fan0 failure
2: PS0 had a Fan1 failure
3: PS0 had a double fan failure
4: PS1 had a Fan0 failure
8: PS1 had a Fan1 failure
c: PS1 had a double fan failure
Resolution: A) Replace the power supply.
B) Replace the node motherboard.
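The fan-status bitmask for Code 38, sub-code 0x7 can be unpacked as sketched below in Python. The layout follows the description in this entry (bits 0-1 for Power Supply 0, bits 2-3 for Power Supply 1); the function name and the tuple representation are hypothetical choices made for illustration.

```python
def decode_fan_failures(data: int):
    """Return a list of (power_supply, fan) pairs flagged in a sub-code 0x7 data value.

    Layout per this guide: bits 0-1 are Fan0/Fan1 of PS0, bits 2-3 are Fan0/Fan1 of PS1.
    """
    failures = []
    for ps in (0, 1):
        fan_mask = (data >> (2 * ps)) & 0x3  # two fan bits per power supply
        for fan in (0, 1):
            if fan_mask & (1 << fan):
                failures.append((ps, fan))
    return failures


if __name__ == "__main__":
    for value in (0x1, 0x2, 0x3, 0x4, 0x8, 0xC):
        text = ", ".join(f"PS{ps} Fan{fan}" for ps, fan in decode_fan_failures(value))
        print(f"data {value:x}: {text}")
```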
Non-fatal error: Code 38, sub-code 0x8 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: Charger Overload
This error indicates there is a hardware problem in one of the
node power supplies, specifically that the charger cannot handle
the battery charge current draw. If you need to override this
error so the node continues, you can set "ignore_chargefail" in
NVRAM.
The data value may be decoded to determine which power supply
triggered this error. The low 2 bits are a bitmask of the
charger status for the two power supplies. Thus, a value of 1
indicates PS0 had a charger overload. A value of 2 indicates
PS1 had a charger overload. A value of 3 indicates PS0 and PS1
both had a charger overload.
Resolution: A) Check battery connection.
B) Exchange cables and batteries.
C) Replace the power supply.
D) Replace the node motherboard.
Fatal error: Code 38, subcode 0x9 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Both Power Supplies failed: DC Output Bad
This error indicates there is a hardware problem in one of the
node power supplies. If this failure is transient, it could also
be caused by turning the power supply off and then on or by a
quick AC loss followed by AC being restored. If both power
supplies fail simultaneously (not likely), this is a fatal error.
The data value may be decoded to determine which power supply
triggered this error. The low 2 bits are a bitmask of the
DC Output status for the two power supplies.
As a Fatal error, the value will be 3, indicating PS0 and PS1
both had a DC Output Bad.
Resolution: A) Ensure a service operation was not taking place
at the time, and that AC had not also failed.
B) Replace the power supply.
C) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x9 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: DC Output Bad
This error indicates there is a hardware problem in one of the
node power supplies. If this failure is transient, it could also
be caused by turning the power supply off and then on or by a
quick AC loss followed by AC being restored. If both power
supplies fail simultaneously (not likely), this is a fatal error.
The data value may be decoded to determine which power supply
triggered this error. The low 2 bits are a bitmask of the
DC Output status for the two power supplies.
Thus, a value of 1 indicates PS0 had a DC Output Bad.
A value of 2 indicates PS1 had a DC Output Bad.
Resolution: A) Ensure a service operation was not taking place
at the time, and that AC had not also failed.
B) Replace the power supply.
C) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0xa (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: AC Input Bad
This error indicates that AC input power is not being supplied to
one or more power supplies. The likely cause is either a real AC
failure or that the power supply has been switched to the off
position. In the case of an AC failure, the power supply will be
automatically shut down to preserve batteries (if "ignore_acfail" is
set then the power supply will not be shut down).
The lower 2 bits of the data value may be decoded to determine
which power supply lost AC power. A value of 1 indicates PS0.
A value of 2 indicates PS1. A value of 3 indicates both power
supplies lost AC power.
Resolution: A) Verify AC power is present and the power
supply switch is turned on.
B) Check the Power Distribution Unit (PDU) breaker.
C) Replace the power supply.
D) Replace the node motherboard.
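Several Code 38 entries (sub-codes 0x8, 0x9, and 0xa) describe the same two-bit layout, where bit 0 stands for PS0 and bit 1 for PS1. A minimal Python sketch of that decoding follows; the function name is hypothetical, and the sketch ignores any bits above the low two because this guide does not describe them for these sub-codes.

```python
def power_supplies_flagged(data: int):
    """Return the power supply numbers whose bit is set in the low 2 bits of data."""
    return [ps for ps in (0, 1) if data & (1 << ps)]


if __name__ == "__main__":
    for value in (1, 2, 3):
        names = " and ".join(f"PS{ps}" for ps in power_supplies_flagged(value))
        print(f"data {value}: {names}")
```

For sub-code 0xa, for example, a data value of 3 yields both PS0 and PS1, matching the statement that both power supplies lost AC power.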
Non-fatal error: Code 38, sub-code 0xb (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"
**** Power Supplies mismatch ****
Power Supply 0: I2C accessible
Power Supply 1: I2C inaccessible
This error indicates one of the power supplies is a new style
(I2C interface) and the other power supply is not responding
using I2C, but has been detected as present. This is not a
supported configuration. If you need to override this error,
set "ignore_psdiff" in NVRAM.
Resolution: A) Pull and re-insert the inaccessible power supply.
B) Check the Power Distribution Unit (PDU) breaker
for the inaccessible power supply.
C) Replace the power supply.
D) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0xc (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
This error indicates Power Supply 0 reported a limit was exceeded
while performing the power supply status test. Each power supply
has integrated monitors for temperature, voltage, and current
draw. The BIOS reads these sensors as part of initialization to
determine if the power supply is operating within specifications.
The data value may be decoded to determine the particular cause
of the limit failure. Each bit represents a unique sensor. Data
values may be decoded as follows:
00000001 - Temperature
00000004 - 3.3V
00000008 - 3.3V Current
00000010 - 5V
00000020 - 5V Current
00000040 - 12V
00000080 - 12V Current
00000100 - 24V
00000200 - 24V Current
00000400 - 48V
00000800 - 48V Current
00001000 - Bat0 48V
00002000 - Bat1 48V
00004000 - Bat2 48V
00008000 - Bat0 12V
00010000 - Undefined ... to ...
00400000 - Undefined
00800000 - Battery LED is Amber
01000000 - Battery Relay is Off
02000000 - PS LED is Amber
04000000 - Fan Fail
08000000 - DC Fail
10000000 - AC Fail
20000000 - Power Supply is Disabled
40000000 - Power Supply Switch is Off
80000000 - Low Limit exceeded (combined with bits above)
Resolution: Contact 3PAR technical support.
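The sensor bit table for Code 38, sub-code 0xc lends itself to a lookup-based decoder. The Python sketch below copies the bit assignments from the table in this entry; the names LIMIT_BITS and decode_limits are hypothetical, and any bit the table marks as undefined (or does not list) is reported as undefined rather than interpreted.

```python
# Bit assignments copied from the Code 38, sub-code 0xc table in this guide.
LIMIT_BITS = {
    0x00000001: "Temperature",
    0x00000004: "3.3V",
    0x00000008: "3.3V Current",
    0x00000010: "5V",
    0x00000020: "5V Current",
    0x00000040: "12V",
    0x00000080: "12V Current",
    0x00000100: "24V",
    0x00000200: "24V Current",
    0x00000400: "48V",
    0x00000800: "48V Current",
    0x00001000: "Bat0 48V",
    0x00002000: "Bat1 48V",
    0x00004000: "Bat2 48V",
    0x00008000: "Bat0 12V",
    0x00800000: "Battery LED is Amber",
    0x01000000: "Battery Relay is Off",
    0x02000000: "PS LED is Amber",
    0x04000000: "Fan Fail",
    0x08000000: "DC Fail",
    0x10000000: "AC Fail",
    0x20000000: "Power Supply is Disabled",
    0x40000000: "Power Supply Switch is Off",
    0x80000000: "Low Limit exceeded",
}


def decode_limits(data: int):
    """Return the names of all sensors flagged in a sub-code 0xc/0xd data value."""
    names = [name for bit, name in LIMIT_BITS.items() if data & bit]
    known = 0
    for bit in LIMIT_BITS:
        known |= bit
    leftover = data & 0xFFFFFFFF & ~known  # bits with no documented meaning
    if leftover:
        names.append(f"Undefined bits {leftover:#010x}")
    return names


if __name__ == "__main__":
    print(decode_limits(0x80000040))
```

Because "Low Limit exceeded" is described as combining with the other bits, a value such as 80000040 decodes to the 12V sensor together with the low-limit flag.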
Non-fatal error: Code 38, sub-code 0xd (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
This error indicates Power Supply 1 reported a limit was exceeded
while performing the power supply status test.
See Code 38, sub-code 0xc for resolution information.
Non-fatal error: Code 38, sub-code 0xe (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Each newer generation (Magnetek) power supply and battery has an
I2C interface which allows the node to acquire power supply
internal temperature, voltages, and current loads. The BIOS will
verify these readings are within acceptable limits as part of
normal initialization.
This failure code indicates a limit has been exceeded on a battery
attached to a power supply on the node. The data value may be
decoded to determine which power supply and battery. The lower
2 bits are a bitmask of the power supply. The upper 16 bits are
a bitmask of the failing battery. Thus, a data value of
10002 indicates PS1 Bat0 has exceeded a limit. A data value of
40001 indicates PS0 Bat2 has exceeded a limit.
Resolution: A) Check battery expiration date and replace as
necessary.
B) Power cycle the failing battery.
C) Replace battery cable.
Diagnostic: A) Use the Whack "bat status" command to display
power supply and battery temperatures and
voltages to determine the particular failure.
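The two bitmasks packed into the Code 38, sub-code 0xe data value can be separated as in the Python sketch below. It assumes a 32-bit data value, so that "the upper 16 bits" are bits 16-31; the function name is hypothetical.

```python
def decode_battery_limit(data: int):
    """Return (power supplies, batteries) flagged in a sub-code 0xe data value.

    Assumptions: bits 0-1 are a bitmask of power supplies; bits 16-31 are a
    bitmask of batteries, with bit 16 standing for Bat0.
    """
    ps_mask = data & 0x3
    battery_mask = (data >> 16) & 0xFFFF
    supplies = [ps for ps in (0, 1) if ps_mask & (1 << ps)]
    batteries = [bat for bat in range(16) if battery_mask & (1 << bat)]
    return supplies, batteries


if __name__ == "__main__":
    for value in (0x10002, 0x40001):
        supplies, batteries = decode_battery_limit(value)
        print(f"data {value:x}: PS{supplies} Bat{batteries}")
```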
Non-fatal error: Code 38, sub-code 0xf (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
I2C errors prevented completion of the power test.
Each newer generation (Magnetek) power supply and battery has an
I2C interface which allows the node to acquire power supply status.
This failure code indicates the BIOS was unable to read one of
the Power Supply or battery status registers.
The lower 2 bits of the data value may be decoded to determine
which power supply failed. A value of 1 indicates PS0. A value
of 2 indicates PS1. A value of 3 indicates both power supplies
failed.
Resolution: A) Power cycle the indicated power supply.
B) Replace power supply.
C) Replace all batteries attached to the power supply.
D) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x10 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
PSwwww Batxxxx Switch Off
This failure code indicates a battery has its power switch in the
off position, and is thus unable to supply backup power to the
node in the case of AC failure. The data value may be decoded to
determine which power supply and battery. See Code 38, sub-code 0xe
for decoding information.
Resolution: A) Turn battery on.
B) Power cycle the indicated battery.
C) Replace battery cable.
D) Replace power supply.
Fatal error: Code 38, subcode 0x11 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
PS x has down-rev firmware (x)
This failure code indicates the power supply firmware revision
is not up-to-date and therefore not supported.
Resolution: Replace power supply.
Fatal error: Code 38, subcode 0x12 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
PS x Battery has down-rev firmware (rev)
This failure code indicates the battery attached to the power supply
indicated has firmware that is not up-to-date and therefore not
supported.
Resolution: Replace battery.
Fatal error: Code 39, subcode 0x1 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for no successful OS boot (xxxx) exceeded.
Type "unset cnt_no_os_boot" to clear this error
This error indicates that the BIOS has detected that the
node has not successfully booted the OS and will now
prohibit boots until operator intervention clears this
error.
Resolution: A) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not meet your application.
To do this, type, "unset max_no_os_boot" at
a Whack prompt.
B) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
C) Replace the IDE drive.
Fatal error: Code 39, subcode 0x2 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS boot with no cluster (xxxx) exceeded.
Type "unset cnt_no_cluster" to clear this error
This error indicates that the BIOS has detected that the
node has booted, but the cluster has failed to form
several times. The BIOS will prohibit boots until operator
intervention clears this error. This is to prevent cyclic
node up/down caused by a hardware or software failure.
This increases the reliability of the cluster by preventing
the node from continuously attempting to join the cluster.
Resolution: A) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not meet your application.
To do this, type, "unset max_no_cluster" at
a Whack prompt.
B) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
C) Replace the IDE drive.
Fatal error: Code 39, subcode 0x3 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS panic (xxxx) exceeded.
Type "unset cnt_os_panic" to clear this error
This error indicates that the BIOS has detected that the
node has booted and then caused a panic several times.
When the OS causes a panic, it notifies the BIOS of this
event, so the BIOS can track problems. Once a limit is
exceeded, the BIOS will prohibit boots until operator
intervention clears this error. This is to prevent cyclic
node up/down caused by a hardware or software failure.
This increases the reliability of the cluster by preventing
the node from continuously attempting to join the cluster.
Resolution: A) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not meet your application.
To do this, type, "unset max_os_panic" at
a Whack prompt.
B) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
C) Replace the IDE drive.
Fatal error: Code 39, subcode 0x4 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS cluster without shutdown (xxxx) exceeded.
Type "unset cnt_no_shutdown" to clear this error
This error indicates that the BIOS has detected that the
node has booted, but has not been shut down properly
several times. The BIOS will prohibit boots until operator
intervention clears this error. This is to prevent cyclic
node up/down caused by a hardware or software failure.
This increases the reliability of the cluster by preventing
the node from continuously attempting to join the cluster.
Resolution: A) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not meet your application.
To do this, type, "unset max_no_shutdown" at
a Whack prompt.
B) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
C) Replace the IDE drive.
Fatal error: Code 39, subcode 0x5 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for same fatal error (xxxx) exceeded.
Type "unset cnt_same_fatal" to clear this error
This error indicates that the BIOS has detected that the
same fatal or non-fatal error has occurred repeatedly.
The BIOS will prohibit boots until operator intervention
clears this error. This is to prevent cyclic node up/down
caused by a hardware or software failure. This increases
the reliability of the cluster by preventing the node from
continuously attempting to join the cluster.
Resolution: A) Observe other errors present in the PROM log
to determine the cause of this error.
B) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not meet your application.
To do this, type, "unset max_same_fatal" at
a Whack prompt.
C) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
D) Replace the IDE drive.
Fatal error: Code 39, subcode 0x6 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for errors logged (xxxx) exceeded.
Type "unset cnt_log_error" to clear this error
This error indicates that the BIOS has detected that it
has recorded too many fatal or non-fatal errors in the board
serial PROM and that it should prohibit further boots until
operator intervention clears this error. This is to prevent
cyclic node up/down caused by a hardware or software failure.
This increases the reliability of the cluster by preventing
the node from continuously attempting to join the cluster.
Resolution: A) Observe other errors present in the PROM log
to determine the cause of this error.
B) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not meet your application.
To do this, type, "unset max_log_error" at
a Whack prompt.
C) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
D) Replace the IDE drive.
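Code 39 subcodes 0x1 through 0x6 each pair a "cnt_" counter (cleared to resume booting) with a "max_" limit (unset to turn the check off). The Python sketch below collects those pairs in one lookup. The variable names come from the error text and resolutions in this guide; the table name, function name, and dictionary keys are hypothetical, and the sketch only builds command strings without talking to a node.

```python
# Counter suffixes taken from the Code 39 entries in this guide.
BOOT_COUNTERS = {
    0x1: "no_os_boot",
    0x2: "no_cluster",
    0x3: "os_panic",
    0x4: "no_shutdown",
    0x5: "same_fatal",
    0x6: "log_error",
}


def whack_commands(subcode: int):
    """Return the Whack commands this guide lists for a Code 39 subcode."""
    suffix = BOOT_COUNTERS[subcode]  # KeyError for subcodes without a counter
    return {
        "clear": f"unset cnt_{suffix}",          # clears the fatal error
        "disable_check": f"unset max_{suffix}",  # turns the checking mechanism off
    }


if __name__ == "__main__":
    for subcode in sorted(BOOT_COUNTERS):
        cmds = whack_commands(subcode)
        print(f"subcode {subcode:#x}: {cmds['clear']} / {cmds['disable_check']}")
```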
Fatal error: Code 39, subcode 0x7 (0)
OS_STARTUP_FAILURE "OS Startup Error"
This node hit the Harrier Mismatch Error, failed ECC
This error indicates that the BIOS has detected that the Harrier
ASIC's ECC logic has hit an error. It should have triggered an
ECC error, but failed to do so.
Resolution: Replace the node motherboard.
Fatal error: Code 39, subcode 0x10 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Invalid boot sector. Use "boot net install" to correct this.
The IDE disk is used for booting the operating system.
This error indicates the boot sector which has been
loaded from the disk does not have a valid signature.
The most likely cause of this error is that a fresh
IDE drive has been installed in the node and it needs
to be field net installed.
Disk MBR does not have a valid partition table
You may also see the above line immediately following
the fatal error. This message indicates the partition
table in the boot sector (Master Boot Record) was
also invalid, and that an "ide log" entry could not be
written.
Resolution: A) If no hardware has been replaced, first
try cycling power on the node.
B) Perform a field IDE net install on the
drive, or use "boot net install".
C) Use the "ide smart status" command to acquire the drive
SMART status. Replace the IDE drive if a
failure is reported.
D) Replace the IDE cable.
E) Replace the IDE drive.
F) Replace the node motherboard.
Non-fatal error: Code 40, sub-code 0x1 (0)
CBIOS_OS_TIMEOUT "CBIOS to OS timeout"
*** Error: CBIOS to OS message communication timeout
During CPU SMI initialization, the queue facility to send
messages between the BIOS and TPD is tested. If there is
a problem triggering an SMI, or some other error which
causes message corruption, this error will result. This
error is recoverable because the OS can still come up and
function at a degraded level even if the communication
between the OS and BIOS is not functioning.
Resolution: A) View the PROM log to see if this is repeatable.
If not, ignore a single occurrence.
B) Cycle power on the node.
C) Replace the bootstrap CPU.
D) Replace the node motherboard.
Fatal error: Code 41, subcode 0x0 (0)
CPU_BUS_SPEED_BAD "CPU Bus Speed Bad"
*** Error: CPU speed is too slow.
The computed CPU speed is lower than the expected minimum
supported in a 3PAR node. Most likely this is due to a
hardware failure. Since the CPU speed computation depends
upon access to the RTC, it is most likely there is a
communication problem with the SuperIO containing the RTC.
If you need to run with a reduced CPU speed, enter the
following command on the node:
Whack> set perm cpu_slow_ok
See Code 41, sub-code 0x1 for resolution information.
Fatal error: Code 41, subcode 0x1 (0)
CPU_BUS_SPEED_BAD "CPU Bus Speed Bad"
*** Error: Memory speed is too slow.
After the CPU speed is computed, the memory bus (FSB)
speed is computed. It is computed based on the CPU
speed, and bus speed multiplier as reported by the CPU.
If you need to run with a reduced Memory bus speed, enter
the following command on the node:
Whack> set perm mem_slow_ok
Resolution: A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.
Diagnostic: A) Resume past fatal error and look for
additional problems such as RTC failure.
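The Code 41, sub-code 0x1 description says the memory bus (FSB) speed is derived from the CPU speed and the bus speed multiplier reported by the CPU. The Python sketch below illustrates that derivation under the common assumption that core clock equals bus clock times multiplier. The function name and the sample numbers are hypothetical; this guide does not state the exact formula or the minimum speed the BIOS enforces.

```python
def memory_bus_mhz(cpu_mhz: float, multiplier: float) -> float:
    """Estimate the bus clock from the measured CPU clock and its multiplier.

    Assumption: core clock = bus clock * multiplier.
    """
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return cpu_mhz / multiplier


if __name__ == "__main__":
    # Illustrative values only, not taken from any 3PAR node.
    print(memory_bus_mhz(2800.0, 14.0))
```

Under this assumption, a CPU speed that is measured too low (for example because of the RTC access problem described for sub-code 0x0) would also produce a bus speed that is too low.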
Fatal error: Code 42, subcode 0x1 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed CP PROM ww.xx.yy.zz read
Centerpanel access using Manufacturing PROM: FAILURE
The centerpanel is used by the 3PAR cluster for the nodes to
communicate. The CM links and backup serial links serve
this purpose. There is also a diagnostic I2C bus present in
the centerpanel which is used by nodes to diagnose error
conditions and reset other nodes in the cluster.
As part of the manufacturing process, this bus is tested by
accessing the serial PROM which is present on a manufacturing
centerpanel. If this test fails, it is likely the node will
have a problem accessing the centerpanel I2C bus.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "i2c" command to access devices
such as the board register directly.
Fatal error: Code 42, subcode 0x2 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed CP PROM ww.xx.yy.zz write
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 42, subcode 0x3 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
CP PROM node data does not match what is written: Addr xxxx
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 42, subcode 0x4 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
CP PROM pattern data read is incorrect
Addr xx Expected yy
Read zz
...
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 42, subcode 0x5 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed I2C access to board register x.y.z
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 42, subcode 0x6 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed I2C access to board register x.y.z
Centerpanel access using Manufacturing PROM: FAILURE
Titan specific. It performs a read accessibility check for extra
I2C addresses while testing CP PROM 0.a0 and fails with a fatal
error message if an address is not accessible.
Note that if the failure is not related to CP PROM 0.a0, it will
not print the "CP PROM at 0.a0:" message, only
"Failed I2C access to board register x.yy".
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 43, subcode 0x0 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Voltage ID indicates CPUxx present but TEMP sensor disagrees.
This error indicates either a CPU failure or onboard sensors
are reading incorrect values for the specified CPU.
The VID (voltage ID sense) lines are attached to each physical
CPU and used to indicate to the VRMs (voltage regulator modules)
the voltage level expected by the CPU. These lines are also
connected to the LM87, which uses them to determine the correct
voltage which should be delivered to the CPU.
The TEMP (temperature) sensor is connected to an on-die CPU
thermal diode. If its reading is out of acceptable range,
the BIOS determines the sensor is not reliably connected to
a CPU, or a CPU is not present.
Bits 0-1 of data indicate CPU non-presence as determined
by the VID sense lines. Bits 8-9 of data indicate CPU
non-presence as determined by connection to the thermal diode.
Data Value Failure
------------------------------------------------------------1 CPU0 does not respond to startup
2 CPU1 does not respond to startup
10 CPU0 thermal sensor/voltage ID indicates not present
20 CPU1 thermal sensor/voltage ID indicates not present
Resolution: A) Cycle power on the node.
B) Remove physical CPU from specific socket and
test with no CPU present.
B1) If error persists, replace node motherboard.
B2) If error clears, replace CPU.
C) Replace the node motherboard.
Diagnostic: A) Use "i2c env" command to determine whether
the temperature or voltage is at fault.
B) If CPU temperature shows out of range, and
CPU is still functional, suspect thermal diode
connection to LM87. Try swapping CPUs to see
if problem moves with CPU.
C) If CPU voltage shows high or low, but VRM is
emitting correct voltage by the voltage sensor,
then suspect the VID lines to the LM87.
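The bit layout described for the Code 43, sub-code 0x0 data word can be sketched as follows. This is a minimal illustration that follows the prose only (bits 0-1 for the VID sense lines, bits 8-9 for the thermal diode); the value table lists 10 and 20 for the thermal cases, so confirm the bit positions against a real log entry. The function name and the returned keys are hypothetical.

```python
def decode_cpu_presence(data):
    """Split a Code 43 data word into the two non-presence masks.

    Assumption: a set bit means "CPU reported not present", with bit N
    of each field referring to CPU N, as the prose description implies.
    """
    vid_field = data & 0x3             # bits 0-1: VID sense lines
    thermal_field = (data >> 8) & 0x3  # bits 8-9: thermal diode connection
    return {
        "vid_absent": [cpu for cpu in range(2) if (vid_field >> cpu) & 1],
        "thermal_absent": [cpu for cpu in range(2) if (thermal_field >> cpu) & 1],
    }


print(decode_cpu_presence(0x0201))
```

A disagreement between the two lists for the same CPU is the condition this error code reports.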
Fatal error: Code 43, subcode 0x1 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Voltage ID indicates CPUxx not present but TEMP sensor disagrees.
See Code 43, sub-code 0x0 for resolution information.
Fatal error: Code 43, subcode 0x2 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Physical CPUxx active, but thermal sensor disagrees.
Bits 0-1 of data indicate CPU non-presence as determined
by the running CPU APIC addresses. Bits 8-9 of data
indicate CPU non-presence as determined by connection to
the thermal diode.
See Code 43, sub-code 0x0 for resolution information.
Fatal error: Code 43, subcode 0x3 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Physical CPUxx not active, but thermal sensor disagrees.
Bits 0-1 of data indicate CPU non-presence as determined
by the running CPU APIC addresses. Bits 8-9 of data
indicate CPU non-presence as determined by connection to
the thermal diode.
See Code 43, sub-code 0x0 for resolution information.
Fatal error: Code 43, subcode 0x4 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Not all hyper-threads started on physical CPUxx.
Bits 0-1 of data indicate logical CPU non-presence in
physical CPU0 as determined by the running CPU APIC
addresses.
Bits 2-3 of data indicate logical CPU non-presence in
physical CPU1 as determined by the running CPU APIC
addresses.
See Code 43, sub-code 0x0 for resolution information.
Fatal error: Code 43, subcode 0x5 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Not all cores started on physical CPUxx.
Bits 0-3 of data indicate logical CPU non-presence in
physical CPU0 as determined by the running CPU APIC
addresses.
Bits 4-7 of data indicate logical CPU non-presence in
physical CPU1 as determined by the running CPU APIC
addresses.
See Code 43, sub-code 0x0 for resolution information.
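The per-CPU bit fields used by sub-codes 0x4 and 0x5 can be unpacked with the same routine. The sketch below is illustrative only: the function name is hypothetical, and it assumes that a set bit marks a logical CPU that did not start and that bit N within a field refers to logical CPU N.

```python
def missing_logical_cpus(data, bits_per_cpu):
    """Return the logical CPUs flagged as not present, per physical CPU.

    bits_per_cpu is 2 for sub-code 0x4 (hyper-threads, bits 0-1 and 2-3)
    and 4 for sub-code 0x5 (cores, bits 0-3 and 4-7).
    """
    mask = (1 << bits_per_cpu) - 1
    result = {}
    for cpu in range(2):  # physical CPU0 and CPU1
        field = (data >> (cpu * bits_per_cpu)) & mask
        result["CPU%d" % cpu] = [i for i in range(bits_per_cpu) if (field >> i) & 1]
    return result


print(missing_logical_cpus(0x24, 4))    # sub-code 0x5 example
print(missing_logical_cpus(0b1001, 2))  # sub-code 0x4 example
```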
Fatal error: Code 43, subcode 0x10 (xx)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
CMIC heatsink disconnected: yy
The GPIOs reporting proper connection of the CMIC (North
Bridge) heatsink report a loss of connection. This is a
board failure which requires a lab technician to reattach
the heatsink.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Visually inspect the CMIC heatsink posts to
determine if it needs to be reattached.
B) Observe the reported xx value to see if it is
one or both GPIOs which report the failure.
These lines may be traced from VSC055 GPIOs
P3.1 (J1300) and P3.2 (J1301).
C) The BIOS flag "ignore_hsfail" may be set to
override checking the CMIC heatsink.
Non-fatal error: Code 44, sub-code 0x00 (xx)
NODE_FAN_FAILURE "System Fan Failure"
*** Error: One of the node fans is not present, failed,
or is unintentionally running at a slower speed than
expected.
The VSC055 reports tachometer inputs for both node fans,
0 and 1. This is a single node fan failure which requires
the fan to be replaced.
Resolution: A) Cycle power on the node.
B) Replace the node fan.
Diagnostic: A) Visually inspect the node fan.
B) Observe the fan is present and connected properly.
C) If it was misconnected, correct the connection.
Otherwise, the fan needs to be replaced.
Fatal error: Code 44, subcode 0x01 (xx)
NODE_FAN_FAILURE "System Fan Failure"
*** Error: Both of the node fans are not present, failed,
or are unintentionally running at speeds slower than
expected.
The VSC055 reports tachometer inputs for both node fans,
0 and 1. This is a dual node fan failure which requires
both of the fans to be replaced. The system may overheat.
Resolution: A) Cycle power on both nodes.
B) Replace both node fans.
Diagnostic: A) Visually inspect the node fans.
B) Observe the fans are present and connected properly.
C) If they are misconnected, correct the connections.
Otherwise, the fans need to be replaced.
Fatal error: Code 45, subcode 0x00 (data)
QLIS_ISCSI_FAILURE "QLogic iSCSI Failure"
*** Error: QLogic iSCSI Failure
This error code indicates an error while running the QLogic
iSCSI POST.
Failed Test (bits 8-15), Slot (bits 4-7) and Port (bits 0-3) are
packed into data.
Failed Test is one of the following:
<QLogic internal card diagnostics>
2    Test Local RAM Size
3    Test Local RAM R/W
4    Test RISC RAM
5    Test NVRAM
6    Test Flash ROM
7    Test Network Internal Loopback
8    Test Network External Loopback
9    Test DMA Transfer
240  (0xf0) Test NOP
241  (0xf1) Test Registers
242  (0xf2) Test DMA Transfer to CPU memory
243  (0xf3) Test DMA Transfer to Cluster memory
244  (0xf4) Card Initialization
Resolution: A) Cycle power on failing node.
B) Re-seat failing iSCSI card
C) Replace failing iSCSI card
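The packing of Failed Test, Slot and Port into the Code 45 data word can be unpacked as in the following sketch. The test names are taken from the list above; the function name and the returned keys are hypothetical.

```python
# Failed Test values and names as listed for Code 45.
QLOGIC_ISCSI_TESTS = {
    2: "Test Local RAM Size",
    3: "Test Local RAM R/W",
    4: "Test RISC RAM",
    5: "Test NVRAM",
    6: "Test Flash ROM",
    7: "Test Network Internal Loopback",
    8: "Test Network External Loopback",
    9: "Test DMA Transfer",
    0xF0: "Test NOP",
    0xF1: "Test Registers",
    0xF2: "Test DMA Transfer to CPU memory",
    0xF3: "Test DMA Transfer to Cluster memory",
    0xF4: "Card Initialization",
}


def decode_qlogic_iscsi(data):
    """Unpack Failed Test (bits 8-15), Slot (bits 4-7) and Port (bits 0-3)."""
    failed_test = (data >> 8) & 0xFF
    return {
        "failed_test": failed_test,
        "test_name": QLOGIC_ISCSI_TESTS.get(failed_test, "unknown"),
        "slot": (data >> 4) & 0xF,
        "port": data & 0xF,
    }


print(decode_qlogic_iscsi(0xF421))
```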
Fatal error: Code 46, subcode 0x1 (0)
BAD_OR_UNKNOWN_CHIPSET "Bad or Unknown Chipset"
*** Error: Unrecognized chipset (0xXXXXXXXX).
This error code indicates CBIOS does not recognize
the chipset installed on the node's motherboard.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Non-fatal error: Code 46, sub-code 0x2 (0)
BAD_OR_UNKNOWN_CHIPSET "Bad or Unknown Chipset"
*** ME not in operational mode. IPMI data unavailable.
This error code indicates that the PCH Management Engine is not
in the desired operational mode in the PCH chipset. IPMI temperature
data is not available in this mode and the system fans may not
run at the proper speed and may not cool the enclosure.
Resolution: Contact engineering with data.
Fatal error: Code 47, subcode 0x00 (0)
SDRAM_UNAVAILABLE "Control Cache Unavailable"
No CPU SDRAM is available.
This error indicates that CBIOS has no working
CPU memory available for it to continue with POST and
ultimately boot the node.
Resolution: A) Cycle power on the node.
B) Replace CPU DIMMs.
C) Replace the node motherboard.
Fatal error: Code 48, subcode 0x0 (XXXXXXXX)
UNKNOWN_BOARD "Unknown Board"
*** Error: Unrecognized board identifier (0xXXXXXXXX).
This error code indicates CBIOS does not recognize the board type
for the chipset installed on the node's motherboard.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Fatal error: Code 49, subcode 0x1 (data)
USB_FAILURE "USB Flash Media Failure"
Failed to find USB device handle
or
Inquiry Request Failed rc = xxxx
The USB controller failed to perform a self test.
A data value of 0 indicates the BIOS failed to find a USB handle.
Resolution: A) If a USB Flash drive is not expected to be present,
set the "usb_nodevice_ok" NVRAM variable to override
BIOS requiring a USB Flash drive be found.
B) Replace the USB Flash drive.
C) Replace the node motherboard.
Diagnostic: A) Whack "usb test" commands may be used to
individually execute USB tests.
Fatal error: Code 49, subcode 0x4 (0)
USB_FAILURE "USB Flash Media Failure"
There was a USB failure in data requested by the operating
system bootstrap. It is possible that data on the disk has
become corrupt to the point the operating system will not
successfully load.
Resolution: Reinstall the operating system bootstrap with
the "boot net install" command.
Fatal error: Code 49, subcode 0x6 (0)
USB_FAILURE "USB Flash Media Failure"
USB reported a failure in the read verify command.
See Code 49, sub-code 0x1 for resolution information.
Fatal error: Code 49, subcode 0x7 (0)
USB_FAILURE "USB Flash Media Failure"
USB reported a failure in the write verify command.
See Code 49, sub-code 0x1 for resolution information.
Non-fatal error: Code 49, sub-code 0x17 (0)
USB_FAILURE "USB Flash Media Failure"
No USB device was found.
Resolution: Install or replace the USB Flash drive.
Fatal error: Code 50, subcode 0x1 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Invalid control cache setup.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0x2 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Incompatible FB-DIMM installed.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0x3 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Electrically isolated FB-DIMM.
Resolution: A) Replace DIMM.
B) Replace node.
Fatal error: Code 50, subcode 0x4 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Incompatible module installed.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0x5 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Mismatched DIMM pair.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0x6 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Odd rank disabled.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0x7 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM branch failed to train and lockstep mode has been
disabled.
Resolution: A) Replace all DIMMs.
B) Replace node.
Fatal error: Code 50, subcode 0x9 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM northbound merge has been disabled.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0xa (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM disabled due to lockstep skew.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0xb (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM rank disabled due to Built-in Self Test failure.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0xe (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Memory interleave range limit invalid.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0xf (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
High temp disabled.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0x10 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Logical rank with CECC detected.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0x12 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Sub-optimal FB-DIMM channel population detected.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0x13 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Mismatched AMB pair.
Resolution: Replace all DIMMs.
Fatal error: Code 50, subcode 0x14 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM branch disabled.
Resolution: A) Replace all DIMMs.
B) Replace node.
Fatal error: Code 50, subcode 0x15 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM thermal throttling has been disabled.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0x16 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Last FB-DIMM AMB has been disabled.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0x17 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
The FB-DIMM memory branches do not match in size.
Resolution: Contact 3PAR technical support.
Fatal error: Code 51, subcode 0x1 (Data)
CMA_BIST_FAILURE "CM ASIC Cache BIST Failure"
The BIST (Built-in Self Test) in Harrier reported either a BAD
value or a different value from what was recorded in the node PROM
during MFG board assembly. (Data = Harrier BIST result)
Resolution: Replace the node. Note for OPS that Harrier BIST
failed, and that the PROM should not be wiped.
Non-fatal error: Code 51, sub-code 0x2 (Data)
CMA_BIST_FAILURE "CM ASIC BIST Failure"
During Harrier initialization, the CMA BIST test failed, but for
some other reason (e.g. an I2C I/O error). This error code indicates
that the BIST test itself has not failed, but an error occurred
either during book-keeping (PROM0 read/write) or the test was not
performed at all because it failed to read a Harrier register.
(Data = 0x2f)
Resolution: Monitor and replace the node if the issue recurs.
If the node is replaced, note for OPS that they should
verify I2C to the node PROM is functional.
Non-fatal error: Code 52, sub-code 0x0
CPU_PM_FAILURE "CPU Power Management Failure"
One or more bits in the CPU's 2 General Power Management registers
were set due to an abnormal power reset. The set bits are printed,
describing the cause.
Resolution: Contact engineering with data.
Non-fatal error: Code 52, sub-code 0x1
CPU_PM_FAILURE "CPU TSC is over 48-bits"
The CPU timestamp counter is too big after reset. There could be a
CPU reset issue.
Resolution: Contact engineering with data.
Fatal error: Code 53, subcode 0x0 (xxxx)
FPGA_FAILURE "FPGA Failure"
The CPU was unable to communicate with the FPGA.
Resolution: Replace node motherboard.
Fatal error: Code 53, subcode 0x1
FPGA_FAILURE "FPGA Failure"
FPGA revision in EOS node is old. FPGA upgrade is required.
Resolution: Upgrade FPGA to the latest revision.
Fatal error: Code 53, subcode 0x2
FPGA_FAILURE "FPGA Failure"
FPGA revision in Chimera node is not correct. FPGA upgrade
is required.
Resolution: Upgrade FPGA to the correct revision.
Fatal error: Code 54, subcode 0x0 (xxyy)
VRM_FAILURE "VRM Failure"
A CPU VRM is missing.
or
A CPU VRM is not providing power.
Resolution: A) Replace CPU VRM yy.
B) Replace node motherboard.
Fatal error: Code 55, subcode 0xzzzzzzzz (yyy)
UEFI_PEI_FAILURE "UEFI Failure: PEI"
UEFI failed to boot, failed during PEI due to assert.
Look up zzzzzzzz in doc/udk_hash_index.csv of the udk2010_up3 tree
to determine the filename of the assert. yyy specifies the line
number (in hex).
Resolution: Contact 3PAR technical support.
Fatal error: Code 56, subcode 0xaabb (yyy)
UEFI_MRC_FAILURE "UEFI Failure: Memory Training"
UEFI failed to boot, failed during Intel MRC memory
training code.
aa specifies the major code, bb specifies the minor code
* Code Table
* Format:
* 0xff Major code
*   0xff Minor code
0x30 Correctable error during MRC memory training
0x31 Uncorrectable error during MRC memory training
  0x13 WARN_FPT_MINOR_RD_DQ_DQS
  0x14 WARN_FPT_MINOR_RD_RCVEN
  0x15 WARN_FPT_MINOR_WR_LEVEL
  0x00 WARN_FPT_MINOR_WR_FLYBY
  0x16 WARN_FPT_MINOR_WR_DQ_DQS
  0x1b WARN_FPT_MINOR_DQS_TEST
  0x1c WARN_FPT_MINOR_MEM_TEST
  0x1d WARN_FPT_MINOR_RCOMP_TIMEOUT
0xE8 ERR_NO_MEMORY
  0x01 ERR_NO_MEMORY_MINOR_NO_MEMORY
  0x02 ERR_NO_MEMORY_MINOR_ALL_CH_DISABLED
  0x03 ERR_NO_MEMORY_MINOR_ALL_CH_DISABLED_MIXED
0xE9 ERR_LT_LOCK
0xEA ERR_DDR_INIT
  0x01 ERR_RD_DQ_DQS
  0x02 ERR_RC_EN
  0x03 ERR_WR_LEVEL
  0x04 ERR_WR_DQ_DQS
0xEB ERR_MEM_TEST
  0x01 ERR_MEM_TEST_MINOR_SOFTWARE
  0x02 ERR_MEM_TEST_MINOR_HARDWARE
  0x03 ERR_MEM_TEST_MINOR_LOCKSTEP_MODE
0xEC ERR_VENDOR_SPECIFIC
0xED ERR_DIMM_COMPAT
  0x01 ERR_MIXED_MEM_TYPE
  0x02 ERR_INVALID_POP
  0x03 ERR_INVALID_POP_MINOR_QR_AND_3RD_SLOT
  0x04 ERR_INVALID_POP_MINOR_UDIMM_AND_3RD_SLOT
  0x05 ERR_INVALID_POP_MINOR_UNSUPPORTED_VOLTAGE
  0x06 ERR_MIXED_SPD_TYPE
  0x07 ERR_NOT_SUPPORT_EXTENDED_ADDRESS
0xEE ERR_MRC_COMPATIBILITY
  0x01 ERR_MRC_DIR_NONECC
0xEF ERR_MRC_STRUCT
  0x01 ERR_INVALID_BOOT_MODE
  0x02 ERR_INVALID_SUB_BOOT_MODE
* Warning Codes
0x01 WARN_RDIMM_ON_UDIMM
0x02 WARN_UDIMM_ON_RDIMM
0x03 WARN_SODIMM_ON_RDIMM
0x04 WARN_4Gb_FUSE
0x05 WARN_8Gb_FUSE
0x06 WARN_IMC_DISABLED
0x07 WARN_DIMM_COMPAT
  0x01 WARN_DIMM_COMPAT_MINOR_X16_C0MBO
  0x02 WARN_DIMM_COMPAT_MINOR_MAX_RANKS
  0x03 WARN_DIMM_COMPAT_MINOR_QR
  0x04 WARN_DIMM_COMPAT_MINOR_NOT_SUPPORTED
  0x05 WARN_RANK_NUM
  0x06 WARN_TOO_SLOW
  0x07 WARN_DIMM_COMPAT_MINOR_ROW_ADDR_ORDER
  0x08 WARN_CHANNEL_CONFIG_NOT_SUPPORTED
  0x09 WARN_CHANNEL_MIX_ECC_NONECC
  0x0A WARN_DIMM_COMPAT_TRP_NOT_SUPPORTED
0x09 WARN_LOCKSTEP_DISABLE
  0x01 WARN_LOCKSTEP_DISABLE_MINOR_RAS_MODE
  0x02 WARN_LOCKSTEP_DISABLE_MINOR_MISMATCHED
  0x03 WARN_LOCKSTEP_DISABLE_MINOR_MEMTEST_FAILED
0x0a WARN_USER_DIMM_DISABLE
  0x01 WARN_USER_DIMM_DISABLE_QUAD_AND_3DPC
  0x02 WARN_USER_DIMM_DISABLE_MEMTEST
0x0b WARN_MEMTEST_DIMM_DISABLE
0x0c WARN_MIRROR_DISABLE
  0x01 WARN_MIRROR_DISABLE_MINOR_RAS_DISABLED
  0x02 WARN_MIRROR_DISABLE_MINOR_MISMATCH
  0x03 WARN_MIRROR_DISABLE_MINOR_MEMTEST
0x0d WARN_MEM_LIMIT
0x0e WARN_INTERLEAVE_FAILURE
  0x01 WARN_SAD_RULES_EXCEEDED
  0x02 WARN_TAD_RULES_EXCEEDED
  0x03 WARN_RIR_RULES_EXCEEDED
  0x04 WARN_TAD_OFFSET_NEGATIVE
  0x05 WARN_TAD_LIMIT_ERROR
  0x06 WARN_INTERLEAVE_3WAY
  0x07 WARN_A7_MODE_AND_3WAY_CH_INTRLV
0x10 WARN_SPARE_DISABLE
0x11 WARN_PTRLSCRB_DISABLE
0x12 WARN_UNUSED_MEMORY
  0x01 WARN_UNUSED_MEMORY_MINOR_MIRROR
  0x02 WARN_UNUSED_MEMORY_MINOR_LOCKSTEP
0x13 WARN_RD_DQ_DQS
0x14 WARN_RD_RCVEN
  0x01 WARN_ROUNDTRIP_EXCEEDED
0x15 WARN_WR_LEVEL
  0x00 WARN_WR_FLYBY_CORR
  0x01 WARN_WR_FLYBY_UNCORR
  0x02 WARN_WR_FLYBY_DELAY
0x16 WARN_WR_DQ_DQS
0x17 WARN_DIMM_POP_RUL
  0x01 WARN_DIMM_POP_RUL_MINOR_OUT_OF_ORDER
  0x02 WARN_DIMM_POP_RUL_MINOR_INDEPENDENT_MODE
0x18 WARN_CLTT_DISABLE
  0x01 WARN_CLTT_MINOR_NO_TEMP_SENSOR
  0x02 WARN_CLTT_MINOR_CIRCUIT_TST_FAILED
0x19 WARN_THROT_INSUFFICIENT
0x1a WARN_CLTT_DIMM_UNKNOWN
0x1b WARN_DQS_TEST
0x1c WARN_MEM_TEST
0x1d WARN_CLOSED_PAGE_OVERRIDE
0x1e WARN_DIMM_VREF_NOT_PRESENT
0x20 WARN_LV_STD_DIMM_MIX
0x21 WARN_LV_2QR_DIMM
0x22 WARN_LV_3DPC
0x23 WARN_CMD_ADDR_PARITY_ERR
0x24 WARN_CMD_MARGINS
  0x01 WARN_NO_EYE_FOUND
0x25 WARN_SMBUS_FAILURE
  0x01 WARN_SMBUS_RD_FAILURE
Resolution: Contact 3PAR technical support.
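The split of the Code 56 subcode 0xaabb into a major code (aa) and a minor code (bb) can be sketched as below. The function name is hypothetical and the lookup table is deliberately partial: it holds only a few major codes copied from the code table, so a miss does not mean the code is invalid.

```python
# Partial map of major codes, copied from the Code 56 code table.
MRC_MAJOR_CODES = {
    0x30: "Correctable error during MRC memory training",
    0x31: "Uncorrectable error during MRC memory training",
    0xE8: "ERR_NO_MEMORY",
    0xE9: "ERR_LT_LOCK",
    0xEA: "ERR_DDR_INIT",
    0xEB: "ERR_MEM_TEST",
}


def split_mrc_subcode(subcode):
    """Return (major, minor, major_name) for a 0xaabb subcode."""
    major = (subcode >> 8) & 0xFF  # aa
    minor = subcode & 0xFF         # bb
    return major, minor, MRC_MAJOR_CODES.get(major, "not in this partial map")


major, minor, name = split_mrc_subcode(0xE801)
print("major=0x%02x minor=0x%02x %s" % (major, minor, name))
```

The minor code is only meaningful relative to its major code, so look it up under the matching major entry in the table.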
Fatal error: Code 57, subcode 0xzzzzzzzz (yyy)
UEFI_DXE_FAILURE "UEFI Failure: DXE"
UEFI failed to boot, failed during DXE due to assert.
Look up zzzzzzzz in doc/udk_hash_index.csv of the udk2010_up3 tree
to determine the filename of the assert. yyy specifies the line
number (in hex).
Resolution: Contact 3PAR technical support.
Non-fatal error: Code 58, sub-code 0x0
HECI_FAILURE "HECI Interface Failure"
CBIOS failed to obtain the ME firmware flash unlock code
through the HECI interface. This could prevent flash commands
from functioning.
Resolution: Try rebooting the node.
Fatal error: Code 59, subcode 0x00 (0)
FAILSAFE_BIOS_BOOT "Failsafe Boot Halt"
The EOS Failsafe BIOS has booted without detecting a CRC error in
the Main BIOS, indicating a HW initialization failure preventing the
node from booting. The Failsafe BIOS has also detected five or
more non-CRC failures causing boots to failsafe within the past
two hours and has stopped attempting to recover automatically.
Non-fatal error: Code 59, sub-code 0x01 (bbxxyyzz)
FAILSAFE_BIOS_BOOT "Failsafe Boot Mode"
The EOS Failsafe BIOS has booted. This non-fatal entry is logged
to mark the switch over from the Main BIOS to the Failsafe BIOS
which may be a different version.
The data field contains the build (bb) and version (xx.yy.zz) of
the Failsafe BIOS that is booting.
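The bbxxyyzz data field can be unpacked as in the sketch below. It assumes each letter pair is one byte, with bb in the most significant byte, which the notation suggests but the text does not state. The function name is hypothetical, and the fields are returned as integers because the display radix is not specified.

```python
def decode_failsafe_version(data):
    """Split a 32-bit bbxxyyzz value into build and version fields."""
    build = (data >> 24) & 0xFF  # bb
    major = (data >> 16) & 0xFF  # xx
    minor = (data >> 8) & 0xFF   # yy
    patch = data & 0xFF          # zz
    return {"build": build, "version": (major, minor, patch)}


print(decode_failsafe_version(0x05030201))
```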
Non-fatal error: Code 59, sub-code 0x02 (flags)
FAILSAFE_BIOS_BOOT "Failsafe Boot Mode"
The EOS Failsafe BIOS has booted. This non-fatal entry is logged
to mark the switch over from the Main BIOS to the Failsafe BIOS
which may be a different version.
The data field contains FPGA flags at the time of the boot.
Bits 0..7   - FSBC_STAT register from the FPGA.
              See FPGA design documentation for details.
Bits 8..15  - FPGA Revision register.
Bits 16..23 - FPGA ID register. (=4 for EOS)
Bit 24      - Flag indicating state of env var qa_force_bios_to
Bit 25      - Flag indicating state of env var qa_force_fs_to
              Flag: 1=var is set, 0=var is not set.
Bits 26..31 - Reserved, =0
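The flag layout above maps directly onto shifts and masks. The following sketch unpacks the fields as listed; the function name and dictionary keys are hypothetical, and the meaning of the FSBC_STAT bits is left to the FPGA design documentation.

```python
def decode_failsafe_flags(flags):
    """Unpack the Code 59, sub-code 0x02 data field into its bit fields."""
    return {
        "fsbc_stat": flags & 0xFF,                        # bits 0..7
        "fpga_revision": (flags >> 8) & 0xFF,             # bits 8..15
        "fpga_id": (flags >> 16) & 0xFF,                  # bits 16..23 (4 for EOS)
        "qa_force_bios_to_set": bool((flags >> 24) & 1),  # bit 24
        "qa_force_fs_to_set": bool((flags >> 25) & 1),    # bit 25
        "reserved": (flags >> 26) & 0x3F,                 # bits 26..31, expected 0
    }


print(decode_failsafe_flags(0x02040312))
```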
Non-fatal error: Code 60, sub-code 0x00 (0)
NEMOE_FAILURE "Nemoe Failure"
The OKI Nemoe MCU has failed to boot within the specified timeout
and the UEFI BIOS has reset the chip. This non-fatal entry is
logged to record the boot failure and attempted restart by BIOS.
No recovery action is required for this subcode.
Fatal error: Code 60, subcode 0x01 (0)
NEMOE_FAILURE "Nemoe Failure"
The OKI Nemoe MCU has failed to boot within the specified timeout.
The BIOS had reset the part, and it still failed to complete its
boot initialization before a timeout. The only corrective action
is to replace the node.
Non-fatal error: Code 61, sub-code 0x00 (data)
AC_POWER_LOSS "AC Power Loss"
Turning off BBU because the node is on battery power.
This will shut down the node until AC is restored.
This message indicates that all power supplies lost input AC Power
and that the BIOS powered down the node to avoid draining the
battery.
The data value provides a mask of power supplies which have AC good
input but failed DC output.
Resolution: A) Apply AC power to the node.
B) Replace the power supplies.
Fatal error: Code 62, subcode 0x00 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x00 indicates a general timeout or exhaustion of available
retries during overall Write/Read/Gate leveling.
The "path" value encodes the cma number, channel number, and
chip-select map, according to the following bit range mapping:
|31        24|23            16|15       8|7              0|
| cma number | channel number |          | chip select map|
Resolution: A) Cycle power on the node.
B) Reseat CM memory riser card.
C) Reseat the failing Cluster memory DIMM.
D) Replace the failing Cluster memory DIMM.
E) Replace the node motherboard.
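The "path" value for the leveling sub-codes can be unpacked as sketched below. The bit diagram is damaged in this copy of the guide, so the sketch assumes the cma number is in bits 31-24, the channel number in bits 23-16, bits 15-8 are unused, and the chip select (or chip-select map) is in bits 7-0. The function name is hypothetical.

```python
def decode_level_path(path, chip_select_is_map=False):
    """Unpack a Code 62 "path" value (sub-codes 0x00 through 0x02).

    Assumed layout: cma = bits 31-24, channel = bits 23-16,
    chip select = bits 7-0; bits 15-8 are treated as unused.
    """
    chip_select = path & 0xFF
    decoded = {
        "cma": (path >> 24) & 0xFF,
        "channel": (path >> 16) & 0xFF,
        "chip_select": chip_select,
    }
    if chip_select_is_map:
        # Sub-code 0x00 reports a map: list every chip select whose bit is set.
        decoded["chip_selects"] = [b for b in range(8) if (chip_select >> b) & 1]
    return decoded


print(decode_level_path(0x01020005, chip_select_is_map=True))
```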
Fatal error: Code 62, subcode 0x01 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x01 indicates a timeout during write leveling, or an
exhaustion of available retries during write leveling.
The "path" value encodes the cma number, channel number, and
chip-select number, according to the following bit range mapping:
|31        24|23            16|15       8|7          0|
| cma number | channel number |          | chip select|
See Code 62, sub-code 0x00 for resolution information.
Fatal error: Code 62, subcode 0x02 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x02 indicates a timeout during read leveling, or an
exhaustion of available retries during read leveling.
The "path" value encodes the cma number, channel number, and
chip-select number, according to the following bit range mapping:
|31        24|23            16|15       8|7          0|
| cma number | channel number |          | chip select|
See Code 62, sub-code 0x00 for resolution information.
Fatal error: Code 62, subcode 0x03 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x03 indicates a timeout writing to a Mosys PHY register.
The "path" value encodes the channel number and the PHY CSR address,
according to the following bit range mapping:
|31            16|15          0|
| channel number | CSR address |
See Code 62, sub-code 0x00 for resolution information.
Fatal error: Code 62, subcode 0x04 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x04 indicates a timeout reading from a Mosys PHY register.
The "path" value encodes the channel number and the PHY CSR address,
according to the following bit range mapping:
|31            16|15          0|
| channel number | CSR address |
See Code 62, sub-code 0x00 for resolution information.
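For sub-codes 0x03 and 0x04 the "path" value is split into two 16-bit halves. A minimal sketch, with a hypothetical function name:

```python
def decode_phy_csr_path(path):
    """Return (channel number, PHY CSR address) from a Code 62 "path" value.

    Applies to sub-codes 0x03 and 0x04: channel number in bits 31-16,
    CSR address in bits 15-0.
    """
    channel = (path >> 16) & 0xFFFF
    csr_address = path & 0xFFFF
    return channel, csr_address


channel, csr = decode_phy_csr_path(0x00031A2C)
print("channel %d, CSR address 0x%04x" % (channel, csr))
```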
Fatal error: Code 63, subcode 0x00 (xxyy)
LRDIMM_COMM_FAILURE "LRDIMM Communication failure"
xx = SMBUS address
yy = SMBUS read failure status
This error indicates a failure during a SMBUS read of an
LRDIMM iMB register.
Resolution: A) Use the Whack command line to re-run the CM
initialization test (cma init).
B) Use the Whack command line to Reset node.
C) Cycle power on the node.
D) Reseat appropriate Cluster Memory DIMM.
E) Replace appropriate Cluster Memory DIMM.
F) Replace the node motherboard.
Fatal error: Code 63, subcode 0x01 (xxyy)
LRDIMM_COMM_FAILURE "LRDIMM Communication failure"
xx = SMBUS address
yy = SMBUS write failure status
This error indicates a failure during a SMBUS write to an
LRDIMM iMB register.
Resolution: See Code 63 sub-code 0x00.
Fatal error: Code 63, subcode 0x02 (xxxxyyzz)
LRDIMM_COMM_FAILURE "LRDIMM iMB data mis-compare"
xxxx = iMB register
yy = Expected contents of register
zz = Actual contents of register
This is a Data MisCompare while verifying LRDIMM iMB
register initial values.
Resolution: See Code 63 sub-code 0x00.
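The Code 63 data field changes shape with the sub-code: xxyy for sub-codes 0x00 and 0x01, and xxxxyyzz for sub-code 0x02. The sketch below unpacks both forms, assuming each letter pair is one byte with the leftmost pair most significant. The function name and keys are hypothetical.

```python
def decode_lrdimm_data(subcode, data):
    """Unpack the Code 63 data field according to the sub-code."""
    if subcode in (0x00, 0x01):
        # xxyy: SMBUS address, then SMBUS read (0x00) or write (0x01) status
        return {"smbus_address": (data >> 8) & 0xFF, "status": data & 0xFF}
    if subcode == 0x02:
        # xxxxyyzz: iMB register, expected contents, actual contents
        return {
            "imb_register": (data >> 16) & 0xFFFF,
            "expected": (data >> 8) & 0xFF,
            "actual": data & 0xFF,
        }
    raise ValueError("unexpected Code 63 sub-code: 0x%02x" % subcode)


print(decode_lrdimm_data(0x00, 0xA203))
print(decode_lrdimm_data(0x02, 0x01F45A5B))
```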
Fatal error: Code 64, subcode 0x00 (FFFF)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_USER_ERROR"
Could not find the variable string in the environment
variable table.
Resolution: A) User needs to fix illegal name in the table
or in call to table.
Fatal error: Code 64, subcode 0x00 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_USER_ERROR"
PortNum specifies the invalid port number.
The user entered an incorrect RPC or LPC port number in the
"CMA PPHY..." command.
Resolution: A) User needs to enter the correct port number
for the "CMA PPHY..." command.
Non-fatal error: Code 64, sub-code 0x01 (ack)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_RD_ACK_OUT_TIMEOUT"
ack is the current state of the PPHY control register acknowledge bit.
If ack is 0, the timeout occurred waiting for the ack bit to assert.
If ack is 1, the timeout occurred waiting for the ack bit to deassert.
Resolution:
A) The "CMA PPHY..." commands are a series of commands that allow
the user to modify settings on PPHYs or run various tests.
Therefore, the user should know what has changed and know whether
or not any failures are real or a result of the changes that were
made. If you feel the hardware is bad, continue.
B) Use the Whack command line to Reset node.
C) Cycle power on the node.
D) Replace the node motherboard.
Non-fatal error: Code 64, sub-code 0x02 (data)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_RD_MISMATCH"
data is the actual data value from the PPHY register that
did not match the expected value.
Resolution: A) See Code 64, sub-code 0x01 resolution.
Fatal error: Code 64, subcode 0x03 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_LBERT_ERROR"
PortNum is the RPC port being tested.
BERT is used to generate a pattern for the voltage margin test and
the test is expected to generate a BERT error. This failure indicates
the expected error occurred but did not clear, or the expected error
did not occur.
Non-fatal error: Code 64, sub-code 0x04 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_PHASE_ERROR Total Eye Margin value is over 1 UI"
PortNum is the RPC or LPC port being tested. UI is Unit Interval.
Resolution: A) See Code 64, sub-code 0x01 resolution.
Non-fatal error: Code 65, sub-code 0x00 (xxyy)
BOOT_DISK_WARNING "Boot disk warnings"
Booting with the default boot disk (xx) failed and the default boot
disk was changed to the next available boot disk (yy).
Resolution: Check boot disk (xx) for any disk failure.
Non-fatal error: Code 65, sub-code 0x01 (xxyy)
BOOT_DISK_WARNING "Boot disk warnings"
Booting with the default boot disk (xx) failed and the next available
boot disk (yy) has also failed booting.
Resolution: Check boot disks (xx) and (yy) for any disk failure.
Non-fatal error: Code 65, sub-code 0x02 (0)
BOOT_DISK_WARNING "Boot disk warnings"
Reading boot disk info from PROM failed and dual boot disk
configuration was skipped.
Resolution: Check PROM for any access failure.
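The xxyy data field of the Code 65 boot disk warnings can be turned into a readable summary as sketched below. It assumes xx is the high byte and yy the low byte of a 16-bit value. The function name and message wording are hypothetical and paraphrase the descriptions above.

```python
def describe_boot_disk_warning(subcode, data):
    """Summarize a Code 65 entry from its sub-code and xxyy data field."""
    default_disk = (data >> 8) & 0xFF  # xx
    next_disk = data & 0xFF            # yy
    if subcode == 0x00:
        return "boot disk %d failed; default changed to disk %d" % (default_disk, next_disk)
    if subcode == 0x01:
        return "boot disk %d failed and disk %d also failed" % (default_disk, next_disk)
    if subcode == 0x02:
        return "boot disk info could not be read from PROM"
    return "unrecognized sub-code 0x%02x" % subcode


print(describe_boot_disk_warning(0x00, 0x0001))
```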
Non-fatal error: Code 66, sub-code 0x02 (0)
DIAGNOSTIC_FAILURE "Internal check of cached environment failed"
Orion Platform specific. Internal development and test use only.
On Orion nodes, the permanent environment is stored in a slow device.
To speed up access, the environment is cached when the node boots.
When the environment variable 'verify_cache_env' is set, this cached
environment is verified against the permanent environment on every
variable read. This code is logged when the two are found out of sync.
Resolution: A code fix is required to correct this issue. Report the
issue to engineering. Until a fix is available, 'set perm no_cache_env'
to turn environment caching off and avoid this error.
Non-fatal error: Code 67, sub-code 0x01 (load check)
NVDIMM_ERROR "NVDIMM error and status"
Sub-code 0x01 (NVDIMM_E_LOAD_CHECK) logs the NVDIMM load check result.
The "load check" value encodes the result of a battery self-test
performed by the NVDIMM hardware after coming out of reset.
This value is used to determine when the NVDIMM battery is no longer
capable of supporting a backup.
This information can be collected for battery characterization.
Resolution: No action is necessary. Informational only.
Non-fatal error: Code 67, sub-code 0x03 (result)
NVDIMM_ERROR "NVDIMM error and status"
Sub-code 0x03 (NVDIMM_E_SAVE_FAILED)
The "result" value encodes the specific backup error reported by
the NVDIMM LAST EVENT RESULT register.
If "result" is 0x41, then the last save operation timed out.
If "result" is 0x42, then the last save operation was in progress,
but failed to complete, implying NVDIMM battery power loss.
If "result" is 0x44, then the last save operation was incomplete.
Resolution: A) Reseat appropriate Cluster Memory NVDIMM.
B) Reseat appropriate NVDIMM battery cable.
C) Replace appropriate Cluster Memory NVDIMM.
D) Replace appropriate NVDIMM battery.
E) Replace the node motherboard.
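The three documented "result" values for sub-code 0x03 lend themselves to a simple lookup. The sketch below is illustrative; the function name is hypothetical, and any value other than the three listed is reported as unrecognized rather than guessed.

```python
# Result values documented for Code 67, sub-code 0x03.
NVDIMM_SAVE_RESULTS = {
    0x41: "last save operation timed out",
    0x42: "last save operation was in progress but failed to complete "
          "(implies NVDIMM battery power loss)",
    0x44: "last save operation was incomplete",
}


def describe_save_result(result):
    """Translate a LAST EVENT RESULT value into the documented meaning."""
    return NVDIMM_SAVE_RESULTS.get(result, "unrecognized result 0x%02x" % result)


print(describe_save_result(0x42))
```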
Non-fatal error: Code 67, sub-code 0x04 (0)
NVDIMM_ERROR "NVDIMM error and status"
Sub-code 0x04 (NVDIMM_E_NO_HPSSSD)
The expected NVDIMM (HPSSSD) was not detected when querying the
I2C SPD device.
Resolution: A) Reseat appropriate Cluster Memory NVDIMM.
B) Replace appropriate Cluster Memory NVDIMM.
C) Replace the node motherboard.
Non-fatal error: Code 67, sub-codes 0x10 through 0xa0 (data)
NVDIMM_ERROR "NVDIMM error and status"
Sub-codes 0x10 through 0xa0 represent errors detected in specific
NVDIMM initialization routines, as follows:
0x0010 // Error in harrier2_init_hcm_restore() routine.
0x0020 // Error in nvdimm_init() routine.
0x0030 // Error in nvdimm_wait_for_save() routine.
0x0040 // Error in nvdimm_wait_for_ready_status() routine.
0x0050 // Error in nvdimm_wait_for_main_code() routine.
0x0070 // Error in nvdimm_auto_pic_fw_upgrade() routine.
0x0080 // Error in nvdimm_flash_program_pic() routine.
0x0090 // Error in nvdimm_get_current_version() routine.
0x00a0 // Error in nvdimm_flash_enter_program_state() routine.
The upper 16 bits of the "data" value recorded with this error
identify the specific location in the routine source code where
the error was detected. The lower 16 bits of the "data" value
may also record function return codes, etc.
These error sub-codes are non-fatal, by default, but can be made to
produce fatal errors by setting the nvdimm_enable_fatal environment
variable.
Resolution: A) Reseat appropriate Cluster Memory NVDIMM.
B) Reseat appropriate NVDIMM battery cable.
C) Replace appropriate Cluster Memory NVDIMM.
D) Replace appropriate NVDIMM battery.
E) Replace the node motherboard.
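The Code 67 sub-codes 0x10 through 0xa0 and the 16-bit split of their "data" value can be sketched as below. This is a minimal illustration and not CBIOS code: the function name decode_nvdimm_init_error and the sample logged values are hypothetical. Only the routine names and the upper/lower 16-bit layout are taken from the entry above.

```python
# Routine names copied from the Code 67 sub-code list (0x10 through 0xa0).
NVDIMM_INIT_ROUTINES = {
    0x0010: "harrier2_init_hcm_restore",
    0x0020: "nvdimm_init",
    0x0030: "nvdimm_wait_for_save",
    0x0040: "nvdimm_wait_for_ready_status",
    0x0050: "nvdimm_wait_for_main_code",
    0x0070: "nvdimm_auto_pic_fw_upgrade",
    0x0080: "nvdimm_flash_program_pic",
    0x0090: "nvdimm_get_current_version",
    0x00A0: "nvdimm_flash_enter_program_state",
}


def decode_nvdimm_init_error(sub_code, data):
    """Split a logged (sub-code, data) pair into its documented parts."""
    routine = NVDIMM_INIT_ROUTINES.get(sub_code, "unknown routine")
    location = (data >> 16) & 0xFFFF  # upper 16 bits: location in the routine source
    detail = data & 0xFFFF            # lower 16 bits: may hold a function return code
    return routine, location, detail


# Hypothetical logged values, for illustration only.
routine, location, detail = decode_nvdimm_init_error(0x0030, 0x00120005)
print(f"{routine}(): location 0x{location:04x}, detail 0x{detail:04x}")
```

The location value is only meaningful against the matching routine source, so the sketch reports it without interpreting it.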
Status: Code 127 (STAT_BIOS_DIAG) "BIOS Diag"
This code is not an error. It is a BIOS diagnostic failure which
was forced by the Whack "fatal" command. It is used to test the
error logging and reporting mechanisms of the BIOS and TPD software.
Status: Code 128 (STAT_BIOS_UPDATE) "BIOS Update"
This code is not an error. It indicates the BIOS determined
that it had been updated. During CBIOS initialization, it
looks at a value stored in NVRAM to determine if the current
version is newer than the version previously booted. If so,
the BIOS logs this update. The sub-code is the new BIOS
version and the minor code is the old BIOS version. Example:
Code 128 (BIOS update) - Subcode 0x10204 (10201)
The above indicates CBIOS was updated from version 1.2.1 to 1.2.4.
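The Code 128 example can be decoded with a short sketch. It assumes each version component occupies one byte of the logged hexadecimal value. That layout is inferred from the single example above (0x10204 read as 1.2.4) and is not confirmed elsewhere in this guide. The function name decode_bios_version is hypothetical.

```python
def decode_bios_version(value):
    """Decode a Code 128 sub-code or minor code into a dotted version string.

    Assumption: one byte per component (major, minor, patch), as suggested
    by the guide's example 0x10204 -> 1.2.4.
    """
    major = (value >> 16) & 0xFF
    minor = (value >> 8) & 0xFF
    patch = value & 0xFF
    return f"{major}.{minor}.{patch}"


new_version = decode_bios_version(0x10204)  # sub-code: new BIOS version
old_version = decode_bios_version(0x10201)  # minor code: old BIOS version
print(f"CBIOS updated from {old_version} to {new_version}")
```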
HPE 3PAR OS fatal error codes and error resolution—HPE 3PAR OS 3.3.1
Error codes above 255 are in the domain of the OS.
HPE 3PAR OS fatal error codes and error resolution—HPE 3PAR OS 3.3.1
Code
Description
Fatal error: Code 257, sub-code yyyyyyyy (xx)
PROM_EA_MEM_UERR "Uncorrectable Cluster memory"
S-Series (PIII and P4) and E-Series (P4) nodes:
Log: Uncorrectable Error in Cluster Memory
Text: UERR wwwwwwww at cluster DIMM xx, addr yyyyyyyy, syn zzzzzzzz
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Eagle) has detected
an uncorrectable memory error in one or more cluster memory
DIMMs (xx) at address (yyyyyyyy). The node is taken out of the
cluster in response to this error.
T-Series, F-Series, and V-Series (5000P) nodes:
Log: Uncorrectable Error in Cluster Memory
Text: CM UECC Error Status [wwwwwwww]:
osp: UECC: address=yy:yyyyyyyy chnl 0xww seg 0xqq synd 0xrr
bank=0xss col=0xtttt row=0xuuuu DIMMww.vv Multibit
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Osprey) has detected
an uncorrectable memory error in one or more cluster memory
DIMMs (xx) at address (yy:yyyyyyyy). The node is taken out of the
cluster in response to this error. If only the xx value is
available, the DIMM number may be computed as (xx % 3).(xx / 3).
For example: if xx is 2, this would refer to DIMM2.0.
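The DIMM computation for Osprey-based nodes can be written out as a short sketch. It assumes the division in (xx / 3) is integer division, which is what the DIMM2.0 example implies. The function name osprey_dimm_label is hypothetical.

```python
def osprey_dimm_label(xx):
    """Map the xx value from Code 257 to a DIMM label on Osprey-based nodes.

    Follows the guide's formula (xx % 3).(xx / 3), reading "/" as integer
    division (an assumption consistent with the xx = 2 example).
    """
    return f"DIMM{xx % 3}.{xx // 3}"


print(osprey_dimm_label(2))
```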
Series based on the Harrier ASIC:
Log: Uncorrectable Error in Cluster Memory
Text: HAR0|1 MemCore0|1 MUERR|UERR IntStatus=wwwwwwww
data xxxxxxxx:xxxxxxxx
denali channel addr y:yyyyyyyyy syndrome z
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: harrier_err_interrupt() of harrierint.c
This error indicates the Cluster Manager ASIC (Harrier) has detected
an uncorrectable memory error in one or more cluster memory
DIMMs (xx) at address (yyyyyyyy). The node is taken out of the
cluster in response to this error.
DIMM callout (xx):
0 = DIMM0.0.0
1 = DIMM0.1.0
2 = DIMM0.0.1
3 = DIMM0.1.1
4 = DIMM1.0.0
5 = DIMM1.1.0
6 = DIMM1.0.1
7 = DIMM1.1.1
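For scripted log triage, the Harrier DIMM callout can be kept as a plain lookup. The sketch below copies the eight values from the callout list for Harrier-based nodes. The names HARRIER_DIMM_CALLOUT and harrier_dimm_label are hypothetical, and values outside 0 through 7 are rejected because the guide does not define them.

```python
# DIMM callout for Harrier-based nodes, indexed by the xx value (0 through 7).
HARRIER_DIMM_CALLOUT = (
    "DIMM0.0.0",
    "DIMM0.1.0",
    "DIMM0.0.1",
    "DIMM0.1.1",
    "DIMM1.0.0",
    "DIMM1.1.0",
    "DIMM1.0.1",
    "DIMM1.1.1",
)


def harrier_dimm_label(xx):
    """Return the DIMM label for a Code 257 xx value on a Harrier node."""
    if not 0 <= xx < len(HARRIER_DIMM_CALLOUT):
        raise ValueError(f"xx value {xx} is not in the documented callout")
    return HARRIER_DIMM_CALLOUT[xx]


print(harrier_dimm_label(5))
```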
Series based on the Harrier2 ASIC:
Log: Uncorrectable Error in Cluster Memory
Text: HAR2 0|1 MemCore0|1 MUERR|UERR
IntStatus=wwwwwwww data xxxxxxxx:xxxxxxxx
DDR3 addr yy:yyyyyyyy syndrome z
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: harrier2_err_interrupt() of harrier2int.c
This error indicates the Cluster Manager ASIC (Harrier2) has detected
an uncorrectable memory error in one or more cluster memory
DIMMs (xx) at address (yyyyyyyy). The node is taken out of the
cluster in response to this error.
See DIMM callout for Harrier ASIC.
This event is usually followed by a core dump on disk.
The kernel log text in the core dump usually contains some
easy-to-interpret text which identifies which DIMM has failed.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM(s).
D) Replace the failing Cluster Memory DIMM(s).
E) Replace the node motherboard.
Diagnostic: A) Ensure BIOS tests are enabled using the
"table skip none" command at a Whack prompt.
B) Use "mem test cm" command to test cluster memory.
C) wwwwwwww is the CM Error interrupt status register
and the syndrome is zzzzzzzz. These may be decoded
using scaffold documentation.
Fatal error: Code 258, sub-code xx (yy)
PROM_EA_MEM_CERR "Correctable Cluster memory"
This error is not currently generated by a node. It is a placeholder
should it be necessary to record correctable cluster memory ECC errors
in the node PROM.
Fatal error: Code 259, sub-code xx (yy)
PROM_EA_XCB_ERR "Error in the XCB engine"
This error is not currently generated by a node. It is a placeholder
for CM XCB engine hardware errors.
Fatal error: Code 260, sub-code xx (yy)
PROM_EA_MEM_MUERR "Multiple uncorrectable memory"
Log: Multiple Uncorrectable Error in Cluster Memory
Text: MUERR wwwwwwww at cluster DIMM xx, addr yyyyyyyy, syn zzzzzzzz
Event: Multiple Uncorrectable Memory Error
Panic: Panic due to Multiple Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Eagle or Osprey) has
detected multiple uncorrectable memory errors in cluster memory
DIMM (xx) at address (yyyyyyyy). The node is taken out of the
cluster in response to this error.
See Code 257 for error resolution information.
Fatal error: Code 261, sub-code xx (yy)
PROM_EA_HW_ERR "Cluster Manager HW error"
This error is not currently generated by a node. It is a placeholder
for Cluster Manager (Eagle or Osprey) internal hardware errors.
Fatal error: Code 262, sub-code xxxxxxxx (yy)
PROM_EA_PCI_ERR "Cluster Manager PCI error"
Log: Cluster Manager PCI Error
Text: ea_pci_err: bus yy, status xxxxxxxx
Call CBIOS to analyze error.
...
Event: PCI bus yy error xxxxxxxx
Panic: Panic due to Eagle PCI error: bus yy, status xxxxxxxx
Where: ea_pci_err() of eaint_hdler.c
This error indicates the Cluster Manager ASIC (Eagle or Osprey) has
detected a PCI bus error while communicating with either a CPU or
one of the PCI slot devices. This error is most likely caused by a
card which has failed in one of the PCI slots. You may need to
observe BIOS output which would be recorded in the crash dump
in order to determine the true cause.
Resolution: A) Cycle power on the node.
B) Read BIOS output to determine if a specific PCI
card is implicated by the slot bridges. If so,
replace the card.
C) Replace the node motherboard.
Diagnostic: A) If BIOS messages indicate no other device is at
fault, then manual BIOS tests may be performed to
determine if the cause is CIOB. You may use the
Whack "mem test" command with a CM memory range
to generate accesses. Use "eagle status" and
"eagle clear" to get and clear errors.
B) The "fibre test cluster" command is good to test
access from a fibre channel card to the CM.
Error codes—HPE 3PAR OS 3.2.2
BIOS fatal error codes and error resolution—HPE 3PAR OS 3.2.2
This table explains the codes, sub-codes, error code descriptions, and problem resolutions for the CBIOS
error codes.
When a BIOS initialization or diagnostic test fails such that the node cannot be allowed to continue
booting, a fatal error message is often displayed (sometimes with additional information). For each class
of error, a major Code is provided. A class-specific sub-code is also provided which gives the specific
failure condition.
NOTE:
A "GEvent" or "GPE" is a "GPIO (General Purpose I/O) event" or "general purpose event".
BIOS fatal error codes and error resolution—HPE 3PAR OS 3.2.2
Code
Description
Fatal error: Code 0, subcode 0x0 (0)
INITIALIZATION_OK "No Error"
This is actually not a node hardware or software initialization
or test failure. This code should never occur, and suggests
corruption of the PROM log if it is seen.
Resolution: Contact 3PAR technical support.
Fatal error: Code 1, subcode 0x1 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Unknown CPUID string: `xxxx'
Bad or unknown CPU ID (non-Intel). The BIOS is unable to
fully identify the processor. This sub-code indicates the
CPUID string is not "GenuineIntel".
Resolution: A) Replace the processor.
B) Try moving the processor to the other CPU socket.
It could be a single socket problem.
C) Try moving the processor to another system.
It could be node hardware or software.
D) Replace the node motherboard.
Diagnostic: A) Use Whack "cpu id" command. The interesting
line will follow a line similar to:
Intel Pentium III Processor:
or
Intel Pentium 4 Xeon Processor:
Fatal error: Code 1, subcode 0x2 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Required features 0x008053fb are missing.
Each class of CPU has a list of technology features it supports.
If this error occurs, it is because the CPU is severely downrev,
the CPU is bad, or the motherboard is bad.
Resolution: A) Replace the processor.
B) Try moving the processor to the other CPU socket.
It could be a single socket problem.
C) Try moving the processor to another system.
It could be node hardware or software.
Diagnostic: A) Use Whack "cpu id" command. The interesting
line will be similar to:
Family 6 ... Features 0x0387fbff, Pflags 4
Fatal error: Code 1, subcode 0x3 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified this CPU.
Each run of CPU has a major revision and a minor stepping number.
If you receive this message, the processor has not yet been
verified by 3PAR for reliable operation. If this is a new
processor, it may be acceptable to press ^C to resume after this
error. If you are testing a new stepping of the processor and
need to use it, use the following Whack command to ignore an
unknown CPUID:
Whack> set perm cpu_unqual_ok
Resolution: A) Upgrade to the latest CBIOS to ensure newer
certified processors are acceptable.
B) Replace the processor with one certified
by 3PAR for use with the board.
Diagnostic: A) Use Whack "cpu id" command. The interesting
line will be similar to:
Family 6, Model 8, Stepping 3, Features ...
Fatal error: Code 1, subcode 0x4 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified the bootstrap CPU as a
dual processor.
If more than one processor is installed, both CPUs must be
certified to operate in multiprocessor mode. This error
indicates that the bootstrap processor was found to not be
certified to run in a multiprocessor mode.
See Code 1, sub-code 0x3 for resolution information.
Fatal error: Code 1, subcode 0x5 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified this CPU as a multiple
processor.
See Code 1, sub-code 0x3 for resolution information.
Fatal error: Code 1, subcode 0x6 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Microcode table size is xxxx,
which is not 4 mod 2048.
This is an internal CBIOS consistency check error. If you
see this error, most likely processor execution out of
flash is not stable. The CPU identification is performed
after the flash is fully CRC verified, so this error is
likely the result of a failing CPU or transient bus
operation.
Resolution: A) Replace the processor.
B) Re-flash the CBIOS (no need to upgrade).
Diagnostic: A) Use Arium and a scope to verify that processor
fetches from flash trigger no unusual bus
operations.
Fatal error: Code 1, subcode 0x7 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Invalid microcode checksum: xxxx
This is another internal CBIOS consistency check error.
Before each block of update microcode is uploaded to the
Pentium, a checksum on it is first verified. If this
checksum is not valid, the block will be rejected with
this error.
See Code 1, sub-code 0x3 for resolution information.
Fatal error: Code 1, subcode 0x8 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Microcode update failed: expected xxxx, got yyyy
The processor has rejected the microcode update. This
could be any number of things, but is likely due to a
failing processor. At this point a strong 64-bit CRC
has been run successfully across the BIOS and a checksum
for each update line has also passed.
See Code 1, sub-code 0x4 for resolution information.
Fatal error: Code 1, subcode 0x9 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: No microcode update found for this CPU.
The BIOS was not able to locate a microcode update for
this particular processor, yet it is listed as a CPU which
requires a microcode update. This is likely due to use of
an unqualified processor.
See Code 1, sub-code 0x4 for resolution information.
Fatal error: Code 1, subcode 0xa (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: CPU failed BIST (built in self test): xxxxxxxx
The processor has failed its own built in self test.
This indicates strongly that the processor is at fault.
Resolution: A) Replace the processor.
B) Replace both processor VRM modules.
C) Replace the node motherboard.
Fatal error: Code 1, subcode 0xb (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: First CPU's bus ratios were wwww:xxxx
but this CPU's bus ratios are yyyy:zzzz.
The two processors in the system board do not have the
same bus clock multiplier. The likely cause is that
the processors are of different clock speeds (or less
likely minor steppings). The "First CPU" as written
above is the bootstrap CPU. On a PIII board, the bootstrap
CPU (CPU3) is to the right, nearest the PromJet interface.
Resolution: A) Remove both heatsinks and verify the
processors are rated for the same clock
speed and bus multiplier.
B) Replace each processor individually.
C) Replace the node motherboard.
Diagnostic: A) Use Whack "cpu id" command.
If you enter Whack before Linux is booted,
you will consistently run on the bootstrap
CPU. If you enter Whack from Linux (using
the whack command), it is a race as to on
which CPU you will enter Whack. The SMI
output indicates on which CPU whack is
running. Using this method, or using the
"cpu switch" command, you can "cpu id" all
processors in the node. Example:
Whack> cpu id
Intel(r) Pentium(r) III Processor:
Family 6, Model 8, Stepping 3, ...
...
CPUID[3] == 0x00000000 0x00000000 0xda28203c ...
...
Bus to CPU ratio == 2/13
...
Clock Frequency Ratio == 7
Fatal error: Code 1, subcode 0xc (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
This CPU does not support clock multiplier changes
In the supported configuration, the two CPUs present in the
node must run at the same clock speed. If the BIOS detects
CPUs which have different clock multipliers, it will
automatically configure all CPUs to use the highest common
clock multiplier. If a CPU's multiplier cannot be changed,
then this fatal error will result.
See Code 1, sub-code 0x4 for resolution information.
Fatal error: Code 1, subcode 0xd (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
Desired clock multiplier xx is too high for this CPU
This error indicates the CPU does not support a clock
multiplier the BIOS is attempting to set.
See Code 1, sub-code 0xc for resolution information.
Fatal error: Code 1, subcode 0xe (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
Desired clock multiplier xx is illegal for this CPU
See Code 1, sub-code 0xd for information on this error.
Non-fatal error: Code 2, sub-code 0x1 (0)
RTC_FAILURE "RTC Failure"
*** Error: Real-Time Clock not initialized.
The Real-Time Clock (RTC) is a function of the SuperIO
which provides a battery backed system clock and a small
quantity of battery backed Non-Volatile RAM for system
configuration flags. This error indicates the RTC memory
has become corrupt, possibly due to a dead battery or
battery removal when no mainline power was available.
Resolution: A) Power down, wait 30 seconds, power up.
This error should self-correct (likely
with a loss of current date/time and
other NVRAM contents). Set the date and
time using the Whack "rtc date" command.
B) Replace the RTC battery, located near the
SuperIO ASIC.
C) Use the Whack command "rtc date" to set
the RTC date and time.
D) Replace the node motherboard.
Diagnostic: A) Use Whack "time loop" command. The left
column is RTC seconds and should increment
exactly at second intervals. The right
column is a time scaled processor
performance counter and should (even in
the case of a deviant slow or fast RTC)
still increment nearly in lock step with
the RTC.
B) Verify there is not a dead short across
the RTC battery. This will drain the
battery and immediately invalidate the
RTC contents on power down.
Non-fatal error: Code 2, sub-code 0x2 (0)
RTC_FAILURE "RTC Failure"
RTC_BATTERY_LOW
RTC / NVRAM Battery Failure - Replace battery.
The RTC / NVRAM battery was found to have a low voltage by
the built-in monitoring circuit of the Real Time Clock (RTC).
The RTC battery provides power to the RTC clock function
of the SuperIO while the board is not drawing mainline
supply power. Over time, this battery's available power
will decay (rated for over five years normal operation).
Resolution: A) Replace the RTC lithium cell battery on the
node motherboard.
B) Replace the node motherboard.
Diagnostic: A) Verify the lithium cell has a 3V charge.
B) Verify there is not a dead short across
the RTC battery. This will rapidly drain the
battery and immediately invalidate the
RTC contents on power down.
Non-fatal error: Code 2, sub-code 0x3 (0)
RTC_FAILURE "RTC Failure"
RTC_INVALID_TIME
The current RTC date/time is invalid. Enter the correct
date/time or press Tab to acquire it from the network.
If the time has not yet been set, or becomes invalid due to
loss of battery power, this BIOS will report this error and
wait for the user to update the time.
Resolution: A) Enter the correct time.
B) Press TAB to acquire the time from the network.
C) Press ^C to abort prompt and resume boot.
Non-fatal error: Code 2, sub-code 0x4 (0)
RTC_FAILURE "RTC Failure"
RTC_BATTERY_LOW
RTC / NVRAM Battery Failure - Replace battery.
The RTC / NVRAM battery was found to have a low voltage by the
built-in monitoring circuit of the RTC (TOD clock).
Resolution: A) Replace the lithium-ion cell battery on the node.
B) Replace the node motherboard.
Non-fatal error: Code 2, sub-code 0x5 (mode)
RTC_FAILURE "RTC Failure"
RTC was found in Binary mode or 12 Hour mode.
The RTC has two modes of operation. The BIOS prefers the RTC
to be in BCD mode rather than Binary mode.
If the RTC is in Binary mode, then this must have been set by the OS.
The BIOS will reset the RTC to BCD mode. Also, if the RTC is
in 12 Hour mode, then the BIOS will report this and correct
the RTC to 24 Hour mode. The mode byte tells us which mode
it was in: Bit 1 should be on for 24 Hour mode.
Bit 2 should be off for BCD mode.
Resolution: A) Informational only. If in Development,
alert the development team.
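The two bits of the Code 2, sub-code 0x5 mode byte can be read with a short sketch. It assumes bit 0 is the least significant bit, so bit 1 has mask 0x02 and bit 2 has mask 0x04. The guide does not state the bit numbering, and the function name describe_rtc_mode is hypothetical.

```python
def describe_rtc_mode(mode):
    """Interpret the mode byte logged with Code 2, sub-code 0x5.

    Assumption: bit 0 is the least significant bit of the byte.
    """
    hour24 = bool(mode & 0x02)  # bit 1 on  -> 24 Hour mode
    binary = bool(mode & 0x04)  # bit 2 on  -> Binary mode, off -> BCD mode
    return {
        "hour_mode": "24 Hour" if hour24 else "12 Hour",
        "data_mode": "Binary" if binary else "BCD",
        # The BIOS corrects the RTC when either setting is not the preferred one.
        "bios_corrects": (not hour24) or binary,
    }


# Hypothetical mode byte: bit 1 off (12 Hour) and bit 2 on (Binary).
print(describe_rtc_mode(0x04))
```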
Non-fatal error: Code 2, sub-code 0x6 (0)
RTC_FAILURE "RTC Failure"
RTC update was stopped.
Resolution: A) Informational only. If in Development,
alert the development team.
Fatal error: Code 3, subcode 0x0 (0)
SRAM_INIT_FAILURE "CPU SRAM Init Failure"
During initialization, memory areas are tested before
they are used. SRAM is used by the processor for
persistent storage during early initialization and the
CPU memory tests.
This sub-code indicates that the SRAM walking bits test
has failed and that the onboard SRAM may not be reliable.
Resolution: A) Power down, wait 30 seconds, power up.
This problem is likely not a one time
occurrence, so this problem is likely
to recur.
B) Replace the node motherboard.
Diagnostic: A) Use Arium to set and verify SRAM contents.
If you notice a pattern, it could be a
pulled, stuck, or bridged SRAM line.
Fatal error: Code 3, subcode 0x1 (0)
SRAM_INIT_FAILURE "CPU SRAM Init Failure"
After SRAM contents have been updated with the BIOS static
data, a test is performed to ensure the data arrived intact.
If it did not, this error is generated. The error could
indicate an SRAM failure with the same conditions as above.
See Code 3, sub-code 0x0 for resolution information.
Fatal error: Code 4, subcode 0x1 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
Pairvvvv DIMMwwww: (Jxxxx) Bad checksum. Got yyyy, SPD said zzzz
*** Error: Bad SDRAM configuration.
The SDRAM DIMMs located on the motherboard are used for main
CPU memory and are critical to the proper operation of a
node. Even before the memory is thoroughly tested for proper
operation, it must be configured to appear in CPU-addressable
space. Each DIMM has a small embedded serial EEPROM which
holds DIMM configuration information such as the number of rows,
columns, and banks, as well as memory timing. If this serial
EEPROM becomes corrupt, data stored in it regarding the DIMM
configuration cannot be trusted. So, this EEPROM also contains
a checksum which the BIOS verifies is correct before configuring
the DIMM. If this checksum does not match the checksum the BIOS
computes across the DIMM, this error will result.
The minor code reported is the total count of errors for the DIMM.
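The kind of check described for the DIMM serial EEPROM can be sketched as below. This guide does not specify the checksum algorithm CBIOS uses. The sketch assumes the common JEDEC SPD layout for SDR and DDR SDRAM, in which byte 63 holds the low 8 bits of the sum of bytes 0 through 62. The function names and the sample contents are hypothetical.

```python
def spd_checksum(spd):
    """Compute the SPD checksum over bytes 0..62 (assumed JEDEC SDR/DDR layout)."""
    if len(spd) < 64:
        raise ValueError("SPD image must contain at least 64 bytes")
    return sum(spd[:63]) & 0xFF


def spd_checksum_ok(spd):
    """Compare the computed checksum ("Got") with the stored byte 63 ("SPD said")."""
    return spd_checksum(spd) == spd[63]


# Hypothetical SPD image, for illustration only.
image = bytearray(128)
image[0:3] = bytes([0x80, 0x08, 0x04])
image[63] = spd_checksum(image)
print("checksum ok:", spd_checksum_ok(image))

image[5] ^= 0x01  # simulate a corrupted configuration byte
print("checksum ok after corruption:", spd_checksum_ok(image))
```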
Resolution: A) Replace the defective CPU DIMM with an
identical one.
B) If an identical one is not available, replace
the CPU DIMM pair.
See Code 15 for more resolution information.
Diagnostic: A) The CPU DIMMs appear on the I2C bus at 3.a0
through 3.a6. Use the Whack "d i2c" command to
display the DIMM serial EEPROM contents to
determine if there is a pattern.
Example (DIMM 2):
Whack> d i2c 3.a4.0
See Code 15 for more resolution information.
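The I2C addresses quoted in the diagnostic step can be generated with a short sketch. It assumes the four CPU DIMMs sit at consecutive even addresses on bus 3 (a0, a2, a4, a6), which is inferred from the range "3.a0 through 3.a6" and from the DIMM 2 example at 3.a4. The trailing ".0" is copied from that example without interpretation. The function names are hypothetical.

```python
def spd_i2c_address(dimm, bus=3, base=0xA0):
    """Return the bus.address string for a CPU DIMM's serial EEPROM.

    Assumption: DIMM n is at base + 2 * n, for n in 0..3.
    """
    if not 0 <= dimm <= 3:
        raise ValueError("only DIMMs 0 through 3 fit the documented range")
    return f"{bus}.{base + 2 * dimm:02x}"


def whack_spd_dump_command(dimm):
    """Build the Whack display command in the form shown in the guide's example."""
    return f"d i2c {spd_i2c_address(dimm)}.0"


print(whack_spd_dump_command(2))
```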
Fatal error: Code 4, subcode 0x2 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
Pairww DIMMxx (yyyy): 'zzzz' read failed
*** Error: Bad SDRAM configuration.
Where zzzz is one of:
row address, column address, module rows, cas latency3,
refresh, banks, cas latency2, cas latency1, ras precharge,
act_to_rw, act_to_deact, ras cycle, write_to_deact,
density, frequency, or DIMM type.
This error indicates that a CPU memory DIMM was detected but
that the EEPROM present on the DIMM could not be reliably read.
The read operation is done through I2C.
See Code 4 above for resolution information.
Fatal error: Code 4, subcode 0x4 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: 'ssss' in Pairtt DIMMuu (vvvv): ww != DIMMxx (yyyy): zz
*** Error: Bad SDRAM configuration.
This error indicates the BIOS detected the CPU SDRAM DIMMs in
the bank pair are of a different type.
Resolution: A) Ensure both DIMMs in the pair are identical.
Note that two DIMMs may have the same capacity
but have different number of rows, columns, or
banks. The DIMM configuration must exactly
match. If the DIMMs have the same manufacturer,
markings and capacity, they are probably identical.
See Code 15 for more resolution information.
Diagnostic: A) The EEPROM SPD information in each pair of DIMMs
should be nearly identical.
See Code 4 above for more diagnostic information.
Fatal error: Code 4, subcode 0x8 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Pairww DIMMxx (yyyy): bad refresh type zz
*** Error: Bad SDRAM configuration.
This error indicates the value the DIMM reports for refresh
is not valid (greater than the maximum refresh counter).
See Code 4 above for resolution information.
Fatal error: Code 4, subcode 0x10 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: DIMM Pair wwww: **** Type not known ****
(rows xxxx, cols yyyy, banks zzzz)
*** Error: Bad SDRAM configuration.
This error indicates the values the DIMM reports
for rows, columns, and banks do not correspond to any
known configuration for a valid DIMM. It is possible
the DIMM EEPROM data has become corrupt or that the DIMM
is a higher capacity than what is currently supported.
See Code 4 above for resolution information.
Fatal error: Code 4, subcode 0x20 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Unable to configure any DQS lines.
OR
*** Error: Unable to configure DQS lines for nibble x.
*** Error: Bad SDRAM configuration.
This is P4 only. This error indicates that the BIOS failed to find
a set of acceptable DQS values for all nibbles or for one nibble
of the DIMMs.
See Code 4 above for resolution information.
Fatal error: Code 4, subcode 0x100 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: ACT to DEACT of yy.yy clocks is > 6.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting which is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
Resolution: A) Replace CPU DIMMs with 3PAR-certified
products.
B) Replace the node motherboard.
C) If there is no other choice, override this error
with a BIOS variable, setting "mem_margin" to
the percentage outside margin. Example:
*** Error: ACT to RW of 3.06 clocks is > 3.00 (2%)
*** Error: Bad SDRAM configuration.
Fatal error: Code 4, subcode 0x0 (2)
Whack> set perm mem_margin=2
Whack> reboot
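The percentage shown in parentheses in the error text can be reproduced from the two clock values in the message. The sketch below computes how far the required timing exceeds the limit, using the 3.06 and 3.00 values from the example. How CBIOS rounds the result is not documented here, so the function returns the exact value. The function name percent_outside_margin is hypothetical.

```python
from decimal import Decimal


def percent_outside_margin(required_clocks, limit_clocks):
    """Return how far a required timing exceeds the limit, in percent.

    Arguments are passed as strings so the decimal values from the error
    text are used exactly, without binary floating-point rounding.
    """
    required = Decimal(required_clocks)
    limit = Decimal(limit_clocks)
    return (required - limit) / limit * 100


# Values from the example message: "ACT to RW of 3.06 clocks is > 3.00 (2%)".
percent = percent_outside_margin("3.06", "3.00")
print(f"set perm mem_margin={int(percent)}")
```

Overriding the margin remains the last-resort option listed in the resolution steps. The calculation only shows where the value comes from.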
Fatal error: Code 4, subcode 0x200 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Act to RW of y.yy clocks is > 3.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting which is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.
Fatal error: Code 4, subcode 0x400 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS precharge time of y.yy clocks is > 3.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting which is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.
Fatal error: Code 4, subcode 0x800 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS cycle time of y.yy clocks is > 9.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting which is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.
Fatal error: Code 4, subcode 0x1000 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS to RAS of y.yy clocks is > 2.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting which is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.
Fatal error: Code 4, subcode 0x2000 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: yyyy: Write to deact > 3. We got zzzz
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory
controller setting which is outside tolerance for the
chipset's memory controller. This DIMM pair would likely
not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.
Fatal error: Code 5, subcode 0x1 (0)
C_MAIN1_CALL_FAILURE "c_main1 Call Failure"
This exception should never happen unless an earlier
exception was ignored by pressing ^C. This is because
this exception will only occur if the main initialization,
diagnostic test, and boot sequence fails to complete a
boot and then the user chooses to ignore the error.
A further explanation is necessary. There are two halves
to system initialization. The first half relies on only
SRAM being available and so stack and runtime variables
are stored there. Once main CPU memory has been tested,
initialization switches to the second half which relies on
the tested SDRAM for all data structures. This second
half completes initialization and testing of all other node
board devices and executes the boot process. For this
last step to fail, the IDE disk must either not be present
or contains an invalid boot. At that point a fatal error
is generated.
Do not ignore this condition. It is a final recourse and
an abort will reboot or hang the node board. It is safer
at this stage to press ^W and enter Whack. From Whack,
you can reboot with the "reboot" command.
Resolution: A) Check control cache (CPU) DIMMs are installed
and pass initialization.
B) Verify the node boot drive is present and node
software has been installed.
C) Replace the node, including CPU DIMMs and boot drive.
Fatal error: Code 6, subcode 0x1 (0)
SRAM_BAD "CPU SRAM Bad"
*** SRAM failure: address xxxxxxxx wrote yy but read zz
This failure indicates an early SRAM verification test
revealed a problem with the SRAM. This is an unrecoverable
error which likely requires hardware diagnostic. This
error is displayed by low level init code. It will never
be written to the PROM log because hardware which writes to
the PROM relies on correctly functioning SRAM.
Resolution: A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.
Diagnostic: A) Use Arium to set and verify SRAM contents.
If you notice a pattern, it could be a
pulled, stuck, or bridged SRAM line.
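The pattern search suggested in this diagnostic can also be done offline. The following Python sketch is an illustration only and is not part of the BIOS, Whack, or Arium. It assumes the (wrote, read) pairs have been transcribed by hand from repeated "wrote yy but read zz" lines; the function name and the 8-bit default width are assumptions.

```python
def classify_stuck_bits(samples, width=8):
    """Return (stuck_high, stuck_low) bit masks for (wrote, read) pairs.

    A bit is reported stuck high if it read back as 1 in every sample
    although it was written as 0 at least once, and stuck low if it read
    back as 0 in every sample although it was written as 1 at least once.
    """
    mask = (1 << width) - 1
    always_one = mask    # bits that read back 1 in every sample so far
    always_zero = mask   # bits that read back 0 in every sample so far
    wrote_zero = 0       # bits written as 0 at least once
    wrote_one = 0        # bits written as 1 at least once
    for wrote, read in samples:
        always_one &= read
        always_zero &= ~read & mask
        wrote_zero |= ~wrote & mask
        wrote_one |= wrote & mask
    return always_one & wrote_zero, always_zero & wrote_one


# Hypothetical samples in which bit 2 always reads back as 1.
samples = [(0x00, 0x04), (0xFF, 0xFF), (0x55, 0x55), (0xAA, 0xAE)]
stuck_high, stuck_low = classify_stuck_bits(samples)
print(f"stuck high: {stuck_high:#04x}, stuck low: {stuck_low:#04x}")
```

A bridged line does not show up as a single stuck bit; it would need a comparison between pairs of bits, which this sketch does not attempt.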
Fatal error: Code 7, subcode xxxx (yyyy)
SDRAM_BUS_FAST "Control Cache Bus Fast"
*** Error: Front side bus speed xxxx > expected yyyy
This error indicates the BIOS has detected that the front
side bus speed exceeds the expected speed (133 MHz on PIII,
533 MHz on P4, 1333 MHz on 5000P). The system may not
perform reliably.
Resolution: A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.
Diagnostic: A) Check the oscillator for the front side bus
with a frequency counter or an oscilloscope.
Fatal error: Code 8, subcode xxxxxxxx (yyyyyyyy)
MACHINE_CHECK_FAILURE "Machine Check Failure"
Machine check:
MCG_STATUS == xxxxxxxx yyyyyyyy
During BIOS initialization and testing, the processor must
execute instructions. If this error results at any point,
it is likely due to failing hardware related to the CPU's
instruction execution path.
Resolution: A) Cycle power on the node.
B) Update the node firmware to the latest version.
C) Replace CPU SDRAM in pairs.
D) Replace the node motherboard.
Diagnostic: A) Replace CPU VRMs.
B) Replace CPUs.
C) Use Arium and set a breakpoint on a machine
check to determine what errant instructions
led up to the machine check.
D) This problem may also be a BIOS or booter
software bug. Observe the values of the error
sub-code and data. They make up the 64-bit
value of the MCG_STATUS status register.
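As a rough aid for reading the sub-code and data values, the Python sketch below joins two 32-bit halves into one 64-bit value and names the low status flags. It is a sketch under stated assumptions: the guide does not say which field holds the upper half, so treating the sub-code as the upper 32 bits is an assumption, and the flag names (RIPV, EIPV, MCIP for bits 0 to 2) come from the Intel architecture definition of MCG_STATUS rather than from this guide.

```python
# Low flag bits of MCG_STATUS as defined by the Intel architecture.
MCG_STATUS_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def decode_mcg_status(subcode, data):
    """Combine two 32-bit halves and list the flag bits that are set.

    Assumption: subcode is the upper half and data is the lower half.
    Swap the arguments if the logged order turns out to be the reverse.
    """
    value = ((subcode & 0xFFFFFFFF) << 32) | (data & 0xFFFFFFFF)
    flags = [name for bit, name in MCG_STATUS_FLAGS.items() if value & (1 << bit)]
    return value, flags


value, flags = decode_mcg_status(0x00000000, 0x00000005)
print(f"MCG_STATUS = {value:#018x}, flags = {flags}")
```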
Fatal error: Code 9, subcode 0x0 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
*** Entering memory segment test: Stack is in xxx ***
One of the first memory tests performed in diagnostic
mode is a sequential address or random data test.
If there is no memory in the system, or the memory DIMMs
are mismatched, or there is a memory subsystem problem,
this error may result.
Resolution: A) Verify memory is installed and in matched
pairs (same manufacturer, exact same memory
configuration and speed).
B) Replace CPU DIMMs with a set of known good ones.
C) Replace the node motherboard.
Diagnostic: A) Change memory with Whack "c <addr>" command.
Examine memory with Whack "d <addr>" command.
B) Use Arium to modify and examine memory.
Fatal error: Code 9, subcode 0x1 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Insufficient memory: BSS end == xxxx, stack limit == yyyy
During the first part of initialization, system stack comes
from SRAM. During the second part, system stack comes from
CPU memory. If there is insufficient SDRAM (such as no DIMMs
installed), this error may result.
It is a bad idea to ignore this error with ^C, as the system
stack will fall past the available memory and probably hang
the initialization hard.
See Code 9, sub-code 0x0 for resolution information.
Fatal error: Code 9, subcode 0x2 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Expected sdram_init_test to be xxxx, but it was yyyy.
After SDRAM has been initialized and scrubbed, the BIOS
copies runtime variables from Flash to CPU memory.
The BIOS later verifies that this data was copied to SDRAM.
This fatal error may be caused by either a software
error in the BIOS, a hardware error (such as flaky CPU
memory), or user intervention such as modifying the memory
containing the SDRAM copy of the runtime variables.
Resolution: A) Reboot. If the problem is caused by flaky
hardware, a prior memory test should catch
this condition.
B) Upgrade BIOS version. Not a likely solution
since this code path is well tested every
time the system is booted.
C) Replace CPU DIMMs with a set of known good ones.
D) Replace the node motherboard.
Diagnostic: A) Examine the BIOS memory area using the Whack
"d <addr>" memory dump command. SDRAM data
appears in CPU memory in the 0x000d0000
region. The key value is 0xdeadbeef.
Example:
Whack> mem search d0000 10000000 deadbeef
Searching 00000000 .. 01000000 for deadbeef
[ ]
Found at 000d0cb0
If this key cannot be found, something went wrong
with the copy or memory has become corrupt.
Fatal error: Code 9, subcode 0x3 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Low 1M test: Test completed: x iterations, y probes, z
errors found
The low 1 MB of memory is thoroughly tested to ensure
reliable operation, as this is the memory area that the BIOS
and Whack use during further initialization and testing. If
this test fails, it should not be ignored with ^C, as having
reliable system memory is critical to proper operation.
Resolution: A) Cycle power on the node. Occasionally, memory
will fail during a memory test due to metallic
dust.
B) Reseat CPU memory DIMMs.
C) Pull CPU DIMMs, blow dust from sockets, reseat.
D) Replace CPU memory DIMMs in pairs to ensure
replacement parts are matched.
PIII nodes: Non-paired DIMMs are proximally
closest. Paired DIMMs are the leftmost-leftmost
and rightmost-rightmost of each two which are
proximally closest.
P4 nodes: Paired DIMMs are proximally closest.
DIMM0 and DIMM1 are a pair.
DIMM2 and DIMM3 are a pair.
E200, Ironman, and Tinman nodes:
There is only a single pair of CPU
memory DIMMs.
E) Replace the node motherboard.
Diagnostic: A) Run the memory test manually from Whack. You
can use the "mem test range <base> <size>" command to
test a range of memory.
B) Write to known bad memory with the Whack
"c <addr>" command and observe written contents
with "d <addr>". Write enough patterns that you
might be able to observe a pattern such as a stuck
or floating bit.
Fatal error: Code 9, subcode 0x4 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
High 64K test: Test completed: x iterations, y probes, z
errors found
In addition to the low 1 MB of memory, older BIOS versions
also thoroughly tested the high 64 KB of memory. This is
because the operational stack for the CBIOS and Whack used
to reside at this address, which made the memory critical
for proper initialization and testing. The current BIOS now
uses memory below 1 MB for stack space, so this failure code
is deprecated.
See Code 9, sub-code 0x3 for resolution information.
Fatal error: Code 9, subcode 0x5 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
SDRAM walk: Test completed: xx iterations, yy probes, zz
errors found
During initialization (prior to a thorough test of the low
1 MB of memory), a quick walk through all CPU memory is
performed. If an error is found, this fatal error is
displayed.
See Code 9, sub-code 0x3 for resolution information.
Fatal error: Code 9, subcode 0x6 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Full SDRAM test: Test completed: xx iterations, yy probes,
zz errors found
During later testing, a full SDRAM test is performed which
more completely verifies proper memory operation than the
cursory SDRAM walk. This test is very similar to the initial
thorough 1 MB test done during initialization.
See Code 9, sub-code 0x3 for resolution information.
Fatal error: Code 9, subcode 0x7 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Pairwwww DIMMxxxx: Illegal SPD <name of value> <value>
This error indicates that a CPU DIMM was detected but that
the EEPROM present on the DIMM reported an illegal or
unsupported value for our memory controller.
Example:
Density (SPD byte 31) has more than 1 bit set (e.g., 0x30)
which indicates a non-standard part.
See Code 9, sub-code 0x3 for resolution information. Most
likely, the DIMM is not qualified for use in our Node Board.
The DIMM number is logged in the Data field of the Fatal
Error.
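The density example above amounts to a bit-count check. The short Python sketch below illustrates that check only; the function name is hypothetical, and the sketch makes no claim about how the BIOS itself validates SPD data or which other SPD bytes it inspects.

```python
def has_multiple_bits_set(spd_byte):
    """Return True when more than one bit is set in an SPD byte value."""
    return bin(spd_byte & 0xFF).count("1") > 1


# 0x30 has bits 4 and 5 set, which matches the non-standard case above.
print(has_multiple_bits_set(0x30))
# 0x20 has a single bit set, so it would not trip this particular check.
print(has_multiple_bits_set(0x20))
```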
Fatal error: Code 9, subcode 0x10 (0)
SDRAM_FAILURE "Control Cache Failure"
Cannot allocate xx bytes for PCI bus yy scan
or
Cannot allocate xx bytes for PCI device on bus yy
This error indicates there was not enough memory or a memory
error occurred while attempting to allocate heap space
during the PCI device probe. SDRAM is needed because the
BIOS maintains a list of PCI devices present in the system.
Resolution: A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace CPU DIMMs.
D) Replace the node motherboard.
Diagnostic: A) Set BIOS verbose init flags to get more info
during memory init and PCI scan.
Whack> set perm mem_verbose
Whack> set perm pci_all
B) Use the "config->heap" command to show the
heap_base, heap_top, and heap_limit values.
Fatal error: Code 9, subcode 0x11 (0)
SDRAM_FAILURE "Control Cache Failure"
Cannot find bus xx in scanned PCI busses
During the PCI bus scan, a list of PCI devices present is
recorded in SDRAM. For each device present, a block of
memory is allocated and initialized. This error indicates
that a data value indicating bus number could not be found
in the list of devices previously scanned. This is probably
due to an SDRAM or CPU failure.
Resolution: A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace CPU DIMMs.
D) Replace bootstrap CPU.
E) Replace the node motherboard.
Fatal error: Code 9, subcode 0x12 (0)
SDRAM_FAILURE "Control Cache Failure"
No memory installed.
This error indicates that the CPU memory scan failed to
locate any usable memory for the system. There must be
at least one bank of SDRAM configured for the node to
operate correctly.
Resolution: A) Cycle power on the node.
B) Verify CPU DIMM scan output shows DIMMs.
C) Replace CPU DIMMs.
D) Replace the node motherboard.
Fatal error: Code 9, subcode 0x13 (xxxx)
SDRAM_FAILURE "Control Cache Failure"
Unknown DDR2 frequency (xxxx)
This error indicates that the CPU memory installed is
of an unrecognized and thus unsupported memory speed.
Supported speeds include 533, 667 and 800 MHz.
Resolution: Replace CPU DIMMs with 533, 667 or 800 MHz
modules.
Fatal error: Code 9, subcode 0x14 (0)
SDRAM_FAILURE "Control Cache Failure"
FB-DIMM Initialization Failure
This error indicates that CBIOS was unable to initialize
the CPU memory installed.
Resolution: A) Cycle power on the node.
B) Replace CPU DIMMs.
C) Replace the node motherboard.
Fatal error: Code 9, subcode 0x15 (data)
SDRAM_FAILURE "Control Cache Failure"
This error indicates that an uncorrectable ECC error was
detected on a DIMM. The data value is a bitmask that may be
decoded to determine which DIMM had the error. A value of 1
indicates DIMM 0, 2 indicates DIMM 1, 4 indicates DIMM 2,
etc. More than one bit may be set if CBIOS is unable to
isolate the error down to a single DIMM.
Resolution: A) Cycle power on the node.
B) Replace FB-DIMM(s).
C) Replace the node motherboard.
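The bitmask decoding described for sub-code 0x15 can be sketched in a few lines of Python. This is an illustration of the stated mapping (bit n set means DIMM n) and not a 3PAR tool; the function name is hypothetical.

```python
def dimms_from_ecc_mask(data):
    """Return the DIMM numbers whose bits are set in the data bitmask."""
    return [bit for bit in range(data.bit_length()) if data & (1 << bit)]


# A data value of 0x5 has bits 0 and 2 set, so DIMM 0 and DIMM 2 are suspect.
print(dimms_from_ecc_mask(0x5))
```

When more than one DIMM is returned, the error could not be isolated to a single DIMM and every listed DIMM is a candidate for replacement.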
Fatal error: Code 10, subcode 0x1 (0)
PCI_FAILURE "PCI Failure"
*** Error: Bus xx cannot be parent of bus yy.
*** Error: Failure occurred during PCI device allocation.
During the PCI scan, many devices which were programmed by
previous PCI scan steps are examined again to verify the
programming was successful. This error indicates that a
bridge failed to record the PCI bus number of bridges
below it.
Resolution: A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace the node motherboard.
Diagnostic: A) Use Whack to evaluate offset 0x45 on the
failing parent bridge to determine if the value
isn't sticking there or there is some problem
with the PCI bus below it.
Fatal error: Code 10, subcode 0x2 (0)
PCI_FAILURE "PCI Failure"
*** Error: Vendor vvvv, Device wwww, for index xxxx:
Expected size yyyy, but got zzzz
Several devices on the PCI bus of a node board are known
by the CBIOS to have specific sizes. As a hardware
consistency check, the BIOS verifies that these devices
are not only present, but also have appropriate memory
and I/O space requirements. If any device is found
outside of expected requirements, it will cause this error.
Resolution: A) Cycle power on the node.
B) Reseat all PCI cards.
C) Swap out the PCI card for another qualified
card (if it's a card).
D) Pull all PCI cards to see if the problem
persists. If so, replace any defective cards.
E) Replace the node motherboard.
Diagnostic: A) Use Whack command "pci probe" using the
vendor ID provided in the fatal error to
acquire the address information the card
provides. If this information does not match
the error above, this may be a transient.
B) Use the Whack "d pci" command, providing it
the "<bus>.<dev>.<func>" of the PCI device.
Look for patterns in the data that might
indicate a stuck bit.
Fatal error: Code 10, subcode 0x3 (0)
PCI_FAILURE "PCI Failure"
*** Error: I/O space: address limit xxxx exceeded: yyyy
This error indicates that the system has run out of
available mapping area while attempting to map this device
into the CPU's I/O address range (0x0000 - 0xfe00). The
likely cause of this error is that a prior PCI device is
consuming too much I/O space. Since most device I/O ranges
are extremely small, it is likely a defective PCI card or
PCI bus problem which is the cause.
Resolution: A) Reseat all PCI cards.
B) Swap out individual PCI cards.
C) Replace the node motherboard.
Diagnostic: A) Use Whack command "pci init" or "pci scan" to
re-scan the bus. It may provide the information
you need to determine the bad device.
B) Review the prior PCI allocations to determine
one which is unusually large. You will need
to enter diagnostic mode to do this. There
are two ways:
1) Press ESC at the initial memory test. Type
"go" at the Whack prompt. Answer 'y' to run
the PCI initialization. Answer 'a' to print
on all phases.
2) Press ^W at the initial memory test. Type
"config diag" at the Whack prompt. Answer 'y'
to run the PCI initialization. Answer 'a' to
print on all phases.
Fatal error: Code 10, subcode 0x4 (0)
PCI_FAILURE "PCI Failure"
*** Error: 32-bit prefetchable memory: address limit xx
exceeded: yy
Many PCI devices (and software drivers) require DMA
addressable memory within the 32 bit address space (less
than 4 GB). For this reason, all 32 bit PCI devices are
required to be mapped within this space. Currently, all CPU
memory is also forced to be mapped within this space,
limiting the maximum 32-bit CPU memory to about 3 GB.
Resolution: A) Swap out individual PCI cards.
B) Replace the node motherboard.
See Code 10, sub-code 0x3 for diagnostic information.
Fatal error: Code 10, subcode 0x5 (0)
PCI_FAILURE "PCI Failure"
*** Error: 32-bit non-prefetchable memory: address limit xx
exceeded: yy
The non-prefetchable memory has the same 32 bit limitations
as prefetchable memory does.
See Code 10, sub-code 0x4 for resolution information.
Fatal error: Code 10, subcode 0x6 (0)
PCI_FAILURE "PCI Failure"
*** Error: 64-bit prefetchable memory: address limit xxxx
exceeded: yyyy
64 bit PCI devices are not limited to a 32 bit address
space. The CPU, however, can only access a 36 bit space
(when virtual memory is enabled). Because most drivers need
direct access to the memory a device provides on the bus,
the device must be addressable by the Pentium and so the
maximum 64 bit address allowed is 0xf:ffffffff. This is
64 GB.
See Code 10, sub-code 0x4 for resolution information.
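The 36-bit limit quoted for sub-code 0x6 can be checked with simple arithmetic. The Python sketch below reproduces the 0xf:ffffffff and 64 GB figures and shows a hypothetical range check; the function names are illustrative, and the high:low notation is assumed to split an address into its upper bits and lower 32 bits.

```python
ADDRESS_BITS = 36                          # CPU-addressable width from the text
MAX_ADDRESS = (1 << ADDRESS_BITS) - 1      # highest address the CPU can reach


def format_addr(addr):
    """Format an address as upper bits, a colon, then the lower 32 bits."""
    return f"0x{addr >> 32:x}:{addr & 0xFFFFFFFF:08x}"


def fits_cpu_window(base, size):
    """Return True if [base, base + size) lies within the 36-bit space."""
    return size > 0 and base >= 0 and base + size - 1 <= MAX_ADDRESS


print(format_addr(MAX_ADDRESS))                 # highest allowed address
print((MAX_ADDRESS + 1) // 2**30, "GB")         # size of the 36-bit space
print(fits_cpu_window(0x10_0000_0000, 0x1000))  # base at 64 GB is out of range
```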
Fatal error: Code 10, subcode 0x7 (0)
PCI_FAILURE "PCI Failure"
Testing CM PCI 64-bit data lines: FAIL
The Cluster Manager (Eagle / Osprey) is used to perform a
walking bit test on both PCI0 and PCI1 data paths to CPU
memory. If a problem is found with either path, this error
will be displayed. The error will be further qualified by
one of the following prior lines:
PCIxxxx all data bits stuck high
PCIxxxx BitZZ found data bits stuck high: BitWW, BitXX, BitYY,
PCIxxxx all data bits stuck low
PCIxxxx BitZZ found data bits stuck low: BitWW, BitXX, BitYY,
PCIxxxx BitZZ data bits possibly floating: BitWW, BitXX, BitYY,
Resolution: A) Cycle power on the node.
B) Reseat all PCI cards.
C) Pull all PCI cards to see if the problem
persists. If so, replace any defective cards.
D) Replace the node motherboard.
Diagnostic: A) Depending on the specific error above, check
for stuck or floating pins on CM's connection to
the appropriate PCI bus.
B) Depending on the specific error above, check for
stuck or floating pins on CIOB's (RCC South Bridge)
connection to the appropriate PCI bus.
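For readers unfamiliar with the technique, a walking bit test on a 64-bit data path can be sketched as follows. This Python sketch is a generic illustration and not the CBIOS implementation: `transfer` is a hypothetical callable standing in for one round trip over the PCI data path, and the fault model in the usage example is simulated.

```python
def walking_bit_test(transfer, width=64):
    """Drive walking-one and walking-zero patterns through `transfer`.

    Returns (stuck_high, stuck_low) masks. A bit that reads 0 when it is
    the only bit driven to 1 is marked stuck low; a bit that reads 1 when
    it is the only bit driven to 0 is marked stuck high.
    """
    mask = (1 << width) - 1
    stuck_high = 0
    stuck_low = 0
    for bit in range(width):
        one = 1 << bit
        if not transfer(one) & one:          # walking one
            stuck_low |= one
        if transfer(mask ^ one) & one:       # walking zero
            stuck_high |= one
    return stuck_high, stuck_low


def faulty_bus(value):
    """Simulated path with bit 5 stuck high and bit 40 stuck low."""
    return (value | (1 << 5)) & ~(1 << 40)


high, low = walking_bit_test(faulty_bus)
print("stuck high:", [f"Bit{n}" for n in range(64) if high & (1 << n)])
print("stuck low:", [f"Bit{n}" for n in range(64) if low & (1 << n)])
```

Floating bits tend to read inconsistently between passes, so detecting them would require repeating the test and comparing results, which this sketch omits.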
Fatal error: Code 10, subcode 0x8 (0)
PCI_FAILURE "PCI Failure"
*** Error: Miscompare CPU Memory to CM
Expected (0xAAAAAAAA)
Actual (0xBBBBBBBB)
Offset (0xCCCCCCCC)
*** Error: Miscompare CM to CPU Memory
Expected (0xAAAAAAAA)
Actual (0xBBBBBBBB)
Offset (0xCCCCCCCC)
CBIOS runs simple CM PCI Tests as part of POST in both
normal operation and manufacturing test. The tests use XCBs
to transfer data over both CM PCI interfaces from Cluster Memory
to CPU Memory and back. If any test fails due to a data
miscompare, the test will generate this fatal error code
with sub-code '0x8'.
These tests are similar to the Cluster Memory Tests and may fail
due to Cluster Memory SDRAM hardware or CPU SDRAM hardware
failures. Any test failure will result in a fatal error.
Resolution: A) Cycle power on the node.
B) Reseat CM memory riser card.
C) Reseat the failing Cluster memory DIMM.
D) Replace the failing Cluster memory DIMM.
E) Replace the node motherboard.
Diagnostic: A) The memory controller registers are part of
the CMA register set which is mapped into CPU memory
for access. Use the Whack "pci probe mem 1590"
command to find the Cluster Manager on the PCI bus.
The base address in CPU memory for the configuration
and status registers (CSRs) is Window 0. Example:
Whack> pci probe mem 1590
Win Baseaddr    Basesize    Identity
[0] 00:90200000 00:00000400 3PAR (ASIC) LPC#
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x90200000 above).
This is the base address of the Cluster Memory
Control Register Block. Refer to the Scaffold
System Architecture Reference for information on
register programming.
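The address arithmetic for the example probe output can be worked through as below. This Python sketch only illustrates the calculation; it assumes the "hh:llllllll" notation in the probe listing splits a value into its upper bits and lower 32 bits, and the function and constant names are hypothetical.

```python
CM_CONTROL_BLOCK_OFFSET = 0xC0  # offset quoted in the diagnostic text


def parse_whack_addr(text):
    """Parse an 'hh:llllllll' value (assumed high:low halves) to an int."""
    high, low = text.split(":")
    return (int(high, 16) << 32) | int(low, 16)


def cm_control_block_base(window0_base):
    """Return Window 0 base plus the control register block offset."""
    return parse_whack_addr(window0_base) + CM_CONTROL_BLOCK_OFFSET


# Values taken from the example listing: Window 0 base and Window 1 size.
print(hex(cm_control_block_base("00:90200000")))
print(parse_whack_addr("00:20000000") // 2**20, "MB in Window 1")
```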
Window 1 is the small cluster memory offset. If
the error address is in the first 512 MB of Cluster
memory, use Whack to read/write this location and
confirm the error. The Central Error register
must be reset prior to error reproduction.
If the error address is greater than 512 MB, then
XCBs may be used to reproduce the error. Type
"xcb help" to get more information on using XCBs.
Fatal error: Code 10, subcode 0x9 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has dead clock: xxxxxxxx
This error indicates one of the PCI bridges on the board has
a bad clock value and is refusing to accept programming of a
good clock.
Resolution: A) Cycle power on the node. The problem may
occur on power cycle (only) with random
chance on a bad board.
B) Pull all PCI cards which have integrated
bridges (QLogic quad port cards are a good
example of this). You should power cycle
several times to determine it is not an
intermittent problem with the motherboard.
C) Replace the node motherboard.
Diagnostic: A) The PCI output just prior to the fatal error
will indicate which of the four bridges has
failed. It will be text similar to
"Bridge #1 (controls slots 4 & 5)." Refer
to rework documentation to correct this
problem.
Fatal error: Code 10, subcode 0xa (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has bad GPIO clock select inputs: x
This error indicates one of the PCI bridges on the board has
a bad GPIO input which selects bridge clock sources on a
power on condition.
Resolution: A) Cycle power on the node. The problem may
occur on power cycle (only) with random
chance on a bad board.
B) Replace the node motherboard.
Diagnostic: A) The PCI output just prior to the fatal error
will indicate which of the four bridges has
failed. It will be text similar to
"Bridge #1 (controls slots 4 & 5)." Verify
that GPIO lines 0-3 are being properly pulled
high by comparing against known good board.
Fatal error: Code 10, subcode 0xb (0)
PCI_FAILURE "PCI Failure"
Warning: This node has xx PCI cards present, but yy is the
required minimum. Please verify your node is properly
configured. You may adjust the required minimum with
the "set pci_min" command.
This error indicates this node has detected fewer PCI cards
than the recommended 3PAR minimum. In a system configuration
where there are fewer than the minimum active PCI cards,
inactive load cards should be used to reach the required
minimum.
Resolution: A) Verify the minimum required number of PCI
cards are inserted in the node. Install dummy load
cards to reach the required minimum.
B) Verify all PCI cards in the system have been
identified. Replace any missing card.
C) Replace the node motherboard.
Diagnostic: A) Isolate the problem to one or more slots by
placing load cards in all slots, and then
using the "i2c vsc" command to find which
slots do not report a load.
B) You can use the "i2c vsc" command to verify
cards are reporting correct wattages. You can
use the "pci probe" command to display all PCI
devices, and locate the slot in which they
are inserted. Replace any defective card.
Fatal error: Code 10, subcode 0xc (0)
PCI_FAILURE "PCI Failure"
Testing CM PCI 64-bit address lines: FAIL
CM XCB TEST miscompare at offset, uuuu
Expected (vvvvvvvv)
Actual (wwwwwwww)
CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)
The Cluster Manager is used to perform a walking bit test on
both PCI0 and PCI1 address line paths from CPU memory into
cluster memory. If a problem is found (with either path),
this error will be displayed. The particular memory address
which caused this error will be indicated.
Resolution: A) Cycle power on the node.
B) Reseat all PCI cards.
C) Pull all PCI cards to see if the problem
persists. If so, replace any defective cards.
D) Replace the node motherboard.
Diagnostic: A) Depending on the specific error above, check
for stuck or floating pins on the Cluster Manager's
connection to the appropriate PCI bus.
B) Depending on the specific error above, check for
stuck or floating pins on CIOB's (RCC South Bridge)
connection to the appropriate PCI bus.
Fatal error: Code 10, subcode 0xd (zz)
PCI_FAILURE "PCI Failure"
*** Vendor xxxx device yyyy on motherboard not yet qualified.
*** Vendor xxxx device yyyy in slot zz not yet qualified.
This is an error indicating that the device found is not
recognized by the BIOS as a 3PAR-qualified device. This may
be because the board is a new generation or that there was a
PCI error in communicating with the device. In the former
case, it is probably safe to press ^C to ignore this error.
In the latter case, it is possible that part of the board
has become non-functional to the point where the BIOS may
not be able to determine if the rest of the board will
continue to function.
If you need to override this feature, enter Whack at this
point by pressing ^W. Enter the following command:
Whack> set perm pci_unqual_ok
If the data field is non-zero, it indicates the BIOS
discovered the problem is a card in a particular PCI slot.
The specific codes are as follows:
* 30 is PCI Slot 0
* 31 is PCI Slot 1
* 32 is PCI Slot 2
* 33 is PCI Slot 3
* 34 is PCI Slot 4
* 35 is PCI Slot 5
Resolution: A) Swap out the PCI card for a qualified card.
B) Replace the node motherboard.
Diagnostic: A) If the card is a QLogic, use the Whack command
"pci probe 1077" to find the device and display
its device ID. You may need to press ^W first
if the BIOS is still at the fatal error. There
are several currently qualified PCI cards. Some
include the QLogic 2200, 2300, and 2312. More
will be qualified in the future.
B) The PCI probe should have shown the bus.dev.func
specifier you need to display card information
directly using Whack. Use the Whack "d pci"
command giving it the "<bus>.<dev>.<func>" as
a parameter. You should see a standard PCI
header present.
C) Try the same or a different card in a different
PCI slot to see if the slot has failed.
Fatal error: Code 10, subcode 0xe (0)
PCI_FAILURE "PCI Failure"
PCI bus scan and allocation completed in 21 passes.
*** Error: PCI scan required too many passes. Bad PCI
interaction.
This error indicates the PCI scanning code was unable to
lay out a valid PCI address table mapping within 21 passes.
The cause of this error is possibly due to either defective
hardware or BIOS firmware.
Resolution: A) Remove all PCI cards. If error goes away,
attempt to find failed card by process of
elimination (put back half of the cards and
try to boot again).
B) Replace the node motherboard.
Diagnostic: A) Observe other errors that may happen at the
same time as this error. Is there an indication
that it is a board ASIC which is failing? In
general, some other error should trigger before
this one, since device limits are verified.
B) Contact BIOS engineer for debug assistance.
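The process of elimination in Resolution A is a halving search. The Python sketch below shows the idea under simplifying assumptions: exactly one card is at fault, the error reproduces on every boot, and `boots_ok` is a hypothetical stand-in for the manual step of installing a subset of cards and attempting to boot.

```python
def find_failing_card(cards, boots_ok):
    """Narrow a list of cards down to the single one that prevents boot.

    cards: identifiers of the cards that were removed.
    boots_ok: callable taking a list of installed cards and returning
              True if the node boots without the fatal error.
    """
    suspects = list(cards)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        if boots_ok(half):
            # The installed half is clean, so the fault is in the rest.
            suspects = suspects[len(suspects) // 2:]
        else:
            suspects = half
    return suspects[0] if suspects else None


# Simulated example in which the card from slot 4 is defective.
cards = ["slot0", "slot1", "slot2", "slot3", "slot4", "slot5"]
print(find_failing_card(cards, lambda installed: "slot4" not in installed))
```

With six cards this takes at most three boot attempts. If more than one card is defective, or the fault is in a slot rather than a card, the search result needs to be confirmed by testing the identified card on its own.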
Fatal error: Code 10, subcode 0x10 (0)
PCI_FAILURE "PCI Failure"
*** Error: IMB.A isn't turned on
This error indicates a possible hardware failure on the
board. The bus which connects the CMIC (P4 North Bridge) to
CIOB A failed to initialize properly.
Resolution: A) Cycle power on the node. The problem may
occur with random chance on a bad board.
B) Replace the node motherboard.
Diagnostic: A) Verify CIOBX2 is receiving a valid clock.
B) Look at PCI device 0.0.2.f8 for CIOB A, or PCI
device 0.0.1.f8 for CIOB B. The BIOS observes
bit 0 of this register to tell if the IMB
initialized (0 indicates success).
Fatal error: Code 10, subcode 0x11 (0)
PCI_FAILURE "PCI Failure"
*** Error: Expected to see device xx.yy.zz `uuuu'
but it is not responding: Vendor vvvv, Device wwww.
*** Error: Failure occurred during PCI device allocation.
The BIOS checks for specific onboard PCI devices (such as
bridges) which are known to be on a particular node board.
If a device listed in the BIOS table is not found on the
board, then this error will result.
Resolution: A) Cycle power on the node.
B) Remove PCI cards and see if error disappears.
C) Replace the node motherboard.
Diagnostic: A) The error should indicate for you which
device is missing. Observe to see if there is
another unknown onboard device which has appeared
in its place. This could be the device, masked
behind a PCI bus problem.
B) Verify the PCI ASIC is functional by checking
clocks and PCI data lines to the device.
Fatal error: Code 10, subcode 0x12 (0)
PCI_FAILURE "PCI Failure"
*** Error: The following device is not listed in the
hardwired PCI descriptor table:
Vendor xxxx, Device yyyy
*** Error: Failure occurred during PCI device allocation.
Onboard PCI devices (such as bridges) are well known by
the BIOS to appear at specific bus addresses. If this
device is not known by the BIOS, but it is configured on
a bus which is not externally exposed (PCI slot), then
you will see this error. Since the node board is a closed
solution, this error might occur if an on board device is
failing and does not report a correct device vendor/ID, or
corrupts the device vendor/ID reported by another device on
the bus.
See Code 10, sub-code 0x11 for resolution information.
Fatal error: Code 10, subcode 0x13 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Was yyyy but is now zzzz
*** Error: Failure occurred during PCI device allocation.
The PCI header is re-read on multiple passes of the PCI
initialization. If a mismatch is found with a previous read
of the PCI bus, then this error will result. This is a
strong indicator of a flaky device or bus. If the BIOS is
in Diagnostic mode (press ESC at the initial memory test),
at this point, the following will also be displayed:
Starting infinite PCI read loop...
In Diagnostic mode, once a failure is detected, this test is
then repeated until manual intervention.
See Code 10, sub-code 0x3 for resolution information.
Fatal error: Code 10, subcode 0x14 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Invalid 64-bit size: yyyy
*** Error: Failure occurred during PCI device allocation.
During PCI initialization, a 64 bit window was found on the
PCI bus which is outside the 36 bit range imposed by the CPU.
See Code 10, sub-code 0x3 for resolution information.
Fatal error: Code 10, subcode 0x15 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Allocation size is zero
*** Error: Failure occurred during PCI device allocation.
During PCI initialization, a window was found on the PCI
device with a size of zero. This fatal error may indicate
that the BIOS is not able to properly communicate with the
PCI device.
See Code 10, sub-code 0x3 for resolution information.
Fatal error: Code 10, subcode 0x16 (slot)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.
During PCI initialization, each memory or I/O window present
on each device found on the bus is programmed with a CPU
memory bus address so that it may be accessed by further
BIOS initialization, tests and of course the main operating
system. The BIOS verifies the address it programs for each
window was correctly programmed (by reading back the value
just written). If they do not match, this error is generated.
The slot number is an ASCII value represented as
hexadecimal. If the slot value is 0, then the failure
occurred on a node motherboard device. If PCI Slot 0 was
involved, then slot is 30. PCI Slot 1 is 31; PCI Slot 2 is
32; PCI Slot 6 is 36, etc.
See Code 10, sub-code 0x3 for resolution information.
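The slot encoding described for sub-code 0x16 (and listed for sub-code 0xd) can be decoded as shown in this Python sketch. It is an illustration of the stated mapping only; the function name is hypothetical, and values outside the ASCII digits '0' to '9' are rejected because the guide does not define them.

```python
def decode_slot(data):
    """Translate the fatal error data field into a slot description.

    0 means a node motherboard device; otherwise the value is the ASCII
    code of the slot digit, so 0x30 is slot 0, 0x31 is slot 1, and so on.
    """
    if data == 0:
        return "node motherboard device"
    if not 0x30 <= data <= 0x39:
        raise ValueError(f"unexpected slot value {data:#x}")
    return f"PCI Slot {chr(data)}"


print(decode_slot(0x30))
print(decode_slot(0x36))
print(decode_slot(0))
```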
Fatal error: Code 10, subcode 0x17 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.
See Code 10, sub-code 0x16 for information on this error.
Fatal error: Code 10, subcode 0x18 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.
See Code 10, sub-code 0x16 for information on this error.
Fatal error: Code 10, subcode 0x19 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Invalid allocation size: yyyy
(Must be a power of 2)
*** Error: Failure occurred during PCI device allocation.
During PCI initialization, each memory or I/O window present
on each device found on the bus is programmed with a CPU memory
bus address. The size of the window required is provided by
the specific PCI device. It is required that this window is
a power of 2 in size (1 KB, 2 KB, 4 KB, ... 32 MB, 64 MB,
etc).
This is a consistency check the BIOS performs to ensure it
is properly communicating with the PCI device.
See Code 10, sub-code 0x3 for resolution information.
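The power-of-2 consistency check for sub-code 0x19 can be illustrated with a short sketch. The function name `is_power_of_two` is hypothetical and the BIOS's actual implementation is not shown in this guide; the bit trick below is simply one common way to perform such a check.

```python
def is_power_of_two(size: int) -> bool:
    """Return True when size is a positive power of 2 (1 KB, 2 KB, 4 KB, ...).

    A power of 2 has exactly one bit set, so clearing the lowest set bit
    with size & (size - 1) leaves zero. A size of zero is rejected, which
    corresponds to the separate "Allocation size is zero" condition.
    """
    return size > 0 and (size & (size - 1)) == 0
```

A window size of 64 MB (`64 * 1024 * 1024`) passes this check, while a size such as 3000 bytes does not.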
Fatal error: Code 10, subcode 0x1a (0)
PCI_FAILURE "PCI Failure"
*** Error: Device does not fit into address space, skipping:
attempted addr xxxx, size yyyy
*** Error: Failure occurred during PCI device allocation.
During PCI initialization, the entire PCI bus is walked
as a tree and device registers are initialized and mapped
into processor address space using this tree. The bus
structure is then ordered and summarized into a table so
that software can later find specific devices for high level
initialization. This specific error indicates the PCI scan
attempted to map a PCI device into the CPU's 32-bit address
space, but failed due to no more available space. Verify
that NVRAM flags such as "pci_base" and "mem_max" are not set
to unusual values.
See Code 10, sub-code 0x3 for resolution information.
Fatal error: Code 10, subcode 0x1b (0)
PCI_FAILURE "PCI Failure"
*** Error: IMB.B isn't turned on
This error indicates a possible hardware failure on the
board.
The bus which connects the CMIC (P4 North Bridge) to CIOB B
failed to initialize properly.
See Code 10, sub-code 0x10 for resolution information.
Fatal error: Code 10, subcode 0x1c (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI CIOB Primary www MHz (xxx), Secondary yyy
MHz (zzz)
This error indicates a possible hardware failure on the
board.
The CIOB (which connects the North Bridge to the I/O system)
has an incorrect clock speed.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Eagle nodes should run CIOB at 66 MHz on
both the
primary and secondary sides. Review fatal error
output to determine if the primary side or
secondary side is affected.
B) Verify clock with scope. Check strapping resistors
which select CIOB bus clock speed on reset.
Fatal error: Code 10, subcode 0x1d (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has bad secondary speed: v.w.x.y =
zzzz
This error indicates one of the PCI bridges on the board has
a bad speed selection set, which could indicate an incorrect
type of PCI card has been installed or that bridge mode select
strappings are bad.
Resolution: A) Pull all PCI cards one at a time to determine
failed card.
B) Replace the node motherboard.
Diagnostic: A) Check Intel 31154 mode select strapping
resistors to ensure PCI-X mode is selected.
Refer to Ironman rework instructions to
correct this.
B) PCI offset 0xf2 in the 31154 indicates, among
other things, the mode selected. Bits 6-8
should have the value 010 for proper operation
(100 MHz secondary PCI bus speed).
C) Some pre-production Ironman nodes have not been
reworked to correct this defect. To ignore
this error, set the "pci_speed_any" NVRAM flag
by pressing ^W to enter Whack and entering:
Whack> set perm pci_speed_any
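Diagnostic step B above amounts to extracting a three-bit field from the value read at PCI offset 0xf2. The sketch below assumes bits are numbered from the least significant bit (bit 0) and that the register value has already been read by some other means; the function names are hypothetical.

```python
def secondary_mode_bits(reg_value: int) -> int:
    """Extract bits 6-8 from the value read at PCI offset 0xf2 of the 31154."""
    return (reg_value >> 6) & 0b111


def is_100mhz_secondary(reg_value: int) -> bool:
    """True when bits 6-8 hold 010, the value the guide gives for a
    100 MHz secondary PCI bus speed."""
    return secondary_mode_bits(reg_value) == 0b010
```

With this numbering, a register value of 0x0080 has bits 6-8 equal to 010 and would be considered correct.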
Non-fatal error: Code 10,
sub-code 0x1e (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe x.y.z: Invalid port configuration
strappings (xxx).
The indicated PLX switch chip has incorrect hardware
configuration strappings.
Resolution: Replace the node motherboard.
Non-fatal error: Code 10,
sub-code 0x1f (YYYYYYxx)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected
link width detected (xx).
This error indicates that the device found is not running
at the correct PCIe link width.
If the "xx" portion of the data field is non-zero,
it indicates a problem with a particular PCI slot.
The specific codes for "xx" are as follows:
30 is PCI Slot 0
31 is PCI Slot 1
32 is PCI Slot 2
33 is PCI Slot 3
34 is PCI Slot 4
35 is PCI Slot 5
36 is PCI Slot 6
37 is PCI Slot 7
38 is PCI Slot 8
To ignore this error, enter Whack by pressing ^W and
entering:
Whack> set perm pci_speed_any
Resolution: A) Replace indicated card (if "xx" is non-zero).
B) Replace node motherboard.
Non-fatal error: Code 10,
sub-code 0x20 (YYYYYYxx)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected
link speed detected (xxx).
This error indicates that the device found is not running
at the correct PCIe link speed.
See Code 10, sub-code 0x1f for resolution information.
Non-fatal error: Code 10,
sub-code 0x21 (xxx)
PCI_FAILURE "PCI Failure"
*** Error: Slot xxx indicates no HBA present, but PCI
device found
This error indicates that a PCI device was found in a slot
which was expected to be empty. The likely cause of this
failure is an HBA which is not fully seated. If this is an
expected failure, you can set "pci_missing_ok" to override
this check.
Resolution: A) Reseat or replace the indicated HBA.
B) Replace node motherboard.
Non-fatal error: Code 10,
sub-code 0x22 (xxx)
PCI_FAILURE "PCI Failure"
*** Error: Slot xxx indicates HBA present, but no PCI
device found
This error indicates that no PCI device was found in a slot
which was expected to be populated (HBA present). The likely
cause of this failure is an HBA which has failed. If this is
an expected failure, you can set "pci_missing_ok" to override
this check.
Resolution: A) Reseat or replace the indicated HBA.
B) Replace node motherboard.
Non-fatal error: Code 10,
sub-code 0x23 (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI device bb.dd.ff (slot ss) hung during
previous error scan
This error indicates that during a previous PCI scan, the CPU
hung. The most probable cause of this error is a defective HBA.
The data field provides several details about the suspect device.
The low byte indicates which PCI slot, if known. Value 0x30
corresponds to PCI Slot 0, 0x31 is PCI Slot 1, ..., 0x38 is
PCI Slot 8. Byte 2 and byte 1 correspond to the PCI bus.dev.func.
Byte 3 indicates whether the failure occurred during a PCI error
scan, and whether this is a repeat failure. Decode table
for data:
bits 0..7 PCI Slot (0x00=MB, 0x30..0x38=PCI Slot 0..8)
bits 8..10 PCI func
bits 11..15 PCI dev
bits 16..23 PCI bus
bits 24..28 Reserved (0)
bit 29 Repeat flag (1=repeat -- fatal error)
bit 30 Hang during (0=PCI scan, 1=PCI error scan)
bit 31 Reserved (1)
Example (data=c00a0a35):
The 0x35 value implicates PCI Slot 5.
The 0a0a value is bus.dev.func 0a.01.02.
The c0 value indicates the hang occurred during a PCI error
scan.
Example (data=a0090831):
The 0x31 value implicates PCI Slot 1.
The 0908 value is bus.dev.func 09.01.00.
The a0 value indicates a repeated hang during the PCI scan.
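The two worked examples above can be reproduced with a small decoder. This is a sketch only: the names `HangInfo` and `decode_hang_data` are hypothetical, and the device field is taken as the five bits 11..15, which is the layout that yields dev 01 in both examples (the standard PCI bus.dev.func packing).

```python
from typing import NamedTuple, Optional


class HangInfo(NamedTuple):
    slot: Optional[int]       # None when the low byte is not 0x30..0x38
    bus: int
    dev: int
    func: int
    repeat: bool              # bit 29: repeat hang (fatal error)
    during_error_scan: bool   # bit 30: 0 = PCI scan, 1 = PCI error scan


def decode_hang_data(data: int) -> HangInfo:
    """Decode the 32-bit data field of Code 10, sub-code 0x23 / 0x24."""
    slot_code = data & 0xFF
    # 0x30..0x38 are ASCII '0'..'8'; 0x00 means a motherboard device.
    slot = slot_code - 0x30 if 0x30 <= slot_code <= 0x38 else None
    func = (data >> 8) & 0x07    # bits 8..10
    dev = (data >> 11) & 0x1F    # bits 11..15 (assumed 5-bit field)
    bus = (data >> 16) & 0xFF    # bits 16..23
    repeat = bool((data >> 29) & 1)
    during_error_scan = bool((data >> 30) & 1)
    return HangInfo(slot, bus, dev, func, repeat, during_error_scan)
```

Calling `decode_hang_data(0xc00a0a35)` gives slot 5 and bus.dev.func 0a.01.02 with the error-scan flag set, and `decode_hang_data(0xa0090831)` gives slot 1 and bus.dev.func 09.01.00 with the repeat flag set, in line with the examples.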
Resolution: A) Replace HBA if PCI Slot is indicated.
B) Convert to PCI bus.dev.func and match with the
suspect PCI device from previous BIOS messages.
If this is an onboard device, replace the node
motherboard.
Fatal error: Code 10, subcode 0x24 (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI device bb.dd.ff (slot ss) hung during
previous scan
Hang occurred multiple times.
This error indicates that during a previous PCI scan, the CPU
hung repeatedly. Other than this being a fatal error, this
code is identical to that of sub-code 0x23. Note that if
this fatal error is seen without a preceding non-fatal sub-code
0x23, then the failure is likely to be the node motherboard.
If the non-fatal error is not logged, then a PCI scan hung earlier in
the PCI tree than a previous hang. Unless both hangs happened
on the same HBA, the cause is likely a shared device on the
node motherboard.
See Code 10, sub-code 0x23 for resolution information.
Fatal error: Code 10, subcode 0x25 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Serial EEPROM is not present.
This error indicates that the PCI device does not have
an EEPROM attached.
Resolution: Replace node motherboard.
Fatal error: Code 10, subcode 0x26 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Unable to write Serial EEPROM.
This error indicates that the EEPROM failed to be
programmed.
Resolution: Replace node motherboard.
Fatal error: Code 10, subcode 0x27 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Unable to read Serial EEPROM.
*** Error: PCIe bb.dd.ff: Serial EEPROM index XX value
0xXXXXXXXX != expected 0xXXXXXXXX.
This error indicates that BIOS was unable to verify
the EEPROM contents after programming or that the data was
successfully written but did not persist.
Resolution: Replace node motherboard.
Fatal error: Code 10, subcode 0x30 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe b.d.f Link Width incorrect size. Found xx,
s/b yy
This error indicates that the device found is not running
at the correct PCIe link width. xx is the actual PCIe link
width and yy is the expected PCIe link width.
This error may be logged with some HBA cards with x4 PCIe
lanes.
To ignore this error, enter Whack by pressing ^W and
entering:
Whack> set perm pci_speed_any
Resolution: A) Ok to ignore if this is related to HBA card
with x4 PCIe lanes
B) Replace indicated card.
Fatal error: Code 10, subcode 0x31 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected
link width detected (xx).
This error indicates that the Harrier2 ASIC device found is not
running at the correct PCIe link width.
Resolution: A) Power cycle the node
B) Replace node motherboard.
Fatal error: Code 10, subcode 0x32 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe b.d.f indicates HBA present, but no PCI
device found
This error indicates that the PCI device was not found.
Resolution: A) Reseat card
B) Replace indicated card
Fatal error: Code 11, subcode yyyy (0)
UNRECOVERABLE_TRAP "Unrecoverable Trap"
*** Error: CPU exception detected: Stopping execution.
The BIOS installs an interrupt handler to catch spurious (unexpected)
interrupts and exceptions during initialization and testing of the
node hardware. During initialization, the BIOS even tests to verify
a generated interrupt is delivered correctly. This is a serious
condition and should not be ignored by pressing ^C. The specific
interrupt received is the sub-code displayed. The interrupt number
will be less than 0x20.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Review previous output lines to determine
whether interrupts were just enabled (it
follows the CPU identification). You should
see a message:
--- This interrupt was expected
If this is not present, then most likely
the interrupt or exception occurred immediately
after being enabled.
B) Using Whack, you can manually enable and disable
interrupts with the "cpu interrupt enable"
and "cpu interrupt disable" commands. You can
also use the "cpu interrupt <num>" command to
generate an interrupt. If interrupts are
enabled, you should see a message upon generating
an interrupt. One of:
--- This interrupt was expected
or
*** Error: Expected interrupt xxxx but got yyyy
or
*** Error: CPU exception detected: Stopping execution.
The first two messages will only occur if the
BIOS is still expecting an interrupt to be
delivered. The last message will only be
displayed if the interrupt is numbered 0x20 or
higher.
Fatal error: Code 12, subcode 0x0 (0)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
PIII or P4 node:
--- SMI: No known cause (# zz)
GPE status: yyyyyy, GPE input: zzzzzz
An SMI is a System Management Interrupt, an interrupt
generated by the node hardware for the BIOS to service a
particular failure. This error indicates the BIOS was
unable to determine the cause of the SMI delivered by
hardware.
See Code 11 for resolution information.
Fatal error: Code 12, subcode 0x0 (0)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
Ironman, Tinman, Titan, or Atlas nodes:
CPU0 SMI: Bootstrap
CPU0 SMI: Updating
CPU0 SMI: Updated
--- SMI: No known cause (# 1) on CPU6
SMSCS[0] = 0x00000000
...
ALT_GP_SMI_EN = 0xbfbf
ALT_GP_SMI_STS = 0x0000
TMP_STS= 0x00000000:88380000
TMP_INT= 0x00000000:00000001
This fatal error indicates the BIOS received an SMI, but wasn't
able to determine which device caused the interrupt. In this
example, the "Bootstrap," "Updating," and "Updated" messages
suggest the BIOS firmware was updated.
Resolution: A) Reboot the node.
B) Replace the node motherboard.
Fatal error: Code 12, subcode 0x1 (yyyy)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
*** Error: Expected interrupt xxxx but got yyyy
During initialization, the BIOS installs an interrupt handler
to verify interrupts are delivered reliably. It then generates
an expected interrupt. If an interrupt is delivered which is
not the same as the one expected, this error is displayed.
The interrupt number, yyyy, represents which interrupt occurred.
See Code 11 for resolution information.
Fatal error: Code 13, subcode 0x0 (yyyy)
INTERRUPT_FAILURE "Interrupt Failure"
*** Error: Interrupt 0x20 could not be generated.
or
*** Error: Interrupt 0xff could not be generated.
During initialization, the BIOS installs an interrupt handler
to verify interrupts are delivered reliably. It then generates
a few expected interrupts. If the specific interrupt is not
delivered, this error is displayed. The interrupt number, yyyy,
represents which interrupt should have been generated.
See Code 11 for resolution information.
Fatal error: Code 14, subcode 0x0 (0)
ECC_FAILURE "Control Cache ECC Failure"
The Whack "mem test ecc" command performs an ECC test over the
main memory to ensure ECC memory error correction is functioning.
If this test fails, this message is displayed, together with
other messages giving details.
Note: Running the "mem test ecc" command destroys some memory
locations in the range of [0 .. 512 KB] and [1 MB .. just
below the top of SDRAM]. Hence, executing this once Linux
has booted will cause it to fail if it is reentered.
If you see this failure often during BIOS initialization,
then the cause is likely a hardware problem. Specifically,
the error tells you that the hardware ECC error mechanism
is not working correctly. Changing CPU memory DIMMs may
solve the problem, but it's more likely a board failure.
Resolution: A) Ensure the North Bridge heatsink is firmly
attached.
B) Replace CPU DIMMs.
C) Replace bootstrap CPU.
D) Replace the node motherboard.
Fatal error: Code 14, subcode 0x1 (1)
ECC_FAILURE "Control Cache ECC Failure"
*** Error: Missing ECC SMI [80] <= 1, data 0 0. Copy 0 Now
0 mode 0
00 - 0f: 00 00 00 00 00 00 00 00 | 00 00 00 00 00 40 0c 00
10 - 1f: 01 ff 00 00 00 00 00 ff | ff ff ff ff ff ff ff ff
20 - 2f: 04 09 08 09 20 09 10 09 | 18 09 00 09 00 00 59 8e
30 - 3f: aa aa 0a 02 a8 00 00 00 | 00 00 00 c0 7b df ff ff
This error indicates the BIOS ECC hardware test could not
get the hardware to generate an ECC SMI in response to a
corrupted memory address. It possibly indicates a failing
DIMM or memory controller, or that memory timings are too
fast for the DIMMs present in the node.
See Code 14, sub-code 0x0 for resolution information.
Fatal error: Code 15, subcode 0x0 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
mailbox register xxxx changed inappropriately
(yyyy) != expected (zzzz)
register test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards
on the Node Board. The slots are numbered 0-6 from left to right
when looking at the front of the P4 Eagle and Ironman Nodes. The
slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top
three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for
functionality. The HBA cards sometimes require a firmware
download for full capability. POST does not have access to
this firmware and will only test basic register access and
functionality. If the Register Test fails, POST will indicate
this error.
If the user continues past this error (^C), software will log
the error and continue testing the other PCI cards (if present).
Resolution: A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the
CM PCI XCB test passed, replace the PCI Fibre
Adapter.
C) Replace the node motherboard.
Diagnostic: A) Whack "fibre" and "pci" commands communicate
with each PCI Fibre Card. Refer to the slot
that produced the error for further diagnostic
information and procedure.
Fatal error: Code 15, subcode 0x1 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
controller memory xxxx value (yyyy) != expected (zzzz)
memory test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards
on the Node Board. The slots are numbered 0-6 from left to right
when looking at the front of the P4 Eagle and Ironman Nodes. The
slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top
three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for
functionality. The HBA cards sometimes require a firmware
download for full capability. POST does not have access to
this firmware and will only test basic functionality.
If the Onboard Memory Test fails, POST will indicate
this error.
If the user continues past this error (^C), software will log
the error and continue testing the other PCI cards (if present).
Resolution: A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the
CM PCI XCB test passed, replace the PCI Fibre
Adapter.
C) Replace the node motherboard.
Diagnostic: A) Whack "fibre" and "pci" commands communicate
with each PCI Fibre Card. Refer to the slot
that produced the error for further diagnostic
information and procedure.
Fatal error: Code 15, subcode 0x2 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
data bits possibly float: Bitxxxx-Bityyyy.
PCI walking bits: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards
on the Node Board. The slots are numbered 0-6 from left to right
when looking at the front of the P4 Eagle and Ironman Nodes. The
slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top
three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for
functionality. The HBA cards sometimes require a firmware
download for full capability. POST does not have access to
this firmware and will only test basic functionality.
If the PCI Fibre Card Bus Test fails, POST will indicate
this error.
If the user continues past this error (^C), software will log
the error and continue testing the other PCI cards (if present).
Resolution: A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the
CM PCI XCB test passed, replace the PCI Fibre
Adapter.
C) Replace the node motherboard.
Diagnostic: A) Whack "fibre" and "pci" commands communicate
with each PCI Fibre Card. Refer to the slot
that produced the error for further diagnostic
information and procedure.
Fatal error: Code 15, subcode 0x3 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
data bits possibly float: Bitxxxx-Bityyyy.
CM0 walking bits: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards
on the Node Board. The slots are numbered 0-6 from left to right
when looking at the front of the P4 Eagle and Ironman Nodes. The
slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top
three will depend on which slot the node is in.
This test indicates a problem was observed with the fibre
channel card talking with the Cluster Manager.
If the "fibre test pci" test passed, then this problem is likely
in the interface to the CM or CM memory.
Resolution: A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the
CM PCI XCB test passed, replace the PCI Fibre
Adapter.
C) Replace the node motherboard.
Diagnostic: A) Whack "fibre" and "pci" commands communicate
with each PCI Fibre Card. Refer to the slot
that produced the error for further diagnostic
information and procedure.
Fatal error: Code 15, subcode 0x4 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
PCIe EYE test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards
on the Node Board. The slots are numbered 0-6 from left to right
when looking at the front of the P4 Eagle and Ironman Nodes. The
slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top
three will depend on which slot the node is in.
If the "fibre test cm" test passed, then this problem is likely
in the PCIe-to-PCIe link between the card and the switch.
Resolution: A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the
CM PCI XCB test passed, replace the PCI Fibre
Adapter.
C) Replace the node motherboard.
Diagnostic: A) Whack "fibre" and "pci" commands communicate
with each PCI Fibre Card. Refer to the slot
that produced the error for further diagnostic
information and procedure.
Fatal error: Code 15, subcode 0x10 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
BIOS can not make LSI card go into Operational state.
Resolution: A) Replace card. Send failed card back for FA.
Fatal error: Code 15, subcode 0x11 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
HBA card register test failure
Resolution: A) Replace card. Send failed card back for FA.
Fatal error: Code 15, subcode 0x13 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
LSI card register memory copy test failure.
Resolution: A) Replace card. Send failed card back for FA.
Fatal error: Code 15, subcode 0x14 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
LSI card register memory copy test failure.
Resolution: A) Replace card. Send failed card back for FA.
Fatal error: Code 15, subcode 0x15 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Firmware rev xxxx not supported. Upgrade to yyyy
LSI card does not contain 3PAR-approved firmware. If you need
to run with an LSI card which has an older firmware (engineering
only), you can set the "lsi_downrev" flag in the BIOS. Example:
Whack> set perm lsi_downrev
Resolution: A) Replace card. Send failed card back for upgrade.
Fatal error: Code 15, subcode 0x16 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Unable to get firmware rev
Attempting to get the firmware version from the LSI card failed.
Resolution: A) Cycle power on the node.
B) Replace card. Send failed card back for FA.
Fatal error: Code 15, subcode 0x17 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Manufacturing test for E200 node only.
This error occurs when the onboard LSI chips are not found. They
are expected to be in slots 0 and 3, with two devices on each slot.
Resolution: A) Cycle power on the node.
B) Replace motherboard.
Fatal error: Code 17, subcode 0x0 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed its internal self test.
Resolution: A) Replace the IDE or SATA boot drive.
B) Replace the IDE or SATA cable.
C) Replace the node motherboard.
Diagnostic: A) Whack "ide test" commands may be used to
individually execute IDE tests.
Fatal error: Code 17, subcode 0x1 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed to perform a self test.
See Code 17, sub-code 0x0 for resolution information.
Fatal error: Code 17, subcode 0x2 (0)
IDE_FAILURE "Internal Drive Failure"
IDE register xx value (yyyy) != expected (zzzz)
The IDE register test failed during a pattern test.
See Code 17, sub-code 0x0 for resolution information.
Fatal error: Code 17, subcode 0x3 (0)
IDE_FAILURE "Internal Drive Failure"
IDE register xx value (yyyy) != expected (zzzz)
The IDE register test failed during a walking bit test.
See Code 17, sub-code 0x0 for resolution information.
Fatal error: Code 17, subcode 0x4 (0)
IDE_FAILURE "Internal Drive Failure"
There was an IDE failure in data requested by the operating
system bootstrap. It is possible that data on the disk has
become corrupt to the point the operating system will not
successfully load.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x5 (0)
IDE_FAILURE "Internal Drive Failure"
Communication with the IDE interface timed out. This error
indicates the drive is not responding to commands within an
acceptable amount of time.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x6 (0)
IDE_FAILURE "Internal Drive Failure"
IDE reported a failure in read verify command.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x7 (0)
IDE_FAILURE "Internal Drive Failure"
A timeout (10 seconds) was detected while performing DMA
operation.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x8 (0)
IDE_FAILURE "Internal Drive Failure"
An error condition was detected while performing DMA
operation.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x9 (xx)
IDE_FAILURE "Internal Drive Failure"
IDE power up: Unknown error
ERROR : 80
SECCNT: 80
SECNUM: 80
CYLLOW: 80
CYHIGH: 80
DEVSEL: 80
ALT_STATUS: 80
Drive: BUSY
The IDE drive had a failure at power-on reset which prevents
it from communicating with the chipset IDE controller.
Resolution: A) Cycle power on the node.
B) Reseat drive cable on both node and drive.
C) Replace the IDE or SATA boot drive.
D) Replace the node motherboard.
Diagnostic: A) Try using "ide reset" followed by "ide init" to
clear the error.
B) The I/O address of the register which could trigger
this error at "ide init" is located at 0x1f1.
Try using "io inb 1f1" and "io outb 1f1 <value>" to
diagnose further.
Non-fatal error: Code 17,
sub-code 0x10 (data)
IDE_FAILURE "Internal Drive Failure"
A disk SMART threshold was triggered. This would indicate an
imminent boot drive failure.
Resolution: Replace the IDE or SATA boot drive.
Diagnostic: The data value may be used to determine the specific
SMART field which caused the alert. Examples:
0 - Unknown
1 - Raw Read Error Rate
2 - Throughput
3 - Spinup Time
4 - Start / Stop Count
5 - Reallocate Sector Count
6 - Read Channel Margin
7 - Seek Error Count
8 - Seek Time
9 - Poweron Hours
10 - Spin Retry Count
11 - Calibration Retry Count
12 - Power Cycle Count
192 - Poweroff Retract Count
193 - Load Cycle Count
194 - Temperature Celsius
195 - Hardware ECC Recovered
196 - Reallocate Event Count
197 - Current Pending Count
198 - Offline Scan UE Count
199 - UDMA CRC Error Count
200 - Write Error Count
201 - Off Track Error Count
202 - DAM Error Count
203 - Run Out Cancel
204 - Raw Read Error Count
205 - Thermal Asperity Count
207 - Spin High Current Count
208 - Spin Buzz Count
209 - Offline Seek Performance
The "ide smart status" command may be used to display
the current SMART status fields.
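The example list above can be kept as a lookup table when reading logs. The sketch below copies the values from that list; the names `SMART_FIELDS` and `smart_field_name` are hypothetical, and treating any value that is not in the list as "Unknown" is an assumption, since the guide presents the list only as examples.

```python
# SMART field names keyed by the data value, as listed in the guide.
SMART_FIELDS = {
    0: "Unknown",
    1: "Raw Read Error Rate",
    2: "Throughput",
    3: "Spinup Time",
    4: "Start / Stop Count",
    5: "Reallocate Sector Count",
    6: "Read Channel Margin",
    7: "Seek Error Count",
    8: "Seek Time",
    9: "Poweron Hours",
    10: "Spin Retry Count",
    11: "Calibration Retry Count",
    12: "Power Cycle Count",
    192: "Poweroff Retract Count",
    193: "Load Cycle Count",
    194: "Temperature Celsius",
    195: "Hardware ECC Recovered",
    196: "Reallocate Event Count",
    197: "Current Pending Count",
    198: "Offline Scan UE Count",
    199: "UDMA CRC Error Count",
    200: "Write Error Count",
    201: "Off Track Error Count",
    202: "DAM Error Count",
    203: "Run Out Cancel",
    204: "Raw Read Error Count",
    205: "Thermal Asperity Count",
    207: "Spin High Current Count",
    208: "Spin Buzz Count",
    209: "Offline Seek Performance",
}


def smart_field_name(data: int) -> str:
    """Return the SMART field name for a Code 17, sub-code 0x10 data value.

    Values missing from the table are reported as "Unknown" (assumption).
    """
    return SMART_FIELDS.get(data, "Unknown")
```

For instance, a data value of 5 maps to "Reallocate Sector Count" and 194 maps to "Temperature Celsius".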
Fatal error: Code 17, subcode 0x11 (0)
IDE_FAILURE "Internal Drive Failure"
IDE SMART self-test failed. The drive failed to finish a
built-in self-test.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x12 (0)
IDE_FAILURE "Internal Drive Failure"
Drive failed to collect SMART data. The data is vital for the
drive to determine SMART trigger.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x13 (0)
IDE_FAILURE "Internal Drive Failure"
Drive refused to accept SMART commands.
Resolution: Replace the IDE or SATA boot drive.
Diagnostic: Use "ide smart enable" to turn on SMART before issuing
more SMART commands.
Fatal error: Code 17, subcode 0x14 (0)
IDE_FAILURE "Internal Drive Failure"
The SMART command issued to drive has incorrect syntax.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x15 (0)
IDE_FAILURE "Internal Drive Failure"
The SMART commands failed to write or read attributes.
Resolution: Replace the IDE or SATA boot drive.
Non-fatal error: Code 17,
sub-code 0x16 (0)
IDE_FAILURE "Internal Drive Failure"
No IDE device was found.
Resolution: A) Install or replace the IDE or SATA drive.
B) Replace the node motherboard.
Fatal error: Code 17, subcode 0x18 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed the BIOS interrupt test, possibly
due to a bad drive.
See Code 17, sub-code 0x0 for resolution information.
Non-fatal error: Code 17,
sub-code 0x19 (0)
IDE_FAILURE "Sequential DMA read timed out"
DMA xfer error code xxxx
The drive DMA test failed due to a timeout. Although each
sequential DMA read operation is succeeding, the total test time
was exceeded. The likely cause of this failure is a drive which
is having to perform a large number of relocations due to failed
sectors, or a drive interface failure which only shows up under
stress.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x20 (0)
IDE_FAILURE "Internal Drive Failure"
Drive did not return status to host after a command within a
reasonable amount of time.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x21 (rpm)
IDE_FAILURE "Internal Drive Failure"
*** Error: Boot drive is not a Solid State Disk (SSD).
This error occurs when the disk drive for a Harrier system is
not an SSD.
Resolution: A) Replace the SATA drive with an SSD.
Fatal error: Code 17, subcode 0x22 (disk size)
IDE_FAILURE "Internal Drive Failure"
*** Error: Disk Size (XXX.X GB) is less than 128 GB.
This error occurs when we have 32 GB or less of cluster memory
and the disk drive is less than 128 GB. This is because the
disk is not large enough for the memory dumps if the node panics.
Resolution: A) Replace the SSD drive with a drive of at least
128 GB.
Fatal error: Code 17, subcode 0x23 (disk size)
IDE_FAILURE "Internal Drive Failure"
*** Error: Disk Size (XXX.X GB) is less than 256 GB.
This error occurs when the node has more than 32 GB of cluster
memory and the disk drive is smaller than 256 GB. The disk is
not large enough for the memory dumps if the node panics.
Resolution: A) Replace the SSD drive with a drive of at least
256 GB.
B) Reduce cluster memory to 32 GB or less.
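The size rule behind sub-codes 0x22 and 0x23 can be summarized in a few lines. The following Python sketch is illustrative only and is not taken from the BIOS source; the function names are hypothetical, and the thresholds come from the two descriptions above.

```python
from typing import Optional


def min_boot_disk_gb(cluster_memory_gb: float) -> int:
    """Minimum boot disk size (GB) for a given amount of cluster memory."""
    # 32 GB or less of cluster memory needs 128 GB; more needs 256 GB.
    return 128 if cluster_memory_gb <= 32 else 256


def disk_size_subcode(cluster_memory_gb: float, disk_gb: float) -> Optional[int]:
    """Return the Code 17 sub-code that would apply, or None if the disk is large enough."""
    if disk_gb >= min_boot_disk_gb(cluster_memory_gb):
        return None
    return 0x22 if cluster_memory_gb <= 32 else 0x23
```

For example, a node with 64 GB of cluster memory and a 200 GB boot drive falls under sub-code 0x23, which is why that entry also offers reducing cluster memory as a resolution.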
Fatal error: Code 17, subcode 0x30 (0)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
Resolution: Replace the IDE or SATA boot drive.
Non-fatal error: Code 17,
sub-code 0x40 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port Status register, for lab debug
Resolution: TODO
Non-fatal error: Code 17,
sub-code 0x41 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port Error register, for lab debug
Resolution: TODO
Non-fatal error: Code 17,
sub-code 0x42 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port TFD register, for lab debug
Resolution: TODO
Non-fatal error: Code 18,
sub-code zzzz (0)
BIOS_INT_UNIMPLEMENTED "BIOS Int Unimplemented"
*** Real-mode BIOS interrupt: xxxx(error: yyyy)
This error most commonly indicates a bad or missing boot area of
the USB disk. Customer Service node-disks or node spares (FRUs)
might not be shipped with an operating system. Attempting to boot
from one of these disks without first installing the system
software might produce this error message. From the Whack prompt,
use the "boot net install" command to install the system software.
In order for Linux to boot, LILO must load the kernel image. It
needs assistance from the BIOS in order to perform this task.
Linux also acquires some information from the BIOS using 16-bit
BIOS interrupts. CBIOS automatically accepts and emulates
traditional 16-bit BIOS interrupts to support these methods.
If LILO or Linux triggers an interrupt which is not supported
by CBIOS, this possibly fatal error will result. There are many
obsolete BIOS facilities which are not supported by CBIOS. In
some cases, the system boot may be able to continue after this
error.
The sub-code and minor code indicate the specific BIOS interrupt
called and the eax register parameter value. This information
may be useful to Engineering.
Resolution: A) Reboot. Attempt to reproduce the problem.
B) Reinstall system software on the disk.
This may require a "boot net install" in
order to reinstall the operating system.
C) There may be a bug in the OS you are using or
it has been misconfigured. Confirm this version
of the OS has been verified to work on a 3PAR
node board. Or, temporarily swap system disks
with a known good system disk.
D) Replace the boot drive and reinstall the system
software.
E) Replace the node motherboard.
Diagnostic: A) Look up the displayed Real-mode BIOS interrupt
number in a BIOS index to determine the facility
the software is requesting. This may provide
you a clue as to the cause.
B) Use the Arium to single step through the return
from the 16 bit handler to the originating code
to determine what code is involved with the
unimplemented BIOS operation.
Fatal error: Code 19, subcode 0x0 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
Booting from SATA IDE...
No IDE or USB drives present or boot sector is invalid.
or
Booting from SATA IDE (bootdev)...
No IDE drive present or boot sector is invalid.
or
Booting from PATA IDE...
No IDE drive present or boot sector is invalid.
or
Booting from USB...
No USB drive present or boot sector is invalid.
The IDE (PATA or SATA) or USB Flash disk is used for booting the
operating system. This error indicates that either no drive was
found during the hardware probe, or a drive was found but is not
bootable.
Resolution: A) Cycle power on the node.
B) Verify disk power and data cables are connected to
both the drive and the motherboard. The red stripe
on the IDE data cable must be oriented closest to
the power connector on the drive.
C) Replace the disk power cable and/or data cable.
D) Replace the drive.
E) Replace the node motherboard.
Diagnostic: A) Reset and enter Whack with ^W after the PCI bus
scan but before the IDE probe. You should be
able to use the "ide init" command to probe for
a disk. Minimal output should include drive
Capacity and Geometry (C/H/S: cylinder/head/sector).
B) If the above information is available, use the
"ide read" command to read a sector into CPU memory
and verify it was read. Example:
Whack> ide read 1000 0 1
Whack> d 1000 200
You should see the contents of sector 0, which
(with a previously initialized node disk) will
include the string "LILO" starting at byte
offset 6.
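The check in diagnostic step B can also be applied offline to a saved copy of sector 0. A minimal Python sketch, assuming the 512-byte sector has been captured into a bytes object by some means outside this guide; has_lilo_signature is a hypothetical helper name, and the offset is the one stated above.

```python
LILO_OFFSET = 6          # byte offset given in the diagnostic step above
LILO_SIGNATURE = b"LILO"


def has_lilo_signature(sector: bytes) -> bool:
    """Return True if the sector carries the "LILO" string at byte offset 6."""
    end = LILO_OFFSET + len(LILO_SIGNATURE)
    return sector[LILO_OFFSET:end] == LILO_SIGNATURE


# Synthetic example sector: six filler bytes, the signature, then padding to 512 bytes.
example = bytes(6) + b"LILO" + bytes(502)
print(has_lilo_signature(example))
```

A sector that is all zeros or all 0xFF fails this check, which points to a drive that was never initialized rather than one that cannot be read.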
Fatal error: Code 19, subcode 0x1 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE TIMEOUT waiting for DRDY
The IDE disk is used for booting the operating system.
This error indicates there was a problem communicating with
the IDE controller, most likely due to a missing IDE hard
drive, a disconnected cable, or a failed IDE hard drive.
See Code 19, sub-code 0x0 for resolution information.
Fatal error: Code 19, subcode 0x2 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE TIMEOUT waiting for DRQ
The IDE disk is used for booting the operating system.
This error indicates that a command was issued to the IDE
disk (read sectors) but the drive controller did not report
back with the data within a reasonable amount of time. This
may be caused by a failed sector or IDE controller failure.
See Code 19, sub-code 0x0 for resolution information.
Fatal error: Code 19, subcode 0x3 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE ERROR reading sector xxxx
The IDE disk is used for booting the operating system.
This error indicates that a command was issued to the IDE
disk (read sectors) but the drive controller reported that
there was an error in reliably retrieving the requested
sectors. This error may be caused by a failed sector or
IDE controller failure.
See Code 19, sub-code 0x0 for resolution information.
Fatal error: Code 20, subcode 0x0 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Failed to deliver startup message to CPU xxxx
or
*** Error: Errors in APs starting up.
If a board has more than a single CPU, only one CPU comes
out of power-on executing code. The other waits in a halted
state for an AP message from the bootstrap processor. Every
MP-capable Pentium processor has an onboard Advanced
Programmable Interrupt Controller called the Local APIC (there
is a complementary component called the IOAPIC located on the
motherboard). Once the bootstrap processor has completed all
node board initialization and testing, it starts up each
application processor (which in Intel terms is defined as any
processor other than the initial bootstrap processor). Each
AP then does a brief identify, verify, and microcode update.
In the above case, if the Local APIC fails to deliver an AP
startup to the other processor within a reasonable amount of
time, this error will result. In a single CPU system this
error should not occur because an earlier probe should
identify that no AP processor is present. If the Local APIC
cannot reliably deliver a message over the IOAPIC, then it
is probably not safe to ignore this error by pressing ^C.
Resolution: A) Reseat both processors in their sockets.
B) Replace each processor individually. Do not
bother with downgrading to a single processor
system since this is a multiprocessor startup
issue. The problem processor will not be
apparent with a single processor configuration.
C) Replace the node motherboard.
Diagnostic: A) Use Arium as bootstrap processor and verify that
APIC message is being delivered to the bus.
B) Use Arium as application processor and verify that
APIC message is delivered from the IOAPIC on
the motherboard. The application processor should
then start executing code at the default APIC
address of 0x30000 (FIRST_SMM_BASE).
Fatal error: Code 20, subcode 0x1 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Startup message successfully sent to CPU xxx, no response
After an AP startup message has been delivered to the
application processor through the IOAPIC, the bootstrap
processor waits for an indication the AP has started.
If the indication is not received before a reasonable
timeout, this error is given. It should be ok to ignore
this message by pressing ^C and continue with further
BIOS diagnostics.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 20, subcode 0x2 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: CPU xxxx failed to complete initialization.
Once the application processor (AP) has started initialization,
it sets a flag that the bootstrap processor can use to determine
when the AP has completed. If the AP remains
in the AP_INIT_START state too long, this fatal error is
displayed. It is probably not safe to resume after this error
since the AP may be off executing errant code or interfering
with bootstrap processor bus cycles.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 20, subcode 0x3 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: POST failure on CPU xxxx: yyyy
*** Error: CPU xxxx initialization failure.
The application processor (AP) previously failed to complete
a Built In Self Test (BIST). This is likely due to a bad
processor.
Resolution: A) Replace the application processor.
Fatal error: Code 20, subcode 0x4 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Invalid CPU for CPU xxxx, error code: yyyy
*** Error: CPU xxxx initialization failure.
During application processor (AP) initialization, the BIOS
verifies that the CPU model, stepping, and clock multiplier of
the processor being initialized match those of the bootstrap
processor. If they do not match, this error will result.
Resolution: A) Since the processors are possibly mismatched,
remove the heatsink on both and verify that
the CPU model and stepping are identical.
See Code 20, sub-code 0x0 for more resolution information.
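The comparison described for sub-code 0x4 amounts to a field-by-field match between the two processors. The Python sketch below is a hypothetical illustration; the CpuId type and ap_mismatches function are invented names, and only the three compared fields come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CpuId:
    """Hypothetical record of the three values the description says are compared."""
    model: int
    stepping: int
    clock_multiplier: int


def ap_mismatches(bsp: CpuId, ap: CpuId) -> List[str]:
    """Return the names of the fields where the AP differs from the bootstrap processor."""
    fields = ("model", "stepping", "clock_multiplier")
    return [name for name in fields if getattr(bsp, name) != getattr(ap, name)]


bsp = CpuId(model=0x0B, stepping=1, clock_multiplier=10)
ap = CpuId(model=0x0B, stepping=4, clock_multiplier=10)
print(ap_mismatches(bsp, ap))
```

An empty list means the processors match on all three values; any other result names what to check when the heatsinks are removed.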
Fatal error: Code 20, subcode 0x5 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: More than wwww CPUs in system.
*** Error: CPU xxxx initialization failure.
The currently supported node board hardware configuration
is a maximum of two physical processors. The BIOS uses this
knowledge to limit the possibility of repeat initialization
of the application processor (AP). If this message occurs,
it may be due to a variety of hardware problems, but most
suspect is the application processor.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 21, subcode 0x0 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: Not expecting to install a vector on CPU xxxx
Intel processors support an interrupt level called SMI (System
Management Interrupt) which is used for hardware management
(usually by the BIOS). Events such as power management and
hardware errors usually trigger an SMI. When an SMI is
triggered, the system enters SMM (system management mode). In
a multiprocessor system, both processors are usually triggered
by an SMI at the same time. Since both processors may attempt
to service an SMI at the same time, each processor must have a
unique stack area in which to save processor context. SMI setup
configures each processor individually with a unique stack
address for SMI handling.
This particular error indicates that the SMI setup handler
has detected a stack setup SMI, yet one was not expected
(because one had already been set up or CPU initialization
had not yet reached the point of SMI setup). The bootstrap
CPU delivers the setup SMI to itself and to the application
processor. This error could be caused by a faulty CPU or
motherboard. The CPU which reports the setup error may not
be the one at fault.
Resolution: A) Pull one processor at a time to determine if
the problem is reproducible with a single CPU.
B) Swap CPUs to see if the exact problem moves with
the CPU. If not, it may be the motherboard.
C) Individually replace both CPUs.
D) Replace the node motherboard.
Diagnostic: A) Use Arium as bootstrap processor and verify that
the SMI is being delivered.
Fatal error: Code 21, subcode 0x1 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: CPU xxxx not found in CPU table
During SMI setup, each processor in turn receives an SMI and
then performs stack initialization. Prior to the SMI setup,
all application processors wait in a halted state for an APIC
message to identify and download microcode. If the processor
performing an SMI setup detects that it had not previously
executed and added its CPU ID to the system table, then this
fatal error will be displayed.
See Code 20, sub-code 0x1 for resolution information.
Fatal error: Code 21, subcode 0x2 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: CPU xxxx did not respond
During SMI setup, each processor in turn receives an SMI and
then performs stack initialization. This error indicates
that the bootstrap processor issued an SMI through the APIC
and it was not processed by the targeted processor. This
indicates that either SMIs are not being delivered properly,
or that the targeted processor may be defective.
See Code 20, sub-code 0x1 for resolution information.
Fatal error: Code 22, subcode 0x0 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `cbios_to_os_message' test, expected xx but got yy
CBIOS provides service to the 3PAR kernel through a special
command queue. Responses are returned to the OS through
another queue, which is tested during BIOS initialization.
Sub-code 0x0 indicates that the CBIOS to OS queue did not
pass the built-in test.
Resolution: A) Pull one processor at a time to determine if
the problem is reproducible with a single CPU.
B) Swap SDRAM with good SDRAM.
C) Update CBIOS to the latest version.
D) Replace the node motherboard.
Fatal error: Code 22, subcode 0x1 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test', failed to read message
This error indicates that the CBIOS to OS queue test failed
to acquire a message it previously sent.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 22, subcode 0x2 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test':
expected: uuuu vv `ww' but got: xxxx yy `zz'
This error indicates that the CBIOS to OS queue test failed
because the message received did not match the message sent.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 22, subcode 0x3 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test', expected no more data
This error indicates that the CBIOS to OS queue test failed
because there were more items in the queue than those sent.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 22, subcode 0x4 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: Couldn't send simulated message from OS to CBIOS, code == xx
This error indicates that the OS to CBIOS queue test failed.
The minor code will indicate to an engineer what went wrong.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 22, subcode 0x5 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: Inconsistent queue: queue_base == ww,
queue_limit == xx
queue_inp = yy, queue_otp = zz
This error indicates that the CBIOS to OS queue test failed
because the queue pointers became corrupt.
See Code 20, sub-code 0x0 for resolution information.
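The kind of pointer check reported by sub-code 0x5 can be illustrated as follows. This Python sketch assumes a conventional circular queue in which the input and output pointers must both lie between queue_base and queue_limit. The actual CBIOS rules are not documented in this guide, so the conditions below are assumptions and the function name is hypothetical.

```python
def queue_is_consistent(queue_base: int, queue_limit: int,
                        queue_inp: int, queue_otp: int) -> bool:
    """Assumed sanity rules for a circular queue: a non-empty region, with both pointers inside it."""
    if queue_base >= queue_limit:
        return False
    in_range = lambda p: queue_base <= p < queue_limit
    return in_range(queue_inp) and in_range(queue_otp)


# Pointers inside the region are accepted; a pointer past the limit is rejected.
print(queue_is_consistent(0x1000, 0x2000, 0x1010, 0x1008))
print(queue_is_consistent(0x1000, 0x2000, 0x2400, 0x1008))
```

When reading the four values printed with the error, a pointer that lies far outside the base and limit range suggests memory corruption rather than a software sequencing problem.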
Non-fatal error: Code 23,
sub-code 0x0 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
CRC mismatch for failsafe CBIOS
Upon startup, CBIOS computes a strong CRC over all executable
code and data stored in the flash. This is done to guard against
flash corruption, which also ensures reliable system
initialization and testing. This specific sub-code indicates
that a CRC error was detected in the failsafe component of
CBIOS. The majority of the failsafe is only executed if
corruption is detected in the main CBIOS.
Resolution: A) Try pressing ^C to resume. Perform a flash
update as soon as possible. If flash updating
under Linux, make sure to specify the 'failsafe'
option to update the failsafe area as well.
B) If the flash update is successful, but you
still get a CRC error, verify that your flash
image is intact. The Linux flash utility does
this automatically using the same strong CRC
algorithm as the BIOS uses.
C) Replace the node motherboard.
Diagnostic: A) Use the Whack "net tftp" command to download
an identical image to that which is in flash.
Use the Whack "mem compare" command to
locate bytes which differ so that you may
examine those values with "d <addr>"
B) If Whack is not available, use the Arium to
look at flash address space for defects. It
may be a stuck, floating, or bridged address
or data line.
C) Replace the flash part.
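The comparison in diagnostic steps A and B can be reproduced offline when both the reference image and a dump of the flash contents are available as files. The Python sketch below is not the Whack "mem compare" implementation; the function names are hypothetical. It lists the differing offsets and accumulates the bit positions that ever differ, which can hint at a stuck or bridged data line.

```python
from typing import List


def differing_offsets(expected: bytes, actual: bytes) -> List[int]:
    """Return the byte offsets at which the two images differ."""
    if len(expected) != len(actual):
        raise ValueError("images must be the same length")
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


def differing_bit_mask(expected: bytes, actual: bytes) -> int:
    """OR together the XOR of every byte pair: set bits mark data bits that ever differ."""
    mask = 0
    for e, a in zip(expected, actual):
        mask |= e ^ a
    return mask


expected = bytes([0x00, 0x01, 0xFF, 0x10])
actual = bytes([0x00, 0x00, 0xFE, 0x10])
print(differing_offsets(expected, actual), hex(differing_bit_mask(expected, actual)))
```

If every mismatch involves the same single bit, a data line fault is a reasonable suspicion. Mismatches spread over many bits are more consistent with a corrupt or partially written image.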
Non-fatal error: Code 23,
sub-code 0x1 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Invalid entry point for full CBIOS
Boot with clustering disabled and update flash immediately!
Prior to starting up the non-failsafe (full diagnostic) CBIOS
image, the failsafe CBIOS performs some consistency checks
over the image. This error indicates corruption was detected
in the entry point to the main routine of the full CBIOS.
If you have recently installed a new CBIOS which is larger
than the previous one, it is possible to get this error because
the failsafe BIOS present cannot properly verify the larger
size BIOS.
Resolution: A) Try pressing ^C to resume. Perform a flash
update as soon as possible. Boot with clustering
disabled by typing "tpd nokmod" at the LILO prompt.
Once the node has booted, login as root and use
the flash command. Example:
# flash /opt/tpd/bios/bios-1.9.4
Upon completion of the flash update, reboot and
observe console messages to ensure the CRC error
no longer occurs.
B) If the flash update is successful, but you
still get this error, verify that your flash
image is intact. The Linux flash utility does
this automatically using the same strong CRC
algorithm as the BIOS uses.
C) Replace the node motherboard.
Diagnostic: A) If Whack is not available, use the Arium to
look at flash address space for defects. It
may be a stuck, floating, or bridged address
or data line.
B) Replace the flash part.
Fatal error: Code 23, subcode 0x2 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Invalid magic for full CBIOS
Prior to starting up the non-failsafe (full diagnostic) CBIOS
image, the failsafe CBIOS performs some consistency checks
over the image. This error indicates the failsafe BIOS could
not find a proper header record for the full CBIOS.
See Code 23, sub-code 0x1 for resolution information.
Fatal error: Code 23, subcode 0x3 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
CRC mismatch for full CBIOS
Prior to starting up the non-failsafe (full diagnostic) CBIOS
image, the failsafe CBIOS performs a strong CRC over the full
CBIOS image to verify the image's integrity. This error
indicates the full CBIOS had a CRC failure.
See Code 23, sub-code 0x1 for resolution information.
Fatal error: Code 23, subcode 0x4 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Failsafe CBIOS is now enabling the full CBIOS
...
The full CBIOS either detected an error or user input (the
'f' key) which forced it to return to the failsafe BIOS.
If the user did press the 'f' key, then press ^C to resume
startup under the failsafe BIOS. If the user did not press
the 'f' key, browse prior messages to learn of a failure
which may have caused this error.
Resolution: A) If the error was not the result of a keystroke,
try pressing the 'n' key at BIOS startup to
clear any initialization skips. NVRAM may contain a
recorded setting to skip the full BIOS version
and always execute the failsafe.
See Code 23, sub-code 0x1 for more resolution information.
Non-fatal error: Code 23,
sub-code 0x10 (bbxxyyzz)
FLASH_CRC_ERROR "EOS: Repairing Main BIOS"
The EOS Main BIOS image in SPI has failed to boot and the FPGA
watchdog has reset the node to boot from the failsafe BIOS. The
failsafe BIOS has detected a bad CRC in the main BIOS region of
flash and is attempting to automatically re-flash that region
from disk.
The data field contains the build (bb) and version (xx.yy.zz) of
the Main BIOS that failed to boot.
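The bbxxyyzz data field can be split into its four parts with simple shifts. The Python sketch below assumes the field is a 32-bit value with one byte per part, most significant byte first, as the bbxxyyzz notation suggests. Whether each byte should be read as decimal or hexadecimal is not stated in this guide, so the function returns plain integers. The function name is hypothetical.

```python
from typing import Tuple


def decode_bios_data(data: int) -> Tuple[int, Tuple[int, int, int]]:
    """Split a bbxxyyzz data field into (build, (xx, yy, zz))."""
    build = (data >> 24) & 0xFF
    xx = (data >> 16) & 0xFF
    yy = (data >> 8) & 0xFF
    zz = data & 0xFF
    return build, (xx, yy, zz)


# Made-up data value, used only to show the byte positions.
build, version = decode_bios_data(0x02010904)
print(build, ".".join(str(part) for part in version))
```

The same layout applies to the data field of sub-code 0x11.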
Fatal error: Code 23, subcode 0x11 (bbxxyyzz)
FLASH_CRC_ERROR "EOS: Main BIOS Corrupt"
The EOS Main BIOS image in SPI has failed to boot and the FPGA
watchdog has reset the node to boot from the failsafe BIOS. The
failsafe BIOS has detected a bad CRC in the main BIOS region of
flash. The failsafe BIOS has also detected five or more attempts
to automatically recover the Main BIOS within the past two hours
and has stopped attempting automatic recovery.
The data field contains the build (bb) and version (xx.yy.zz) of
the Main BIOS that failed to boot.
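The stop condition described for sub-code 0x11 behaves like a simple sliding-window limit. The Python sketch below is a hypothetical model of that rule, not the failsafe BIOS code; how the BIOS actually records attempt times is not described here, so the list of timestamps in seconds is an assumption.

```python
from typing import Iterable

WINDOW_SECONDS = 2 * 60 * 60   # "within the past two hours"
MAX_ATTEMPTS = 5               # "five or more attempts"


def should_stop_recovery(attempt_times: Iterable[float], now: float) -> bool:
    """Return True when five or more recovery attempts fall inside the last two hours."""
    recent = [t for t in attempt_times if 0 <= now - t <= WINDOW_SECONDS]
    return len(recent) >= MAX_ATTEMPTS


# Five attempts ten minutes apart, checked shortly after the last one.
print(should_stop_recovery([0, 600, 1200, 1800, 2400], now=3000))
```

Under this model, attempts older than two hours no longer count, which is consistent with sub-code 0x10 being reported again after a quiet period.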
Fatal error: Code 24, subcode 0x0 (ptr)
TURD_EXCEEDED_LIMIT "TURD Exceeded Limit"
*** Error: MP turd exceeded 0x100000
The BIOS presents to the operating system a set of tables which
describe the hardware present in the system. These tables have
a rigid structure for each type of device. If the CBIOS
configuration structure becomes corrupt, this error may result
when the TURD structures are initialized for the operating
system. A consistency check ensures the TURD area does not
go beyond 1 MB (which is the base address where the operating
system normally begins using main memory). The data for this
error (ptr) is the pointer address reached, which will be
greater than 0x100000.
Resolution: A) Remove cards from all PCI slots. If the
error no longer occurs, it may be a
hardware failure on one of the cards.
B) Replace the node motherboard.
Diagnostic: A) Look at memory starting at 0x000f0000.
0x5f504d5f is the magic number of the
first TURD (the MP Configuration table).
B) Turn on PRINTING_TURD and DEBUG_APIC compile flags.
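When looking at a memory dump for the magic number in diagnostic step A, it helps to know how the value appears as bytes. The Python sketch below assumes the usual little-endian byte order of x86 memory; the constant and function names are hypothetical. It also restates the limit check from the description.

```python
import struct

MP_MAGIC = 0x5F504D5F     # magic number quoted in the diagnostic step above
TURD_LIMIT = 0x100000     # 1 MB limit quoted in the description


def magic_as_ascii(magic: int) -> str:
    """Show a 32-bit value as the four ASCII characters it occupies in little-endian memory."""
    return struct.pack("<I", magic).decode("ascii")


def turd_exceeds_limit(ptr: int) -> bool:
    """Return True when the pointer has gone beyond the 1 MB limit."""
    return ptr > TURD_LIMIT


print(magic_as_ascii(MP_MAGIC))
```

In a byte-wise dump the magic number therefore shows up as the bytes 5f 4d 50 5f, which read as the text _MP_.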
Fatal error: Code 24, subcode 0x1 (0)
TURD_EXCEEDED_LIMIT "TURD Checksum Failure"
*** Error: MP table checksum failed - stopping table build
The BIOS presents to the operating system a set of tables which
describe the hardware present in the system. In this case, the
BIOS detected that one of the tables had a bad checksum.
Resolution: A) Remove cards from all PCI slots. If the
error no longer occurs, it may be a
hardware failure on one of the cards.
B) Replace the node motherboard.
Fatal error: Code 24, subcode 0x2 (0)
TURD_EXCEEDED_LIMIT "TURD Exceeded Limit"
*** Error: Too many MP table entries - stopping table build
The BIOS presents to the operating system a set of tables which
describe the hardware present in the system. In this case, the
BIOS detected that it had added too many entries to the table,
likely because too many PCI devices are present in the system.
This error is likely due to an earlier PCI failure.
Resolution: A) Remove cards from all PCI slots. If the
error no longer occurs, it may be a
hardware failure on one of the cards.
B) Replace the node motherboard.
Fatal error: Code 25, subcode 0x0 (0)
PROM_FAILURE "PROM Failure"
The node board has two different Serial EEPROM devices used
for storing persistent board information. One PROM device
is located on the I2C bus. It stores node board manufacturing,
assembly, serial number, and error message log information.
The second PROM device is connected through the Intel 82559ER
ethernet controller. It stores ethernet controller information
such as initialization state and the hardware MAC address.
PROM checksum: FAIL
The PROM which stores node board manufacturing, assembly,
serial number, and error message log information does
not have a valid checksum. If the PROM has not yet been
initialized or if it has become corrupt, you may see this
error.
Resolution: A) Press ^W to enter Whack and use either
"prom init" or "prom edit" to correct this
error.
B) If the information looks correct with
"prom id" then try using "prom checksum" to
rewrite the checksum.
C) Replace the node motherboard.
Diagnostic: A) Use the Whack "d prom <addr>" command to
display PROM contents. Use the Whack
"c prom <addr>" command to change PROM
contents. Look for a pattern in order to
determine if the error is due to the device's
connection with the motherboard or a hardware
failure within the Serial PROM.
Fatal error: Code 25, subcode 0x1 (0)
PROM_FAILURE "PROM Failure"
Ethernet 0 PROM checksum: FAIL
The PROM which stores ethernet controller information does
not have a valid checksum. If the PROM has not yet been
initialized or if it has become corrupt, you may see this
error.
Resolution: A) Press ^W to enter Whack and use "prom id" to
verify the other PROM is valid. If not, first
use "prom init" or "prom edit" to set the
PROM information. If the PROM information appears
valid, use "prom mac" to reprogram the
Ethernet MAC address and checksum.
B) Try flushing out a correct checksum.
Note: You must first select the device with an
error using the "eth dev" command.
Example:
Whack> eth dev 1
Whack> eth checksum
C) Replace the node motherboard.
Diagnostic: A) Try programming a custom MAC address.
Example: "prom mac 00:02:AC:00:00:43"
B) Use the Whack "d eth <addr>" command to
display PROM contents. Use the Whack
"c eth <addr>" command to change PROM
contents. Look for a pattern in order to
determine if the error is due to the device's
connection with the motherboard or a hardware
failure within the Ethernet PROM.
Non-fatal error: Code 25,
sub-code 0x2 (0)
PROM_FAILURE "PROM Failure"
Ethernet MAC xx:xx:xx:xx:xx:xx mismatches PROM: yy:yy:yy:yy:yy:yy
The "prom mac" command may fix this.
This error indicates the MAC address stored in the onboard
Ethernet controller's PROM does not match that which can be
computed from the board revision and serial number stored
in the node's PROM. This mismatch suggests that one or the
other PROM may contain corrupt contents.
If the ethernet MAC address was purposely set to an address
(see "prom mac" command), then this check may be overridden
by setting the NVRAM "oddmac" flag. Example:
Whack> set perm oddmac
Resolution: A) Look for a prior message indicating an invalid
board type or check the banner to ensure the board
type and serial number are correct for this node.
If either is not correct, use the 'prom edit'
command to repair the corruption.
B) Use the "prom mac" command to reprogram the
MAC address in the ethernet controller's PROM.
C) Replace the node motherboard.
Diagnostic: A) Determine if the cause is due to a failing node
PROM or ethernet controller PROM. Use the
"db prom 0 20" command to display PROM contents
and compare with expected values. Example:
Whack> dbz8 prom 0 20
prom 0000: 00 04 09 20 10 03 04 35 . ...5
prom 0008: 30 53 4f 4c 01 10 00 00 0SOL..
prom 0010: 00 76 ff ff ff ff ff ff v......
prom 0018: ff ff ff ff c1 1f a4 5e .......^
Replace node PROM if it is defective.
B) Use the "db eth 0 20" command to display ethernet
PROM contents and compare with expected values.
Example:
Whack> dbz8 eth 0 20
eth 0000: 00 02 ac 14 00 76 03 01 ... v..
eth 0008: ff ff 01 00 01 07 00 00 ... ..
eth 0010: 10 00 04 03 40 48 00 00 . ..@H
eth 0018: 86 80 00 00 ff ff ff ff .. ....
Replace ethernet PROM if it is defective.
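When comparing dump output with the MAC address shown in the error message, the bytes can be pulled out of a dump line mechanically. The Python sketch below assumes that the first six bytes of the ethernet PROM hold the MAC address. That is consistent with the 00:02:AC prefix used in the "prom mac" example elsewhere in this table, but it is not stated in this guide. The function names are hypothetical.

```python
def parse_dump_line(line: str, width: int = 8) -> bytes:
    """Return the hex bytes from one dump line, ignoring the trailing ASCII column."""
    _, _, rest = line.partition(":")
    return bytes(int(token, 16) for token in rest.split()[:width])


def format_mac(raw: bytes) -> str:
    """Format the first six bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02X}" for b in raw[:6])


line = "eth 0000: 00 02 ac 14 00 76 03 01 ... v.."
print(format_mac(parse_dump_line(line)))   # prints 00:02:AC:14:00:76
```

The result can then be checked against the xx:xx:xx:xx:xx:xx value printed with the error.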
Non-fatal error: Code 25,
sub-code 0x3 (0)
PROM_FAILURE "PROM Failure"
During initialization, CBIOS checks the PROM for a magic number.
This non-fatal error is logged when the magic number check fails.
Resolution: A) On EOS platforms, the midplane, node type, and slot
id may need to be reconstructed with prom edit.
The Ethernet MAC and PROM magic number may also
need to be reconstructed. (Bug 82094)
B) Previous platforms should be reinitialized and
reconstructed automatically.
Diagnostic: A) Use "db i2c 2.a6.0 100" to view the contents
of this region. Typically only the first 32
bytes are affected.
Non-fatal error: Code 25,
sub-code 0x4 (aabbccdd)
PROM_FAILURE "PROM Failure"
Board Spin value is invalid. The "prom edit" command may
fix this.
This error indicates the board spin value in the PROM record is
not in the proper range. The range of the board spin byte is
0x01 to 0x16. If the board spin number is out of this range, then
this error will occur.
NOTE: On Tinman, the board spin field is not used as board spin,
so this field will always be 0x17. On Tinman, this is NOT flagged
as an error.
If the board spin field is not valid, then the BIOS uses the
board revision field. This is a two-character field that must
be "01" to "09", then "A0" to "A9", then "B0" to "B9", etc.
If a letter (A-Z) is in the second byte or a nonzero
digit (1-9) is the first character, then this is an error.
In the data field, aa is the board spin value, bb is the calculated
board revision, cc is the first character in the rev field, and
dd is the second character in the rev field.
Resolution: A) Use "prom edit" to fix/verify the board spin field.
B) Use "prom edit" to fix the board revision field.
C) Replace the node motherboard.
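The range and format rules for sub-code 0x4 can be checked by hand, but a short sketch makes them concrete. The Python code below is illustrative; the function names are hypothetical. It assumes the aabbccdd data field is a 32-bit value with one byte per part and that cc and dd are ASCII character codes.

```python
import string
from typing import Tuple


def decode_spin_data(data: int) -> Tuple[int, int, str, str]:
    """Split aabbccdd into (board spin, calculated revision, first rev char, second rev char)."""
    aa = (data >> 24) & 0xFF
    bb = (data >> 16) & 0xFF
    cc = chr((data >> 8) & 0xFF)
    dd = chr(data & 0xFF)
    return aa, bb, cc, dd


def spin_is_valid(spin: int) -> bool:
    """The board spin byte must be in the range 0x01 to 0x16."""
    return 0x01 <= spin <= 0x16


def revision_is_valid(rev: str) -> bool:
    """Accept "01" to "09", or an uppercase letter followed by a digit ("A0" to "A9", and so on)."""
    if len(rev) != 2:
        return False
    first, second = rev
    if first == "0":
        return second in "123456789"
    return first in string.ascii_uppercase and second in string.digits


# Made-up data value: spin 0x17, calculated revision 0x00, rev field "A0".
print(decode_spin_data(0x17004130))
```

With these rules, "A0" and "09" are accepted while "1A" and "0A" are rejected, which matches the two error conditions listed in the description.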
Non-fatal error: Code 25,
sub-code 0x5 (0)
PROM_FAILURE "PROM Failure"
EOS node prom value is invalid. The "prom edit" command may
fix this.
This error indicates the EOS node PROM value in the PROM record is
not in the proper range. The node type and midplane type values
in the PROM should be programmed correctly with the "prom edit"
command.
Resolution: A) Use "prom edit" to fix/verify the midplane field.
B) Use "prom edit" to fix/verify the node type field.
Non-fatal error: Code 25,
sub-code 0x6 (0)
PROM_FAILURE "PROM Failure"
EOS Node ID in Prom and Slot ID do not match. The "prom
edit" command may fix this.
This error indicates the EOS Node ID value in the PROM record does
not match the Slot ID read from the FPGA. The Node ID value
in the PROM should be programmed correctly with the "prom edit"
command.
Resolution: A) Use "prom edit" to fix/verify the Node ID field.
Non-fatal error: Code 26,
sub-code 0x1 (ethdev)
ETH_FAILURE "Ethernet Failure"
eth0 device self test:
FAIL All tests: xxxx (timeout)
During initialization, CBIOS has the ethernet controller
perform an internal test to verify correct operation. If the
ethernet controller does not respond within a reasonable
amount of time, this error will be displayed.
"ethdev" indicates the PCI Slot in which the failed ethernet
device is located. This is an ASCII value, so 0x30 indicates
PCI slot 0. If the ethernet device is located on the node
motherboard, then ethdev will have a value of 0x00.
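The ethdev encoding described above can be illustrated with a
short Python sketch. The function name is hypothetical, and the
sketch assumes that PCI slots are reported as single ASCII digits;
the guide only gives 0x30 (slot 0) and 0x00 (node motherboard) as
examples.

```python
def ethdev_location(ethdev):
    # 0x00 means the device is on the node motherboard.
    if ethdev == 0x00:
        return "node motherboard"
    ch = chr(ethdev)
    # Assumption: slot numbers are single ASCII digits ('0' is 0x30).
    if ch in "0123456789":
        return "PCI slot " + ch
    return "unknown (0x%02x)" % ethdev
```

With these assumptions, ethdev_location(0x30) returns
"PCI slot 0".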
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Verify the 82559ER shows up in a PCI scan.
Use the Whack "pci find 8086" command. It
should display the 82559ER Ethernet controller.
B) Use the Whack "eth test" command to repeat
the test. Make sure that CBIOS initialization
has passed the point of PCI scan. Use Whack
"loop ffff eth test" to repeat in a loop.
Non-fatal error: Code 26,
sub-code 0x2 (ethdev)
ETH_FAILURE "Ethernet Failure"
eth0 device self test:
FAILxxxx yyyy
If the ethernet controller fails its internal test, this
error will be displayed. Since this is an internal test,
it is likely the ethernet controller itself which has
failed.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "eth test" command to repeat
the test. Make sure that CBIOS initialization
has passed the point of PCI scan. Use Whack
"loop ffff eth test" to repeat in a loop.
Non-fatal error: Code 26,
sub-code 0x3 (0)
ETH_FAILURE "Ethernet Failure"
No ethernet devices available for loopback test
This error indicates that no ethernet devices could be found
or initialized on the node. This is possibly the result of a
hardware failure.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "eth test" command to make sure
that the low level test passes.
B) Try using "net dhcp" in an environment that has
a DHCP server to see if the node can send and
receive packets. If so, then this error is
likely caused by incorrect BIOS code.
Non-fatal error: Code 26,
sub-code 0x4 (0)
ETH_FAILURE "Ethernet Failure"
No loopback connections were found. An external loopback
plug is
required if this node has only one ethernet port. A crossover
cable is required if this node has more than a single
ethernet port.
Resolution: A) Make sure the ethernet loopback plug is in
the ethernet connector (you should see link
status lights illuminated). In the case of
a node having two ethernet ports, make sure
a crossover cable is connected between the
ethernet ports.
B) Cycle power on the node.
C) Replace the node motherboard.
Diagnostic: A) This problem is most likely caused by a bad
connector or bad connection to the loopback
plug. Make sure TX+ makes a circuit to RX+
and TX- makes a circuit to RX- on the PHY.
B) Try plugging into a normal ethernet network to see
if it can talk to a DHCP server ("net dhcp").
C) Try using "net loopback" to test the ethernet
port using the internal PHY loopback.
Non-fatal error: Code 26,
sub-code 0x5 (slotid)
ETH_FAILURE "Ethernet Failure"
eth2 loopback PHY internal:
FAIL
This error indicates that the internal loopback of the PHY
did not correctly loop back packets. If the device being
tested is onboard the node (82559ER or 82551ER), then this
is a failure.
Some plug-in PCI boards (such as 82557) do not fully support
PHY loopback. Those devices will cause the following
warning:
eth2 loopback PHY internal:
Unavailable
No error stop will occur in the case of a PHY not supporting
internal loopback.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the "pci probe" command to match up the
ethernet devices with which one has failed.
B) Try using an external loopback to see if the
results are the same. If the same, then try
debugging using a scope. If the external
loopback works, then it may be that the PHY
loopback just does not work in this device.
Non-fatal error: Code 26,
sub-code 0x6 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 sends to eth1 but cannot receive from it
This is an unusual error in that one ethernet device is able
to reliably receive packets from the other, but the opposite
is not true.
Resolution: A) Run the test again. If the nodes are attached
to a hub, the failure may be due to another
ethernet node flooding the network.
B) Cycle power on the node.
C) Ensure that there is no switch between
the ethernet ports. A switch may prevent the
test from functioning properly if the MAC
address of an interface is in use elsewhere
or the switch is really an IP router.
Diagnostic: A) Test against a plug-in PCI ethernet card to
isolate which ethernet interface is not
functioning.
Non-fatal error: Code 26,
sub-code 0x7 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: FAIL - receive timeout (xx seconds)
This error indicates the ethernet device did not successfully
receive the loopback pattern sent to test the ethernet device's
transceiver. The failure to receive a loopback pattern usually
means the ethernet device has failed.
"ethdev" indicates the PCI Slot in which the failed ethernet
device is located. This is an ASCII value, so 0x30 indicates
PCI slot 0. If the ethernet device is located on the node
motherboard, then ethdev will have a value of 0x00.
The following are normal test results that you would expect to
see. If this error occurs, then one of the following has not
happened:
eth0 loopback All zeros:      PASS
eth0 loopback All ones:       PASS
eth0 loopback Walking ones:   PASS
eth0 loopback Walking zeros:  PASS
eth0 loopback Random pattern: PASS
This error indicates that within 100 packets successfully
transmitted, there were no packets successfully received.
Resolution: A) Cycle power on the node.
B) Unplug the network cable and run the test again.
If the node is attached to a hub, the failure
may be due to another ethernet node flooding the
network. This is not very likely.
C) If the ethernet device is located in a PCI
slot, replace the card.
D) Replace the node motherboard.
Diagnostic: A) Test against a plug-in PCI ethernet card to
isolate which ethernet interface is not
functioning.
Non-fatal error: Code 26,
sub-code 0x8 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: Packet transmit failed
This error indicates that the ethernet device was not able
to successfully transmit packets. This is a serious failure,
since the ethernet code will not fail to transmit under any
condition unless the ethernet device failed to initialize.
Resolution: A) Use "eth reset" to reset the ethernet device.
B) Cycle power on the node.
C) Replace the node motherboard if the failed
ethernet device is on the node.
Non-fatal error: Code 26,
sub-code 0x9 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: FAIL - miscompare
stuck high=xxxx stuck low=yyyy toggle=zzzz
This error is displayed if one of the ethernet tests detects
a mismatch between the packet sent and the data received.
It also includes a diagnostic line which is useful to see in
what way the data is different.
Resolution: A) Use "eth reset" to reset the ethernet device.
B) Cycle power on the node.
C) Replace the node motherboard if the failed
ethernet device is on the node.
Diagnostic: A) You can get complete packet dumps if you wish
to manually compare how the data was corrupted.
In order to do this, use "net loopback vv"
(double verbose).
B) If it is a single bit that is failing (or a
small number), observe if the bits are pulled
high or low. This may assist you in debugging
where the hardware is failing, if it is
external to the ethernet IC.
Non-fatal error: Code 26,
sub-code 0xa (slotid)
ETH_FAILURE "Ethernet Failure"
ethxxx device registers: FAIL
Onboard ethernet device did not read valid config from
EEPROM.
A power cycle might clear this failure if this is a new node.
This error indicates the ethernet device failed to initialize
properly, probably because it read invalid content from the
attached EEPROM device. If this is an onboard GigE on the 5000P
chipset (Tx00, Fx00, Vx00, Gx00), then it is likely this is
the first time the node has ever been powered on. Once the
BIOS writes a configuration to the SPI EEPROM attached to the
GigE, it is necessary for the board to be power cycled before
the GigE device is usable. If the board is not new and you see
this failure, then it's likely a component on the node
motherboard has failed.
Resolution: A) Power cycle the node.
B) Replace the node motherboard if the failed
ethernet device is on the node.
Non-fatal error: Code 27,
sub-code 0x0 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
Each node board has multiple temperature and voltage sensors
and fan RPM sensors which monitor the environment to ensure
the temperature, voltage, and fan RPM are within operating
tolerances. This directly results in increased reliability
of the product.
If a temperature or a voltage falls outside a programmed
tolerance level, CBIOS will alert the user to this condition.
The sub-code displayed reflects the type of (the first) error
detected. The data value is a count of the number of
temperature/voltage/fan problems detected.
A sub-code value of 0x0 indicates a fan RPM problem.
A sub-code value of 0x1 indicates a temperature problem.
A sub-code value of 0x2 indicates a voltage problem.
This particular sub-code (0x0) indicates a fan RPM reading
is outside its programmed limit.
Resolution: A) Cycle power on the node. If it is a
temperature-related problem, verify the system is
getting adequate ventilation.
B) Verify the limit settings are reasonable. Use
the Whack "i2c env" command. The Whack
"i2c env defaults" command resets all defaults.
C) Verify both power supply fans are spinning
freely and that the supply amber failure light
is not illuminated. If only a single supply is
installed, make sure the second slot either has
a fan or is covered.
D) Replace the power supply.
E) If it's CPU temperature, verify the heatsink
is conducting heat well.
F) If it's CPU voltage, try swapping out the CPU
voltage regulators.
G) Replace the node motherboard.
Diagnostic: A) Use a voltage probe at appropriate vias to
verify correct voltage levels.
B) Verify LM87 external temperature sensor line is
well connected to the CPU's thermal diode.
Non-fatal error: Code 27,
sub-code 0x1 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates a programmed temperature limit
has been exceeded.
See Code 27, sub-code 0x0 for resolution information.
Non-fatal error: Code 27,
sub-code 0x2 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates a programmed voltage limit has been
exceeded.
See Code 27, sub-code 0x0 for resolution information.
Non-fatal error: Code 27,
sub-code 0x3 (0)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates a sensor interrupt test failed.
See Code 27, sub-code 0x0 for resolution information.
Fatal error: Code 27, subcode 0x4 (0)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates that a CPU has asserted its THERMTRIP_N
signal. This could mean that it has reached its case temperature
limit, that a VRM has failed, or that there is a problem with
the FPGA.
Resolution: A) Check the environmentals.
B) Replace the node.
Non-fatal error: Code 27,
sub-code 0x5 (Shutdown
Code =1 or =2)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Boot Pause"
For ShutdownCode = 1:
In a system-wide over-temperature condition, the OS will shut
down the system and reboot the nodes. The BIOS will pause the
boot in a low power state until the over-temperature condition
has been cleared for 30 minutes. When in this state, the BIOS
samples critical temperature sensors periodically and displays
their current state on the system console every few minutes.
This delay can be cleared early by a node power cycle.
This log entry indicates the start of the BIOS boot pause.
For ShutdownCode = 2:
This shutdown code indicates an over-temperature failure of a
single node. TPD will flag this failure and shut down that node.
The node will not complete the boot until the unit has been
repaired and any issues cleared.
To clear the boot halt, reboot the node and use the Whack
command "unset tshutdown" before the POST reaches step 35.
Non-fatal error: Code 27,
sub-code 0x6 (0)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Boot Resume"
This sub-code indicates that the critical temperature sensors
have been below their thresholds for at least 30 minutes and
the BIOS is resuming the boot process. See sub-code 5 for more
temperature shutdown information.
Non-fatal error: Code 27,
sub-code 0x7 (0)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Override"
This sub-code indicates that the BIOS is skipping a critical
temperature boot pause due to a node power cycle. See sub-code
5 for more temperature shutdown information.
Non-fatal error: Code 27,
sub-code 0x8 (count)
TEMP_VOLTAGE_FAILURE "No Response"
This sub-code indicates that the defined I2C sensor failed to
respond on the I2C bus. 'Count' indicates the number of I2C
device failures.
Non-fatal error: Code 27,
sub-code 0x9 (index)
TEMP_VOLTAGE_FAILURE "High Limit Error"
This sub-code indicates that the BIOS detected a mathematical
overflow of the 8-bit upper limit register, and measurements
on the sensor indicated by 'index' may be incorrect. The
voltage or temperature limit could not be converted to and
stored as an 8-bit value.
Contact Engineering for a HW fix.
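The Code 27 sub-codes listed in this chapter can be collected into
a small lookup table when reading event logs. The following Python
sketch is illustrative only. The table and function names are
hypothetical, and the short descriptions are paraphrased from the
entries above rather than taken from BIOS source.

```python
# Code 27 (TEMP_VOLTAGE_FAILURE) sub-codes as described in this guide.
CODE27_SUBCODES = {
    0x0: "fan RPM problem",
    0x1: "temperature limit exceeded",
    0x2: "voltage limit exceeded",
    0x3: "sensor interrupt test failed",
    0x4: "CPU asserted THERMTRIP_N (fatal)",
    0x5: "temp shutdown boot pause",
    0x6: "temp shutdown boot resume",
    0x7: "temp shutdown override",
    0x8: "I2C sensor did not respond",
    0x9: "high limit error (8-bit overflow)",
}

# What the data value means for the sub-codes where the guide says so.
CODE27_DATA_MEANING = {
    0x0: "problem count",
    0x1: "problem count",
    0x2: "problem count",
    0x5: "shutdown code",
    0x8: "I2C device failure count",
    0x9: "sensor index",
}

def describe_code27(subcode, data=0):
    name = CODE27_SUBCODES.get(subcode, "unknown sub-code")
    meaning = CODE27_DATA_MEANING.get(subcode)
    if meaning is None:
        return "Code 27, sub-code 0x%x: %s" % (subcode, name)
    return "Code 27, sub-code 0x%x: %s (%s = %d)" % (
        subcode, name, meaning, data)
```

For example, describe_code27(0x8, 2) reports a sensor that did not
respond, with an I2C device failure count of 2.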
Fatal error: Code 28, subcode 0x0 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Cluster Memory ASIC not found for CM Init.
The Eagle/Osprey/Harrier ASICs are the Cluster Managers, which
are used for high speed communication between nodes of a
cluster. These devices are critical for the correct operation
of the node software, and hence for operation of the whole
cluster. The CM exists on all PCI buses in the node. If the CM
cannot be found on any of the required PCI buses, this is a
serious problem. Subcode 0x0 indicates the PCI bus scan did
not locate the Cluster Manager.
Resolution: A) Cycle power on the node.
B) Pull all PCI cards and cycle power on the node.
C) Replace the node motherboard.
Diagnostic: A) Use "pci find 1590" at the Whack prompt to
see if the CM can be located. Since the same
data structure is used, it should not show
up there either. Use "pci init" which will
scan the PCI bus again. If the CM appears
now (with "pci find 1590"), it may be a
transient problem.
B) Examine the output of "pci probe" to determine
if other onboard PCI devices are missing. This
may help to determine where the failure occurs.
For example, if the four PCI bridges do not
show, it may be the CIOB at fault.
Fatal error: Code 28, subcode 0x0 (1)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMMs did not compare identical DIMM and SPD comparison
Failed
Not all required CMA DIMMS were found or are exact matches
(cma_dimm_unmatched is set)
The Harrier2 ASICs are the Cluster Managers, which are
used for high speed communication between nodes of a
cluster. These devices are critical for the correct operation
of the node software, and hence for operation of the whole
cluster. Subcode 0x1 indicates that one or more DIMMs attached
to the Harrier2 did not match DIMM0.0.0.
Resolution: A) Install the same DIMM type in all CM DIMM
slots.
B) If your intention was to test with different DIMMs,
"set perm cma_dimm_unmatched" and "reset" node.
Fatal error: Code 28, subcode 0x0 (2)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Harrier2 #0 not found for CM Init.
The Harrier2 ASICs are the Cluster Managers, which are
used for high speed communication between nodes of a
cluster. These devices are critical for the correct operation
of the node software, and hence for operation of the whole
cluster. The CM exists on all PCI buses in the node. If the CM
cannot be found on any of the required PCI buses, this is a
serious problem. Subcode 0x2 indicates the PCI bus scan did
not locate the Cluster Manager.
Resolution: A) Cycle power on the node.
B) Pull all PCI cards and cycle power on the node.
C) Replace the node motherboard.
Diagnostic: A) Use "pci find 1590" at the Whack prompt to
see if the CM can be located. Since the same
data structure is used, it should not show
up there either. Use "pci init" which will
scan the PCI bus again. If the CM appears
now (with "pci find 1590"), it may be a
transient problem.
B) Examine the output of "pci probe" to determine
if other onboard PCI devices are missing. This
may help to determine where the failure occurs.
For example, if the four PCI bridges do not
show, it may be the CIOB at fault.
Fatal error: Code 28, subcode 0x0 (3)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Harrier2 #1 not found for CM Init.
See Code 28, sub-code 0x0 (2) for resolution information.
Non-fatal error: Code 28,
sub-code 0x0 (xx04)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMM 0:
Unsupported Raw Card Type in SPD byte 62 = xx,
Using rdimm_control_words[0][].
Where xx is the hex value that was read from DIMM0 SPD
Byte 62.
Byte 62 of the DIMM SPD indicates which JEDEC reference
design raw card was used as the basis for the module
assembly, if any. Bits 4 ~ 0 describe the raw card and
bits 6 ~ 5 describe the revision level of that raw card.
The special reference raw card indicator, 1F, is used when
no JEDEC standard raw card reference design was used as the
basis for the module design. Preproduction modules should
be encoded as revision 0 in bits 6 ~ 5.
The reference card is looked up in rdimm_control_words to
determine the index into the rdimm_control_words table. If
the value in Byte 62 is not found in the table, this error
is reported.
Resolution: A) Replace DIMM with a supported Raw Card Type.
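The bit layout of SPD Byte 62 described above can be unpacked as
shown in this Python sketch. The function name is hypothetical. The
sketch only splits the byte into its fields; it does not reproduce
the rdimm_control_words lookup, whose contents are not given in
this guide.

```python
def decode_spd_byte62(value):
    # Bits 4 ~ 0: reference raw card. Bits 6 ~ 5: raw card revision.
    raw_card = value & 0x1F
    revision = (value >> 5) & 0x3
    # 0x1F means no JEDEC reference raw card design was used.
    no_reference_design = (raw_card == 0x1F)
    return raw_card, revision, no_reference_design
```

For example, a Byte 62 value of 0x21 decodes to raw card 1,
revision 1.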
Non-fatal error: Code 28,
sub-code 0x0 (xx05)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMM 1:
Unsupported Raw Card Type in SPD byte 62 = xx,
Using rdimm_control_words[0][].
Where xx is the hex value that was read from DIMM1 SPD
Byte 62.
Resolution: A) Replace DIMM with a supported Raw Card Type.
Non-fatal error: Code 28,
sub-code 0x1 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Pairwwww DIMMxxxx: Bad checksum. Got yyyy, SPD said zzzz
The memory DIMMs located on the CM riser are called cluster
memory. This memory is used to store data destined for the
disks (dirty data) as well as data previously read from the
disks (cache data). It is also used for communication among
the nodes in the cluster. This memory is not required to
boot the operating system, but is required for the node to
participate in the cluster. Even before the memory is
thoroughly tested for proper operation, it must be
configured to appear in CM addressable space. Each memory
DIMM has a small embedded serial EEPROM which holds DIMM
configuration information such as the number of rows,
columns, and banks, as well as memory timing. If this serial
EEPROM becomes corrupt, data stored in it regarding the DIMM
configuration cannot be trusted. So, this EEPROM also
contains a checksum which the BIOS verifies is correct
before configuring the DIMM. If this checksum does not match
the checksum the BIOS computes across the DIMM, this error
will result. You should look at prior output to determine if
there were I2C errors. These errors suggest a problem with
riser installation.
The DIMM number is logged in the Data field of the Fatal
Error.
Resolution: A) Reseat Cluster Memory riser card(s).
B) Reseat Cluster Memory DIMMs.
C) Replace Cluster Memory DIMMs in pairs to ensure
replacement parts are matched.
P4-Eagle and PIII-Eagle DIMM Pairs are always
located four riser positions apart.
For example, if you number the slots from the top,
Pair 0 is at positions 3 and position 7 (top).
Pair 1 is at positions 0 (bottom) and position 4.
Pair 2 is at positions 2 and position 6.
Pair 3 is at positions 1 and position 5.
Ironman (Tclass) and Tinman (Fclass) sets are
always in sets of three. The DIMMs are set as
"DIMM C.S" as in Channel then set. There are two
riser cards, one for channel 0 and one for
channel 1 and 2.
Set 0 is DIMM 0.0, 1.0, 2.0
Set 1 is DIMM 0.1, 1.1, 2.1
Set 2 is DIMM 0.2, 1.2, 2.2
Titan and Atlas have 4 DIMM sets on the motherboard.
Set 0: DIMM 0.0 and 1.0
Set 1: DIMM 0.1 and 1.1
Set 2: DIMM 2.0 and 3.0
Set 3: DIMM 2.1 and 3.1
D) Replace the Cluster memory riser(s).
E) Replace the node motherboard.
Diagnostic: A) The Cluster Memory DIMMs appear on the I2C
bus at 2.a0 through 2.ae. Use the Whack "d i2c"
command to display the DIMM serial EEPROM
contents to determine if there is a pattern.
Example (DIMM 5):
Whack> d i2c 2.aa.0
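The address range 2.a0 through 2.ae and the DIMM 5 example (2.aa)
are consistent with one I2C address per DIMM in steps of 2. The
Python sketch below encodes that inferred mapping. It is an
assumption drawn from the example above, not a documented formula,
and the function name is hypothetical.

```python
def cm_dimm_i2c_address(dimm, bus=2):
    # Assumption: DIMM n is at 0xa0 + 2*n on I2C bus 2, for n = 0..7.
    if not 0 <= dimm <= 7:
        raise ValueError("DIMM number must be 0-7")
    return "%d.%02x" % (bus, 0xA0 + 2 * dimm)
```

Under this assumption, cm_dimm_i2c_address(5) returns "2.aa",
matching the example command.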
Fatal error: Code 28, subcode 0x2 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Pairww DIMMxx (yyyy): 'zzzz' read failed
Where zzzz is one of:
row address, column address, module rows, cas latency3,
refresh, banks, cas latency2, cas latency1, ras precharge,
act_to_rw, act_to_deact, ras cycle, write_to_deact,
density, frequency, DIMM type
This error indicates that a Cluster Memory DIMM was
detected but that the Serial EEPROM present on the DIMM
could not be reliably read.
The DIMM number is logged in the Data field of the Fatal
Error.
See Code 28, sub-code 0x1 for resolution information.
Non-fatal error: Code 28,
sub-code 0x3 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairqq DIMMtt (uuuu): vv != DIMMww (xxxx): yy zzzz
This error indicates the BIOS detected the SDRAM DIMMs in
the cluster memory bank pair are of a different type.
One DIMM number of the mismatched pair will be logged in the
data field of the Fatal Error.
Resolution: A) Ensure both DIMMs in the pair are identical.
Note that two DIMMs may have the same capacity
but have different number of rows, columns, or
banks. The DIMM configuration must exactly
match. If the DIMMs have similar markings and
capacity, they are probably identical.
Diagnostic: A) The Serial EEPROM information in each pair of
DIMMs should be identical or nearly identical.
See Code 28, sub-code 0x1 for more resolution
information.
Fatal error: Code 28, subcode 0x4 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairww DIMM xx (yyyy): di_module_rows is not 1
or 2! zzzz
This error indicates the Cluster Memory DIMMs reported an
unusual (and unsupported) number of rows. Usually the number
of rows reported by a DIMM corresponds to the number of sides
of the DIMM which are populated by memory.
One DIMM number of the failing pair will be logged in the
Data field of the Fatal Error.
See Code 28, sub-code 0x3 for resolution information.
Fatal error: Code 28, subcode 0x5 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
No Cluster Memory Installed
This error indicates that no memory was found in the Cluster
memory riser. Since cluster memory is needed for proper node
operation within the cluster, this is a condition which
must be resolved for proper operation. You should look at
prior output to determine if there were I2C errors. These
errors suggest a problem with riser or DIMM installation.
See Code 28, sub-code 0x1 for resolution information.
Fatal error: Code 28, subcode 0x6 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairww DIMM xx (yyy): RAS cycle time > 10. We
got zzz/10
This error indicates the Serial EEPROM on the DIMM reports
a value which is outside tolerance for the memory
controller.
One DIMM number of the failing pair will be logged in the
Data field of the Fatal Error.
See Code 28, sub-code 0x1 for resolution information.
Fatal error: Code 28, subcode 0x7 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Cluster Memory not responding.
DIMM uuu (vvv): Expected = (xxxx) Actual = (yyyy) Addr
(zzzz)
*** Error: Cluster Memory FAILURE - too many mismatches.
Before ECC initialization of Cluster memory (scrub), a small
region must be tested and configured by the CPU to set up
the ECC scrub of the remainder. If an error occurs during
this test (such as a memory read that does not match the
value just written), then this error will be reported. The
DIMM number is logged in the Data field of the Fatal Error.
Diagnostic: A) Compare the expected and actual values for a
pattern such as a bit stuck high or stuck low.
Example (bit 31 stuck low):
Expected = (0xf1f1f1e5) Actual = (0x71f1f1e5)
Expected = (0x92929285) Actual = (0x12929285)
Expected = (0xb3b3b3a5) Actual = (0x33b3b3a5)
Expected = (0xd3d3d3c5) Actual = (0x53d3d3c5)
See Code 28, sub-code 0x1 for resolution information.
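The comparison of expected and actual values described for Code 28,
sub-code 0x7 can be automated off-line. The Python sketch below is
one possible approach and is not part of Whack or the BIOS; the
function name is hypothetical. A bit is reported as stuck low if it
was expected high at least once, read back low, and was never
observed high in any sample (and the reverse for stuck high).

```python
def stuck_bits(samples, width=32):
    # samples: list of (expected, actual) integer pairs.
    mask = (1 << width) - 1
    dropped = 0    # bits expected 1 but read as 0
    raised = 0     # bits expected 0 but read as 1
    seen_high = 0  # bits ever read as 1
    seen_low = 0   # bits ever read as 0
    for expected, actual in samples:
        dropped |= expected & ~actual & mask
        raised |= ~expected & actual & mask
        seen_high |= actual & mask
        seen_low |= ~actual & mask
    stuck_low = dropped & ~seen_high & mask
    stuck_high = raised & ~seen_low & mask
    return stuck_high, stuck_low

# Expected/actual pairs from the bit 31 stuck low example.
SAMPLES = [
    (0xF1F1F1E5, 0x71F1F1E5),
    (0x92929285, 0x12929285),
    (0xB3B3B3A5, 0x33B3B3A5),
    (0xD3D3D3C5, 0x53D3D3C5),
]
high, low = stuck_bits(SAMPLES)
print("stuck high=0x%08x stuck low=0x%08x" % (high, low))
```

With the four example pairs, only bit 31 (0x80000000) is flagged
as stuck low and no bits are flagged as stuck high.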
Fatal error: Code 28, subcode 0x8 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Found errors during scrub. Eagle Error Status:
xxxx
*** Error: Found errors during scrub. Osprey Error Status:
xxxx
During the ECC initialization of Cluster memory,
the Cluster Manager records any memory errors it encounters.
If any were recorded, this error will be displayed.
See Code 28, sub-code 0x1 for resolution information.
Fatal error: Code 28, subcode 0x9 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: CM DIMM programmed address > top of memory
For each Cluster memory DIMM, there is a register in the
Eagle / Osprey memory controller which specifies where the
DIMM maps into CM physical memory. These mapping registers
are configured during the Cluster memory probe and should
not change under normal circumstances. Since this is an
internal CM register, it is unlikely that reseating memory
will correct this problem.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Replace the node motherboard.
Diagnostic: A) The memory controller registers are part of
the CM register set which is mapped into CPU memory for
access. Use the Whack "pci find 1590" command to
find the CM on the PCI bus. The base address in PCI
space for the configuration and status registers
(CSRs) is Window 0. Example:
Whack> pci find 1590
Win Baseaddr    Basesize    Identity
[0] 00:90200000 00:00000400 3PAR (ASIC) LPC#
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x90200000 above).
This is the base address of the CM Memory Control
Register Block. Refer to the Scaffold System
Architecture Reference for information on register
programming.
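The address arithmetic in this diagnostic step can be checked with
a short Python sketch. The function names are hypothetical, and the
sketch assumes the "hh:llllllll" Baseaddr format is the upper and
lower 32 bits of a 64-bit address; the guide does not state this
explicitly.

```python
CM_MEMCTL_OFFSET = 0xC0  # offset given in this guide

def parse_baseaddr(text):
    # Assumption: "00:90200000" is upper 32 bits : lower 32 bits.
    high, low = text.split(":")
    return (int(high, 16) << 32) | int(low, 16)

def cm_memctl_base(window0_baseaddr):
    # Base of the CM Memory Control Register Block.
    return parse_baseaddr(window0_baseaddr) + CM_MEMCTL_OFFSET

print(hex(cm_memctl_base("00:90200000")))
```

For the Window 0 base address shown in the example, this prints
0x902000c0.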
Fatal error: Code 28, subcode 0xa (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)
*** Error: Uncorrectable ECC
The Cluster memory controller detected an Uncorrectable ECC
error. Eagle / Osprey identifies the failing bank and address
with the error as well as the error syndrome. The BIOS will
convert the information into the failing DIMM and Riser Slot
numbers. There may be multiple Uncorrectable errors. In this
case, the CM will save the address/syndrome for the most
recent error.
The DIMM number is logged in the Data field of the Fatal
Error.
Eagle nodes (S-Series and E-Series):
There are 8 DIMMs maximum on the S-Series Cluster Memory
Riser Card. If the DIMM number is not between 0-7
(inclusive), then the failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series):
There are 6 DIMMs on T-Series and 3 DIMMs on F-Series.
The data field encodes the DIMM number in the lower 4 bits
of the field and the channel number in the upper 4 bits.
So a data value of 12 indicates DIMM 1.2 is at fault.
Harrier nodes (V-Series, Atlas, Minime1 & 2):
There are 8 DIMMs on V-Series between two different Harrier
ASICs; two memory controllers with 2 DIMMs each.
The data field encodes which memory channel encountered the
uncorrectable error. A data value of 10 means channel one
is at fault, a value of 0 means channel zero is at fault.
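The data field encodings described for Osprey and Harrier nodes
can be unpacked as shown in this Python sketch. The function names
are hypothetical. The sketch assumes the example data values (12
and 10) are hexadecimal, and that Harrier reports the channel in
the upper 4 bits in the same way as Osprey; the guide does not
state either point explicitly.

```python
def decode_osprey_dimm(data):
    # Upper 4 bits: channel. Lower 4 bits: DIMM number.
    channel = (data >> 4) & 0xF
    dimm = data & 0xF
    return "DIMM %d.%d" % (channel, dimm)

def decode_harrier_channel(data):
    # Assumption: the channel is held in the upper 4 bits.
    return (data >> 4) & 0xF
```

Under these assumptions, decode_osprey_dimm(0x12) returns
"DIMM 1.2" and decode_harrier_channel(0x10) returns 1.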
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM(s).
D) Replace the failing Cluster Memory DIMM(s).
E) Replace the node motherboard.
Diagnostic: A) The memory controller registers are part of
the CM register set which is mapped into CPU memory
for access. Use the Whack "pci find 1590" command
to find the CM on the PCI bus. The base address in
PCI space for the configuration and status
registers (CSRs) is Window 0. Example:
Whack> pci find 1590
... Win Baseaddr Basesize Identity
... [0] 00:60200000 00:00000400 3PAR Eagle
... [1] 00:20000000 00:20000000
... [2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above).
This is the base address of the CM Memory Control
Register Block. Refer to the Scaffold System
Architecture Reference for information on register
programming.
Window 1 is the small cluster memory offset. If
the error address is in the first 512 MB of Cluster
memory, use whack to read/write this location and
confirm the error. The CM Central Error register
must be reset prior to error reproduction.
If the error address is greater than 512 MB, then
XCBs may be used to reproduce the error. Type
"xcb help" to get more information on using XCBs.
Fatal error: Code 28, subcode 0xb (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)
*** Error: Correctable ECC
The Cluster memory controller detected a correctable ECC
error. The CM identifies the failing bank and address with
the error as well as the error syndrome. The BIOS will
convert the information into the failing DIMM and Riser
Slot numbers.
The DIMM number is logged in the Data field of the Fatal
Error.
Eagle nodes (S-Series and E-Series):
There are 8 DIMMs maximum on the Cluster Memory Riser Card.
If the DIMM number is not between 0-7 (inclusive), then the
failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series):
There are 6 DIMMs on T-Series and 3 DIMMs on F-Series.
The data field encodes the DIMM number
in the lower 4 bits of the field and the channel number in
the upper 4 bits. So a data value of 12 indicates DIMM 2.1
is at fault.
Harrier nodes (V-Series, Atlas, Minime1 & 2):
This should not occur on Harrier.
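The Osprey Data field layout described above can be sketched as a small decoder. This is a minimal illustration, not BIOS code: the function name is hypothetical, and it assumes the example value "12" is hexadecimal (0x12) and that "DIMM 2.1" reads as DIMM 2 on channel 1.

```python
def decode_osprey_dimm(data):
    """Split the Data field into (dimm, channel).

    Assumption: the field is one byte, with the DIMM number in the
    lower 4 bits and the channel number in the upper 4 bits.
    """
    dimm = data & 0x0F             # lower 4 bits: DIMM number
    channel = (data >> 4) & 0x0F   # upper 4 bits: channel number
    return dimm, channel


dimm, channel = decode_osprey_dimm(0x12)
print(f"DIMM {dimm}.{channel}")
```

For Eagle nodes the Data field is described as a plain DIMM number (0-7), so this nibble split would not apply there.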
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM.
D) Replace the failing Cluster Memory DIMM.
E) Replace the node motherboard.
Diagnostic: A) The memory controller registers are part of the
CM register set which is mapped into CPU memory
for access. Use the Whack "pci find 1590" command
to find the CM on the PCI bus. The base address in
PCI space for the configuration and status registers
(CSRs) is Window 0. Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:60200000 00:00000400 3PAR Eagle
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above).
This is the base address of the CM Memory Control
Register Block. Refer to the Scaffold System
Architecture Reference for information on register
programming.
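The address arithmetic in this diagnostic can be checked offline with a short script. This is only a sketch run against a pasted copy of the example output. It assumes the "hh:llllllll" notation separates the upper and lower 32 bits of an address, and the names `window_bases` and `CM_MEM_CTRL_OFFSET` are illustrative, not part of Whack.

```python
import re

# Pasted copy of the example "pci find 1590" output shown above.
SAMPLE = """\
Win Baseaddr    Basesize    Identity
[0] 00:60200000 00:00000400 3PAR Eagle
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
"""

CM_MEM_CTRL_OFFSET = 0xC0  # offset of the CM Memory Control Register Block


def window_bases(text):
    """Return {window_number: base_address} parsed from the listing."""
    bases = {}
    pattern = r"\[(\d+)\]\s+([0-9a-fA-F]{2}):([0-9a-fA-F]{8})"
    for m in re.finditer(pattern, text):
        window = int(m.group(1))
        # Assumption: "hh:llllllll" is upper:lower 32 bits of the address.
        bases[window] = (int(m.group(2), 16) << 32) | int(m.group(3), 16)
    return bases


bases = window_bases(SAMPLE)
ctrl_base = bases[0] + CM_MEM_CTRL_OFFSET
print(f"CM Memory Control Register Block base: 0x{ctrl_base:08x}")
```

With the example values, Window 0 at 0x60200000 plus offset 0xc0 gives 0x602000c0.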
Window 1 is the small cluster memory offset. If
the error address is in the first 512 MB of Cluster
memory, use whack to read/write this location and
confirm the error. The CM Central Error register
must be reset prior to error reproduction.
If the error address is greater than 512 MB, then
XCBs may be used to reproduce the error. Type
"xcb help" to get more information on using XCBs.
Fatal error: Code 28, subcode 0xc (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Addr (zzzzzzzz) Wrote (wwwwwwww) Read (yyyyyyyy)
or
*** Error: Data Miscompare in Final Block offset zzzzzzzz
*** Error: Expected (wwwwwwww) Actual (yyyyyyyy)
or
*** Error: CM DIMM5 (Jxxxx): Address (uu:uuuuuuuu)
CM DECODE TEST miscompare at (1) (vvvvvvvvvvvvvvvv)
Expected: (wwwwwwww)
Actual: (yyyyyyyy)
Offset: (zzzzzzzz)
or
similar to above
The CBIOS runs Cluster Memory Tests as part of POST in both
normal operation and manufacturing test. If any test fails
due to a data miscompare, the test will generate this fatal
error code with sub-code '0xc'. CBIOS runs the following
tests:
Walking 1/0 across data
Walking 1/0 across address (512 MB Small Memory Window)
Walking 1/0 using XCB (64 bytes) across segment boundaries
Any test failure will result in a fatal error.
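For readers unfamiliar with walking 1/0 data tests, the idea can be sketched against a simulated memory. This is a generic illustration of the technique, not the CBIOS implementation. The 32-bit width, the `StuckBitMemory` class, and the stuck-bit fault are assumptions made for the example.

```python
def walking_patterns(width=32):
    """Yield walking-1 patterns, then walking-0 patterns, for a bus."""
    mask = (1 << width) - 1
    for bit in range(width):
        yield 1 << bit              # walking 1: exactly one bit set
    for bit in range(width):
        yield mask & ~(1 << bit)    # walking 0: exactly one bit clear


def data_line_test(write, read, addr, width=32):
    """Return a list of (addr, wrote, read_back) miscompares."""
    failures = []
    for pattern in walking_patterns(width):
        write(addr, pattern)
        got = read(addr)
        if got != pattern:
            failures.append((addr, pattern, got))
    return failures


class StuckBitMemory:
    """Simulated memory whose selected data bits are stuck at 0."""

    def __init__(self, stuck_low_mask=0):
        self.cells = {}
        self.stuck_low_mask = stuck_low_mask

    def write(self, addr, value):
        self.cells[addr] = value & ~self.stuck_low_mask

    def read(self, addr):
        return self.cells.get(addr, 0)


good = StuckBitMemory()
bad = StuckBitMemory(stuck_low_mask=1 << 3)  # hypothetical fault on bit 3
good_failures = data_line_test(good.write, good.read, 0x0)
bad_failures = data_line_test(bad.write, bad.read, 0x0)
for addr, wrote, got in bad_failures[:1]:
    print(f"Addr ({addr:08x}) Wrote ({wrote:08x}) Read ({got:08x})")
```

A stuck data bit fails the one walking-1 pattern that sets it and every walking-0 pattern that leaves it set, which is why a single bad line typically produces many miscompares.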
The DIMM number is logged in the Data field of the Fatal
Error.
Eagle nodes (S-Series and E-Series):
There are 8 DIMMs maximum on the Cluster Memory Riser Card.
If the DIMM number is not between 0-7 (inclusive), then the
failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series):
There are 6 DIMMs on T-Series and 3 DIMMs on F-Series.
The data field encodes the DIMM number
in the lower 4 bits of the field and the channel number in
the upper 4 bits. So a data value of 12 indicates DIMM 2.1
is at fault.
Harrier nodes (V-Series, Atlas, Minime1 & 2):
This should not occur in Harrier.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM.
D) Replace the failing Cluster Memory DIMM.
E) Replace the node motherboard.
Diagnostic: A) The memory controller registers are part of the
CM register set which is mapped into CPU memory
for access. Use the Whack "pci find 1590" command
to find the CM on the PCI bus. The base address in
PCI space for the configuration and status registers
(CSRs) is Window 0. Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:60200000 00:00000400 3PAR Eagle
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above).
This is the base address of the CM Memory Control
Register Block. Refer to the Scaffold System
Architecture Reference for information on register
programming.
Window 1 is the small cluster memory offset. If
the error address is in the first 512 MB of Cluster
memory, use whack to read/write this location and
confirm the error. The CM Central Error register
must be reset prior to error reproduction.
If the error address is greater than 512 MB, then
XCBs may be used to reproduce the error. Type
"xcb help" to get more information on using XCBs.
Fatal error: Code 28, subcode 0xd (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Pairwwww DIMMxxxx: Illegal SPD value <name of value> <value>
This error indicates that a Cluster Memory DIMM was detected but
that the Serial EEPROM present on the DIMM reported an illegal
or unsupported value for our memory controller.
The DIMM number is logged in the Data field of the Fatal Error.
Example:
Density (SPD byte 31) has more than 1 bit set (e.g., 0x30)
which indicates a non-standard part.
See Code 28, sub-code 0x1 for resolution information. Most
likely, the DIMM is not qualified for use in our Node Board.
Fatal error: Code 28, subcode 0xe (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
If there was a problem mapping the CM Small Cluster memory
window into CPU 32-bit space, this error may result when
attempting to initialize Cluster memory. The initialization
problem could be due either to a hardware failure or to setting
a special NVRAM variable that eliminates the address space
normally reserved for CM memory windows. An example of
such is setting "mem_max" to a value above 2496. Another
example would be setting "pci_base" above 0xa0000000.
Resolution: Contact 3PAR technical support.
Fatal error: Code 28, subcode 0xf (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Bank (xx) CM DIMMyy (Jzzzz)
*** Error: CM DIMMs with ECC errors
The Cluster memory controller detected a memory error in a
specific DIMM bank. The CM memory error status register is
logged in the Data field of the Fatal Error.
See Code 28, sub-code 0xb for resolution information.
Fatal error: Code 28, subcode 0x10 (mm)
CM_MEMORY_FAILURE "CMA Failure"
H1 LPC0 HW ERR ST [00000004]: dataq_parity
H1 LPC0 ERR Stat [00000006]: EP-Error-Rpt Fatal-Error
H1 LPC0 ERR ID [80000000]: HW-Err
The Cluster memory controller detected a hardware error. This
error is printed, as shown above. mm is decoded as follows: bits
31-28 represent the LPC number and bits 27-0 are the error bits as
set in the hardware error status register. The hardware error
means that the Harrier ASIC is non-functional.
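The mm layout for this sub-code can be unpacked with a few shifts and masks. The helper below is a hedged sketch: the function name and the sample value are hypothetical, and it does not attempt to name individual error bits, since that mapping lives in the hardware error status register definition.

```python
def decode_cma_hw_error(mm):
    """Split mm into (lpc_number, error_bits, set_bit_positions)."""
    lpc = (mm >> 28) & 0xF          # bits 31-28: LPC number
    error_bits = mm & 0x0FFFFFFF    # bits 27-0: hardware error status bits
    set_bits = [bit for bit in range(28) if error_bits & (1 << bit)]
    return lpc, error_bits, set_bits


# Hypothetical value: LPC 1 with error bit 2 set.
lpc, error_bits, set_bits = decode_cma_hw_error(0x10000004)
print(f"LPC{lpc} error bits 0x{error_bits:07x} set at positions {set_bits}")
```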
Resolution: A) Cycle power on the node.
B) Replace the node.
Fatal error: Code 28, subcode 0x20 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM data lines with walking 1
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may
directly access CM cluster memory by performing a walking 1's
test on all data lines. If any line fails, this error will
result.
The data value (mm) could be in the form 0x00XXYYZZ, where XX
is the DIMM number (0-11), YY is the return code (RC_??), and
ZZ is the number of errors found.
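When the data value follows the 0x00XXYYZZ form, it can be split byte by byte. The sketch below assumes each of XX, YY, and ZZ occupies exactly one byte. The function name and sample value are made up for illustration, and the return code is left numeric because the RC_ names are not listed here.

```python
def decode_walking_test_data(mm):
    """Split 0x00XXYYZZ into (dimm, return_code, error_count)."""
    dimm = (mm >> 16) & 0xFF         # XX: DIMM number (0-11)
    return_code = (mm >> 8) & 0xFF   # YY: return code (RC_??)
    error_count = mm & 0xFF          # ZZ: number of errors found
    return dimm, return_code, error_count


# Hypothetical value: DIMM 5, return code 2, 3 errors.
dimm, return_code, error_count = decode_walking_test_data(0x00050203)
print(f"DIMM {dimm}, return code {return_code}, {error_count} error(s)")
```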
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic: A) Use the Whack command line to attempt to
access CM
memory manually to determine if data line bits are
stuck.
Fatal error: Code 28, subcode 0x21 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM data lines with walking 0
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor may
directly access cluster memory by performing a walking 0's
test on all data lines. If any fails, this error will
result.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x22 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
ZERO CM problem at addr xxxx
Between PCI bus tests, a small portion of cluster memory
is cleared. If errors in clearing the memory are detected,
this error will result.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x23 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM address lines with walking 1 (first 512 MB only)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the processor
may directly access cluster memory by performing a walking 1's
test on all address lines. If any fails, this error will
result.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x24 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM address lines with walking 0 (first 512 MB only)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the processor may
directly access cluster memory by performing a walking 0's test
on all address lines. If any fails, this error will result.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x25 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM segment decode boundaries
This test verifies that memory decoding at all CM DIMM pairs is
working correctly. It does so by writing a unique 128 bytes at
each memory decode boundary location. It then verifies the
values were written correctly and looks for corruption of other
addresses.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x26 (eecd)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM with random XOR (all Cluster Memory)
ee = number of XOR errors.
c = Channel Number where the error took place.
d = DIMM number where the error took place.
HW error during XCB transfer CM -> CM
or
HW error during XCB transfer CM -> PCI 1 (xxxx)
or
*** Error: Data Miscompare in Final Block offset xxxx
Expected (yyyy) Actual (zzzz)
This function performs a random data test on all cluster
memory
attached to the CM to verify memory under stress with random
patterns. This test also exercises the CM XOR engine as
several
sources are used simultaneously throughout the cluster
memory test.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x27 (0)
CM_MEMORY_FAILURE (<DIMM>) "DQS Training Failed"
This error occurs when the DQS training fails to find working
values for the DQS enable, DQS out skew, and DQS in skew.
See Code 28, sub-code 0x20 for resolution information.
Fatal error: Code 28, subcode 0x30 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM ECC lines with walking 1
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may
directly access CM cluster memory by performing a walking 1's
test on all ECC lines. If any fails, this error will result.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic: A) Use the Whack command line to attempt to
access CM
memory manually to determine if data line bits are
stuck.
Fatal error: Code 28, subcode 0x31 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM ECC lines with walking 0
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor may
directly access cluster memory by performing a walking 0's
test on all ECC lines. If any fails, this error will result.
See Code 28, sub-code 0x30 for resolution information.
Fatal error: Code 28, subcode 0x32 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM Op Codes
The CM Op Code test verifies that the processor may execute
one of the available operations for this cluster manager
ASIC. This error means that a particular opcode is not
supported.
If any op code fails, this error will result.
Resolution: A) Replace the node motherboard.
Fatal error: Code 28, subcode 0x33 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM Source Interrupts
The CM Source Interrupts test will test that an interrupt is
generated for each CMA data path, from processor, CMA, or
companion CMA to either processor memory or local CMA. On
systems with only one CMA, the companion tests are not done.
Resolution: A) Replace the node motherboard.
Fatal error: Code 28, subcode 0x34 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM I2C communication test
The CM I2C communication test will read and write to various
safe CMA registers or CMA memory and verify that the expected
values are read. A failure means either a bad DIMM or a bad CMA.
See Code 28, sub-code 0x30 for resolution information.
Fatal error: Code 28, subcode 0x35 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Stopped on an Uncorrectable Error
The scan for errors found an uncorrectable error in one
of the CMAs. The system stopped during a BIOS test when
this error was discovered.
See Code 28, sub-code 0x30 for resolution information.
Fatal error: Code 28, subcode 0x36 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Stopped on a Correctable Error
The scan for errors found a correctable error in one
of the CMAs. The system stopped during a BIOS test when
this error was discovered.
See Code 28, sub-code 0x30 for resolution information.
Fatal error: Code 28, subcode 0x40 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM MMW data lines with walking 1
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may
directly access CM cluster memory by performing a walking
1's
test on all data lines. This test uses the Medium Memory
Window (MMW). If any fails, this error will result.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic: A) Use the Whack command line to attempt to
access CM
memory manually to determine if data line bits are
stuck.
Fatal error: Code 28, subcode 0x41 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM MMW data lines with walking 0
Addr (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor may
directly access cluster memory by performing a walking 0's
test on all data lines. This test uses the Medium Memory
Window (MMW). If any fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x42 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
ZERO CM problem at addr xxxx
Between PCI bus MMW tests, a small portion of cluster memory
is cleared. If errors in clearing the memory are detected,
this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x43 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 1 (MMW)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the
processor
may directly access cluster memory by performing a walking
1's
test on all address lines using the medium memory window.
If any fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x44 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 0 (MMW)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the processor may
directly access cluster memory by performing a walking 0's
test on all address lines using the medium memory window.
If any fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x45 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 1 (RMW)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the
processor
may directly access cluster memory by performing a walking
1's
test on all address lines using the remote memory window.
If any fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x46 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 0 (RMW)
*** Error: Failed to write:
Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx)
Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx)
Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the processor may
directly access cluster memory by performing a walking 0's
test on all address lines using the remote memory window.
If any fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x47 (wwxxyyzz)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: MTB Granularity error!
SPD Byte 10, expected: ww, actual: yy
SPD Byte 11, expected: xx, actual: zz
All the MTB (Medium TimeBase) calculations in the software leveling
code are based on an MTB granularity of 0.125 ns (SPD Byte 10=0x01
and Byte 11=0x08). These bytes define a value in nanoseconds that
represents the fundamental timebase for medium grain timing
calculations. This value is typically the greatest common divisor
for the range of clock frequencies (clock periods) supported by a
particular SDRAM. This value is used as a multiplier for formulating
subsequent timing parameters. The medium timebase (MTB) is defined
as the medium timebase dividend (byte 10) divided by the medium
timebase divisor (byte 11).
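The MTB arithmetic and the (wwxxyyzz) data value can be worked through as follows. This is a sketch under two assumptions: each of ww, xx, yy, and zz is one byte in the order shown in the error message, and the sample value with an actual divisor of 0x09 is invented for illustration. The helper names are hypothetical.

```python
from fractions import Fraction

EXPECTED_DIVIDEND = 0x01  # SPD byte 10
EXPECTED_DIVISOR = 0x08   # SPD byte 11


def mtb_ns(dividend, divisor):
    """Medium timebase in nanoseconds: byte 10 divided by byte 11."""
    return Fraction(dividend, divisor)


def decode_mtb_data(data):
    """Split wwxxyyzz into expected/actual values for SPD bytes 10 and 11."""
    ww = (data >> 24) & 0xFF  # expected byte 10
    xx = (data >> 16) & 0xFF  # expected byte 11
    yy = (data >> 8) & 0xFF   # actual byte 10
    zz = data & 0xFF          # actual byte 11
    return {"byte10": (ww, yy), "byte11": (xx, zz)}


print(float(mtb_ns(EXPECTED_DIVIDEND, EXPECTED_DIVISOR)))  # 0.125

fields = decode_mtb_data(0x01080109)  # hypothetical failing DIMM
actual_mtb = mtb_ns(fields["byte10"][1], fields["byte11"][1])
print(f"actual MTB = {actual_mtb} ns")
```

Using `Fraction` keeps the comparison exact, so a DIMM reporting 1/9 ns is not mistaken for 1/8 ns through floating-point rounding.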
Resolution: A) Replace CM DIMM.
Fatal error: Code 29, subcode 0x0 (data)
CM_LINK_FAILURE "Cluster Link Failure"
Link 0 did not come up (0xac000000) error = (0x002022ff)
(data = link number)
CM Links are high speed connections between all of the
node boards in a cluster via the center panel.
During Manufacturing test, nodes are connected to a
special Manufacturing Center panel that connects the link
transmitter to its own receivers (external loopback).
When the node senses that it is in this special Center
Panel, it will initialize all of the links and perform
loopback tests. If any link fails to initialize, this
sub-code will be reported.
Resolution: A) Cycle power on the node.
B) Verify that the node is securely mated with
the Center Panel.
C) Turn off power, re-seat the node into the
center panel, and turn power back on.
D) Replace the node motherboard.
Diagnostic: A) Use the Whack "eagle link" commands to run more
diagnostic tests on the links. The CM requires
that both the PCI scan has completed and Cluster Memory
is present and initialized.
Fatal error: Code 29, subcode 0x1 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM Link Initialization failed
(data = LLRR) where
LL is the link bit pattern. 01 is link 0, 02 is link 1, 04 is
link 2, and 08 is link 3.
RR is the failure reason. E4 is Hardware error, F0 is user abort.
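The LLRR value can be decoded with a small lookup. This sketch assumes LL and RR are one byte each, written in hexadecimal. The function and table names are illustrative, and only the two failure reasons listed above are recognized.

```python
LINK_BITS = {0x01: 0, 0x02: 1, 0x04: 2, 0x08: 3}
REASONS = {0xE4: "Hardware error", 0xF0: "User abort"}


def decode_link_init_failure(data):
    """Return (failed_link_numbers, reason_text) for an LLRR value."""
    ll = (data >> 8) & 0xFF   # LL: link bit pattern
    rr = data & 0xFF          # RR: failure reason
    links = [link for bit, link in LINK_BITS.items() if ll & bit]
    reason = REASONS.get(rr, f"unknown (0x{rr:02X})")
    return links, reason


# Hypothetical value: links 0 and 2 failed with a hardware error.
links, reason = decode_link_init_failure(0x05E4)
print(f"Links {links}: {reason}")
```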
CM Links are high speed connections between all of the node
boards via the center panel. During Manufacturing test, nodes
are connected to a special Manufacturing Center panel that
connects each link's transmitter to its own receiver (external
loopback). When the node senses that it is in this special Center
Panel, it will initialize the links and run a special test to
verify the operation of the transmitter/receivers of each link.
If any link fails, the test will report this sub-code.
See Code 29, sub-code 0x0 for resolution information.
Fatal error: Code 29, subcode 0x2 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM# LinkXOR test: Link [0]..[FAIL] (1)
(data = the link bit pattern. bit 0 is link 0, bit 1 is link 1,
bit 2 is link 2, and bit 3 is link 3.)
CM Links are high speed connections between all of the node
boards via the center panel. During Manufacturing test, nodes
are connected to a special Manufacturing Center panel that
connects each link's transmitter to its own receiver (external
loopback). When the node senses that it is in this special Center
Panel, it will initialize the links and run a special test to
verify the operation of the transmitter/receivers of each link.
If any link fails, the test will report this sub-code.
See Code 29, sub-code 0x0 for resolution information.
Fatal error: Code 29, subcode 0x3 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM# Link INT??? test: Link [0]..[FAIL] (1)
(data = the link bit pattern. bit 0 is link 0, bit 1 is link 1,
bit 2 is link 2, and bit 3 is link 3.)
The CM Link INT test verifies that setting either of the two
interrupt flags (DEST, SRC) in the XCB does actually generate
an interrupt to the processor.
See Code 29, sub-code 0x0 for resolution information.
Fatal error: Code 29, subcode 0x4 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT Link 1 XCB ASync failed (Send)
(data = link number)
The CM Link Round Trip Test failed due to an XCB failure.
CM XCB failed during link DMA. Use the "eagle status" command
for more information on the type of error. This test checks the
CM link status at multiple times during the test.
The "(Send)" part of the message indicates which stage
failed. Another possible value is "(Receive)".
See Code 29, sub-code 0x0 for resolution information.
Fatal error: Code 29, subcode 0x5 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT (Receive) Link 1 Length = 0
or
*** Error RTT Offset = xxxxx Expected = yyyyy
Returned = zzzzz
or
*** Error RTT (Return) Link 1 Length mismatch
or
*** Error RTT (Return) Link 1 Timestamp mismatch.
(data = link number)
The CM Link Round Trip Test failed due to data miscompare.
All packets have a length check and timestamp check.
Payload
compare is optional. Use the "eagle status" command to
check
for Uncorrectable ECC errors.
See Code 29, sub-code 0x0 for resolution information.
Fatal error: Code 29, subcode 0x6 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT (Return) Timeout waiting for packet Link 1
(data = link number)
The CM Link Round Trip Test failed due to packet timeout.
A packet was sent and not received in a reasonable timeout
period. The Round Trip Test may not have been started on
a remote node. Use the "eagle status" command to check for
Uncorrectable ECC errors.
Resolution: A) Start CM Link Round Trip Test on remote node.
B) Cycle power on the node.
C) Verify that the node is securely mated with
the Center Panel.
D) Turn off power, re-seat the node into the
center panel, and turn power back on.
E) Replace the node motherboard.
Diagnostic: A) Use the Whack "eagle link" commands to run more
diagnostic tests on the links. The CM requires
that both the PCI scan has completed and Cluster Memory
is present and initialized.
Fatal error: Code 29, subcode 0x10 (0)
CM_LINK_FAILURE "Cluster Link Failure"
REC_EN went low. Test failed for link [x](yyyyyyyy)
The "cma link init" command is used to initialize and bring
up
the CM links to nodes which indicate a "Power Ok" state.
If this
error occurs, it is possible the remote node was
transmitting
BIST, but then later stopped (such as from a reset or power
off).
Resolution: A) Perform the same test again.
B) Replace the node motherboard.
Diagnostic: A) Verify CM link may be brought up manually
using
the "eagle link set" command.
Fatal error: Code 29, subcode 0x11 (0)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error CM linkxx producer / consumer mismatch
The CM has XCB engines which transfer data. Software
manages the
producer register and the CM hardware follows with the
consumer
register. If these two do not agree and CM should be idle,
then
it's possible the CM has halted due to failure of some
operation.
This problem is likely caused by a cluster memory or link
failure.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
C) Replace the link partner node.
Diagnostic: A) Replace Eagle/Osprey/Harrier ASIC.
Fatal error: Code 30, subcode 0x0 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
*** Error: No Oxford serial chip xx found
or
*** Error: No Exar serial chip found
The Exar and Oxford serial chips are used for a secondary
low speed link which directly connects all nodes in the
cluster. They are primarily used in the event of a link failure
to verify whether another node in the cluster has actually
gone down. Since the part is integrated onto the motherboard
and is on a PCI bus, a failure to locate the internal serial
chips may indicate other PCI problems as well.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "pci probe" command to show all
devices on the PCI bus. Look for the two
Oxford device entries, or a single Exar device
entry (Pentium 4 node). If they are not there,
verify other board level components are present
in the list in order to isolate the component
failure on the board.
B) Note that a failure of a single Oxford chip
may be the cause of this behavior as one
bridges to the PCI bus for both.
Fatal error: Code 30, subcode 0x1 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
*** Error: Serial Port Mfg Test failed
Port (3) [FAIL]
When the Node board is inserted into a Manufacturing
Test Centerpanel, the internal Serial Port Manufacturing
test
will automatically run. This error indicates failures on
all ports tested.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "pci probe" command to show all
devices on the PCI bus. Look for the two
Oxford device entries or a single Exar device
entry (Pentium 4 node). If they are not there,
verify other board level components are present
in the list in order to isolate the component
failure on the board.
B) Note that a failure of a single Oxford chip
may be the cause of this behavior as one
bridges to the PCI bus for both.
C) Whack provides internal Serial Port
commands for further analysis.
Fatal error: Code 30, subcode 0x2 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
Port (4):Processed 109 bytes[FAIL]
All cluster internal serial ports go through a quick
internal
loopback test immediately after initialization to do a short
test of proper operation. This test will run regardless of
the type of centerplane in which the node is connected. This
error indicates failures on all ports tested.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "pci probe" command to show all
devices on the PCI bus. Look for the two
Oxford device entries or a single Exar device
entry (Pentium 4 node). If they are not there,
verify other board level components are present
in the list in order to isolate the component
failure on the board.
B) Note that a failure of a single Oxford chip
may be the cause of this behavior as one
bridges to the PCI bus for both.
C) Whack provides internal Serial Port
commands for further analysis.
Fatal error: Code 30, subcode 0x3 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
Internal UART is not functioning properly.
Most likely this is due to a hardware failure related to
the SuperIO.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Non-fatal error: Code 31, sub-code 0x0 (0)
GPIO_TEST_FAILURE "GPIO Failure"
FAIL (high)
Port (6) Bit (4) wrote 0(0x1)
Port (7) Bit (4) read 1, expected 0(0x3)
The Vitesse VSC055 2 Wire Backplane Controller chip controls
interfaces to the Centerplane, LEDs, Power Supplies, Nickel
battery, and PCI slots. It is connected to the I2C bus.
In normal 2, 4, or 8 node centerplanes, the chip will get
its ports initialized as inputs or outputs and start
monitoring
peripheral systems. No tests available.
When connected to a Manufacturing Centerplane, it will have
selected pins routed to other pins for loopback testing.
See the Manufacturing Centerplane Specification for details.
During this test, proper VSC operation will be confirmed.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Whack "i2c vsc" commands can be used to peek
and
poke the VSC055 chip when in a Manufacturing
Centerplane. In normal Centerplanes, these
pins will be connected to other components and
should not be modified.
Non-fatal error: Code 31, sub-code 0x1 (0)
GPIO_TEST_FAILURE "GPIO Failure"
Failed I2C VSC055 1.ce.yy write zzzz
During initialization, the VSC055 registers are programmed
for proper system operation. This is done over the I2C bus.
If an I2C operation fails during VSC055 initialization, this
error will result.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Whack "i2c vsc" commands can be used to peek
and poke the VSC055 chip. One failure seen in the
past with the VSC055 is that sometimes a specific
chip could not handle the first write access to
the command register which causes a soft reset.
It was determined the part violated the I2C
protocol in ACKing the transaction before the
I2C write operation completed.
Fatal error: Code 31, subcode 0x2 (0)
GPIO_TEST_FAILURE "GPIO Failure"
The FPGA Scratchpad registers failed, which indicates bad
FPGA hardware.
Resolution: A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31,
sub-code 0x3 (0)
GPIO_TEST_FAILURE "GPIO Failure"
FPGA Interrupt Test failed.
Resolution: A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31,
sub-code 0x4 (0)
GPIO_TEST_FAILURE "GPIO Failure"
NEMOE Loopback Test failed.
Resolution: A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31,
sub-code 0x5 (0)
GPIO_TEST_FAILURE "GPIO Failure"
During the "Board GPIO Test", the FPGA ID is not what the
test expects it to be.
Resolution: A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31,
sub-code 0x6 (0)
GPIO_TEST_FAILURE "GPIO Failure"
During the "Board GPIO Test", the FPGA Revision is not what
the test expects it to be.
Resolution: A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31,
sub-code 0x7 (0)
GPIO_TEST_FAILURE "GPIO Failure"
Titan specific. During the "Manufacturing Centerpanel GPIO
Test", one or more tests have failed; the output indicates
which.
A) If the failure occurred during
'Testing Expanders (o/p) <--> FPGA (i/p) connections:'
For example,
FAIL (low)
Port (76) Bit (1) wrote(0x00)
Port (302) Bit (4) read 0xff, expected(0xef)
1) Program the I2C expander with the following command:
Whack> cb i2c 9.76.3 0
Here "76" is the reported port number.
"3" is the config register offset for the expander.
"0" configures all expander bits as outputs.
2) Set the bit in the I2C expander.
Whack> cb i2c 9.76.1 2
Here "1" is the rdwr register offset for the expander.
"2" is the reported bit 1 (1 << "1") in the expander.
3) Read a byte from the FPGA offset.
Whack> db fpga 302 1
Here "302" is the reported FPGA offset.
Confirm that bit "4" in the value read is set.
Repeat steps 2) and 3), writing 0 to I2C expander
9.76.1 and checking that bit "4" in FPGA offset
0x302 is cleared.
B) If the failure occurred during
'Testing FPGA (o/p) <--> Expanders (i/p) connections:'
For example,
FAIL (low)
Port (305) Bit (4) wrote(0x00)
Port (7e) Bit (7) read 0x86, expected(0x06)
1) Program the I2C expander with the following command:
Whack> cb i2c 9.7e.3 ff
Here "7e" is the reported port number.
"3" is the config register offset for the expander.
"ff" configures all expander bits as inputs.
2) Write a byte to the FPGA offset.
Whack> db fpga 305 10
Here "305" is the reported FPGA offset.
Writing 0x10 sets bit "4" at that offset.
3) Read a byte from the I2C expander.
Whack> db i2c 9.7e.0 1
Here "7e" is the reported port number.
"0" is the read register offset for the expander.
Confirm that bit "7" in the value read is set.
Repeat steps 2) and 3), writing 0 to the FPGA offset
and checking that bit "7" in I2C expander 9.7e.0 is
cleared.
C) For all other failure cases, refer to Section 18.2,
"Manufacturing Centerplane GPIO Test Diagnostics", of the
CBIOS user guide at
http://engweb/twiki/bin/view/Main/TitanMfgCpFpgaGpioTestDiag
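The bit arithmetic used in steps A) and B) above can be checked offline before values are typed at the Whack prompt. The following is a minimal Python sketch, not part of the BIOS or Whack; the helper names bit_mask and mismatched_bits are hypothetical. It shows how the value to write is derived from a reported bit number, and how the failing bit is found by comparing the read and expected bytes from the two example FAIL lines.

```python
from typing import List


def bit_mask(bit: int) -> int:
    """Return the byte value that has only the given bit (0-7) set."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0-7")
    return 1 << bit


def mismatched_bits(read: int, expected: int) -> List[int]:
    """Return the bit positions at which the read and expected bytes differ."""
    diff = (read ^ expected) & 0xFF
    return [bit for bit in range(8) if diff & (1 << bit)]


if __name__ == "__main__":
    # Example A: reported expander bit 1, so step 2 writes 1 << 1.
    print(hex(bit_mask(1)))
    # Example B: reported FPGA bit 4, so step 2 writes 1 << 4.
    print(hex(bit_mask(4)))
    # Failing bits in the two example FAIL lines.
    print(mismatched_bits(0xFF, 0xEF))
    print(mismatched_bits(0x86, 0x06))
```

For the two example FAIL lines, the mismatched bits are 4 and 7, which agree with the Bit values reported on the read side.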
Fatal error: Code 32, subcode 0x1 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
Xor Engine Status: P0_XERR
Error Status : XOR_ERR
PCI0 Error Status:
PCI1 Error Status:
The Eagle, Osprey, and Harrier ASICs contain a DMA engine
capable of XOR operations. This DMA engine is commonly
referred to as the XCB engine. The XCB engine can DMA data
between 14 different modules within the ASIC, each module
capable of sinking or sourcing data. The XCB engine will stop
all DMA if it encounters an error while transferring data.
The XCB error status indicates the module that produced the
error. Further details of the error can be gathered by
inspecting the error registers of that module. Use the Whack
command "cma status all" to get further diagnostic
information.
If the user continues past this error, software will attempt
to reset the error and continue.
Sub-code 0x1 is specific to Osprey and indicates an
uncorrectable ECC error following an attempt to zero all of
cluster memory. The "chunk" value indicates the chunk where
the ECC error occurred.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Whack "cma status all" command displays the
status registers for each CM module. Refer to the module
that produced the error for further information and
diagnostic procedure.
Fatal error: Code 32, subcode 0x2 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Osprey and indicates an
uncorrectable ECC error following an attempt to ECC scrub all
of cluster memory. The "chunk" value indicates the chunk
where the scrub error occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic
information.
Fatal error: Code 32, subcode 0x6 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates an
uncorrectable ECC error following an attempt to zero all of
cluster memory. The "chunk" value indicates the chunk where
the ECC error occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic
information.
Fatal error: Code 32, subcode 0x7 (err_last)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates a
general Harrier DMA error following an attempt to zero all of
cluster memory. The "err_last" value represents the
normalized content of the Harrier
mem_common->mem_err_status register.
See Code 32, sub-code 0x1 for Resolution and Diagnostic
information.
Fatal error: Code 32, subcode 0x8 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates an
uncorrectable ECC error following an attempt to ECC scrub all
of cluster memory. The "chunk" value indicates the chunk
where the scrub error occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic
information.
Fatal error: Code 32, subcode 0x9 (err_last)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates a
general Harrier DMA error following an attempt to ECC scrub
all of cluster memory. The "err_last" value represents the
normalized content of the Harrier
mem_common->mem_err_status register.
See Code 32, sub-code 0x1 for Resolution and Diagnostic
information.
Fatal error: Code 33, subcode 0x0 (0)
SDRAM_I2C_BAD_READ "Memory I2C Bad Read"
*** Error: Unable to read from SDRAM at I2C ww.xx.yy.zz
This error indicates that an SDRAM DIMM for which
information was requested is no longer available. This may be
due to an intermittent I2C bus, or a hardware failure.
Resolution: A) Cycle power on the node.
B) Replace the DIMM pair containing the failing DIMM.
C) Replace the node motherboard.
Fatal error: Code 34, subcode 0x1 (0xff)
PCI_BUS_ERROR "PCI Bus Failure"
This error indicates an uncorrectable error occurred on the
PCI bus. In the future, the data field may indicate the PCI
slot number for the device which failed. In order to
determine the cause of this error, it may be useful to review
either console messages or the IDE disk log. Typical messages
preceding this error are likely difficult to read, but may
indicate the exact cause. Example:
--- SMI: smm_inb(0x3a) == 0x86
GPE 9 triggered
Error in PCI device 02.02.00 (PCI/PCI Bridge #0 (controls
slot 1)):
PCI status register (0x06) [62b0]: Signaled system error
(SERR#),
Received master abort
Secondary PCI status register (0x1e) [0aa0]: Signaled
target abort
Bridge P_SERR (0x6a) [80]: Delayed transaction master
initiator timeout
Error in PCI device 03.01.00 (PCI Slot 1):
PCI status register (0x06) [1290]: Received target abort
Secondary PCI status register (0x1e) [0a80]: Signaled
target abort
Error in PCI device 04.06.00 (inside PCI Slot 1):
PCI status register (0x06) [1230]: Received target abort
Error in PCI device 04.06.01 (inside PCI Slot 1):
PCI status register (0x06) [1230]: Received target abort
(PCI errors not cleared)
In the above case, a card in PCI Slot 1 was transferring
data up to a device, likely the cluster manager, when it
didn't get a response. The bridge above the card received a
master abort, which it then relayed to its secondary side as
a signaled target abort. The bridge on the card in PCI Slot 1
then received the target abort and signaled a target abort on
its secondary side. Both PCI devices then indicated they
received target aborts.
Resolution: A) Cycle power on the node.
B) Reseat all PCI cards.
C) Replace the suspected PCI card.
D) Remove PCI cards one at a time.
E) Replace the node motherboard.
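The bracketed values in the example output, such as [62b0], are raw contents of the standard PCI status register (offset 0x06). As an illustration only, the Python sketch below decodes the error bits of such a value using the bit positions defined in the PCI Local Bus specification. The function name decode_pci_status is hypothetical; it is not a Whack or BIOS command.

```python
from typing import List

# Error-reporting bits of the 16-bit PCI Status register (offset 0x06).
# Listed from the highest bit down, matching the order of the BIOS messages.
PCI_STATUS_ERROR_BITS = [
    (15, "Detected parity error"),
    (14, "Signaled system error (SERR#)"),
    (13, "Received master abort"),
    (12, "Received target abort"),
    (11, "Signaled target abort"),
    (8, "Master data parity error"),
]


def decode_pci_status(value: int) -> List[str]:
    """Return the names of the error bits set in a PCI status register value."""
    return [name for bit, name in PCI_STATUS_ERROR_BITS if value & (1 << bit)]


if __name__ == "__main__":
    # Values taken from the example output for Code 34.
    for raw in (0x62B0, 0x1290, 0x1230):
        print(f"{raw:04x}: {', '.join(decode_pci_status(raw))}")
```

The remaining bits that are set in these values (for example bits 4, 5, 7 and 9) describe device capabilities and DEVSEL timing rather than errors, so the sketch ignores them.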
Fatal error: Code 35, subcode 0x0 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
One or both DIMMs in a DIMM pair have failed. Bits 4-7
of the data value indicate the DIMM pair.
If data is 0, then DIMM pair 0 has failed.
If data is 10 (hexadecimal), then DIMM pair 1 has failed.
Example:
--- SMI: TEMPCAUT (SMALERT): 0x01 (bits reset)
Uncorrectable ECC error 0x9279a103 recorded in reg 0x98
Pair1, either DIMM1 or DIMM3 contains the error
Error in locations [0x382cd818 .. 0x382cd81f]
Uncorrectable ECC error 0x9279a101 recorded in reg 0x94
Syndrome/bit number information might not be accurate,
as more than 1 error happened
Pair1, either DIMM1 or DIMM3 contains the error
Error in locations [0x382cd808 .. 0x382cd80f]
(Clearing cache line at 0x382cd800)
(Clearing cache line at 0x382cd800)
ESR == 0x0003 (expected low bit == 0)
Fatal error: Code 35, subcode 0x0 (10)
Resolution: A) Cycle power on the node.
B) Clear dust and debris from the node.
C) Remove and reseat the specified CPU DIMM pair.
D) Replace the failed CPU DIMM pair.
E) Replace the node motherboard.
Diagnostic: A) Verify North Bridge heatsink attachment.
B) Check DIMM clock buffers (X6200 on P4-Eagle).
C) Check DIMM termination (R5836, etc on P4-Eagle nodes).
Fatal error: Code 35, subcode 0x1 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
A single DIMM of a DIMM pair has failed. The data value
indicates which DIMM. Bits 4-7 of the data value indicate
which DIMM pair. Bits 0-3 of the data value indicate which
DIMM within that pair.
If data is 0, then DIMM 0 of pair 0 has failed.
If data is 1, then DIMM 1 of pair 0 has failed.
If data is 10 (hexadecimal), then DIMM 0 of pair 1 has failed.
If data is 11 (hexadecimal), then DIMM 1 of pair 1 has failed.
Resolution: A) Cycle power on the node.
B) Clear dust and debris from the node.
C) Remove and reseat the specified CPU DIMM.
D) Replace the failed CPU DIMM.
E) Replace the node motherboard.
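The data values listed for Code 35, sub-codes 0x0 and 0x1, follow directly from the bit fields described above. The following is a minimal Python sketch of that decoding. It assumes the value in parentheses is printed in hexadecimal, as the "(10)" in the sub-code 0x0 example suggests; decode_dimm_data is a hypothetical helper name.

```python
from typing import Tuple


def decode_dimm_data(data: int) -> Tuple[int, int]:
    """Split a Code 35 data value into (DIMM pair, DIMM within pair).

    Bits 4-7 select the DIMM pair and bits 0-3 select the DIMM in that pair.
    For sub-code 0x0 only the pair is meaningful.
    """
    pair = (data >> 4) & 0xF
    dimm = data & 0xF
    return pair, dimm


if __name__ == "__main__":
    # The data field is assumed to be hexadecimal text, for example "10".
    for text in ("0", "1", "10", "11"):
        pair, dimm = decode_dimm_data(int(text, 16))
        print(f"data {text}: DIMM {dimm} of pair {pair}")
```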
Fatal error: Code 35, subcode 0x2 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
This code means an ECC error was detected, but the BIOS did
not completely decode the error.
See Code 35, sub-code 0x0 for resolution information.
Fatal error: Code 36, subcode 0x0 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI: SERR# input went low
In the event of a hardware failure, it is normal to trigger
a processor System Management Interrupt (SMI). If the SMI
gets cleared before the BIOS has a chance to observe it
(which should not happen), then this error will result.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Fatal error: Code 36, subcode 0x1 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI: Write made to ACPI PM register
In normal operation the operating system should not write
to the ACPI PM register. If the BIOS detects a write took
place, it will flag this as an error caused by a failing
operating system or other node hardware.
Resolution: A) Cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.
Fatal error: Code 36, subcode 0x2 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI not fully handled.
The BIOS was not able to determine the actual cause of the
triggered SMI.
Resolution: A) Cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.
Fatal error: Code 36, subcode 0x3 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
--- SMI: No known cause (# 4097)
GPE status: 0x400000, GPE input: 0x0xfff7ff
*** Error: SMI: No known cause is too frequent
This error may result if there is an unknown hardware device
triggering SMIs in the system and those SMIs are happening
too frequently. Most likely the device continues to trigger
an SMI because its problem has not been serviced, and no
real work is possible at this point because immediately
after returning from the SMI, another is triggered. The
BIOS attempts to recognize this condition and stop with a
fatal error rather than just continuing to display errors.
Resolution: A) Remote reset or cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.
Fatal error: Code 36, subcode 0x4 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Warning: SMI cause is too frequent; disabling SMI
handling
*** Error: SMI cause could not be masked
This error may result if a known SMI cause is happening too
frequently. In a normally functioning node, SMIs should
occur infrequently, as there is a performance impact
associated with handling each SMI. The BIOS will first
attempt to disable known SMIs in order to mask this problem.
If that is insufficient, the BIOS will stop with this fatal
error.
Resolution: A) Check for CPU memory DIMM correctables in the
event log. Replace DIMMs if they are suspect.
B) Check for hardware oscillating events in the event
log (such as PS status). On some node types, board
GPIO changes are reported through SMI. You may need
to replace power supplies or another FRU.
C) Replace the node motherboard.
Diagnostic: A) Set "fatal_no_reboot" at Whack and then
enter Whack at the Fatal Error. You should be able to inspect
the state of the machine prior to SMI handling to
see what status is asserted. Output from the following
Whack commands may be helpful:
1) eagle status
2) vsc status
3) pci status
4) mem bridge
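The escalation described for Code 36, sub-codes 0x3 and 0x4, can be pictured with the toy model below. This Python sketch is an illustration under stated assumptions, not the BIOS implementation: the threshold of 4096, the class name SmiMonitor, and the cause labels are all hypothetical, and the real BIOS may count or mask SMIs differently.

```python
from collections import Counter

SMI_LIMIT = 4096  # hypothetical threshold; the real BIOS value is not documented here


class SmiMonitor:
    """Toy model of the 'too frequent' SMI handling for Code 36."""

    def __init__(self, limit=SMI_LIMIT, maskable=()):
        self.limit = limit
        self.maskable = set(maskable)  # causes this model is able to mask
        self.masked = set()
        self.counts = Counter()

    def handle(self, cause):
        """Record one SMI. Return 'ok', 'masked', or a fatal (code, subcode)."""
        self.counts[cause] += 1
        if self.counts[cause] <= self.limit:
            return "ok"
        if cause is None:
            return (36, 0x3)  # no known cause, too frequent
        if cause in self.maskable and cause not in self.masked:
            self.masked.add(cause)  # first try to mask the known cause
            return "masked"
        return (36, 0x4)  # known cause that could not be masked


if __name__ == "__main__":
    monitor = SmiMonitor(limit=2, maskable={"ecc"})
    print([monitor.handle("ecc") for _ in range(4)])
    print([monitor.handle(None) for _ in range(3)])
```

In the model, a known cause is masked once when it first exceeds the limit, and a fatal error is raised only if it keeps occurring afterwards; an unknown cause cannot be masked and goes straight to the fatal error.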
Fatal error: Code 36, subcode 0x5 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: In SMI on CPU ww [xx], CR2 was 0xyyyy,
but got changed to 0xzzzz
This error will result if the BIOS inadvertently changes
the contents of CR2 while processing an SMI. This should not
happen in normal operation, but might happen as the result
of a Whack command. As returning from this SMI could easily
cause corruption of the OS or of a user-level program, this
fatal error is flagged instead.
Resolution: A) Cycle power on the node.
Fatal error: Code 37, subcode zz (0)
GEVENT_TRIGGERED "GEvent Triggered"
Code 37 sub-codes are a bitmask of error values.
This means you may find an error which will simultaneously
trigger multiple GEVENTs. This event is probably one of the
hardest to interpret as it often will indicate multiple
board devices have detected a fatal error condition. In
general, it's much more convenient to look up the decoded
error in the BIOS output of the idelog rather than manually
decoding this event back to indicators.
Resolution: Look up each individual documented sub-code
below which when OR'd together form the sub-code observed.
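Because the sub-code is a bitmask, an observed value can be split into the individual sub-codes to look up. The Python sketch below is one way to do this. The function name split_gevent_subcode is hypothetical, and the list of documented bits is taken from the Code 37 sub-codes described in this chapter.

```python
from typing import List, Tuple

# Code 37 sub-codes that have their own entry in this chapter.
DOCUMENTED_SUBCODES = (
    0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
    0x100, 0x200, 0x400, 0x800, 0x2000,
)


def split_gevent_subcode(subcode: int) -> Tuple[List[int], int]:
    """Split an observed Code 37 sub-code into documented single-bit sub-codes.

    Returns the documented sub-codes that are set, plus any leftover bits
    that have no entry of their own (0 if everything was accounted for).
    """
    found = [bit for bit in DOCUMENTED_SUBCODES if subcode & bit]
    leftover = subcode
    for bit in found:
        leftover &= ~bit
    return found, leftover


if __name__ == "__main__":
    found, leftover = split_gevent_subcode(0x42)
    print([hex(bit) for bit in found], hex(leftover))
```

An observed sub-code of 0x42, for example, splits into 0x2 and 0x40, so both of those entries apply.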
Fatal error: Code 37, subcode 0x1 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x01
CMIC_FATAL (GEVENT0)
This error indicates the CMIC (North Bridge) had a fatal
error.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[0]: PCI2_PERR_L
This error indicates either the PLX #2 PCIe-PCIX bridge or
the Intel 31154 PCIX-PCIX bridge #2 detected a parity error.
These components manage PCI slots 0 and 1 on T-Series and
Slot 0 on F-Series.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[0]: PEX2_FATAL_ERROR
This error indicates that the PLX #2 PCIe-PCIe bridge
detected a fatal error. These components manage PCI slots 0,
1, and 2; Harrier 1 and 2 LPC0.
Resolution: A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.
Fatal error: Code 37, subcode 0x2 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x02
ALERT (GEVENT1)
Error in PCI device 00.00.00 (CMIC-LE Memory Controller/
Thin IMB):
ESR (0x4c) [0004]: IMBus error
(PCI errors not cleared)
The output above can be considered "typical" but really may
contain any of the possible CMIC (North Bridge) Memory
Controller or other PCI bus errors. An IMBus error indicates
a communication problem between the North Bridge and one of
the South Bridge or CIOBX2. This would likely indicate a node
motherboard failure. It has been observed in the field that a
flaky or bad PCI socket may also cause this.
Resolution: A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[1]: MCH Fatal Error
This error indicates the MCH (North Bridge) has detected a
fatal condition. Most likely there are other error messages
present in the idelog to help pinpoint the issue. Since the
MCH is the top of the root complex, it's very common to see
the MCH indicating Fatal error on nearly all failures.
Resolution: A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.
Fatal error: Code 37, subcode 0x4 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
--- SMI: smm_inb(0x39) == 0x04
GPE 2 triggered
THERMT_L0_OSB (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt.
It is a fatal condition on Pentium III nodes, and the node
will be immediately taken out of the cluster with this
fatal error.
Resolution: A) Cycle power on the node. If it is a
temperature
related problem, verify the system is getting
adequate ventilation.
B) Replace the node motherboard.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x04
GPE 2 triggered
P0_PROC_HOT (GEVENT2)
The Pentium 4 CPU supports clock modulation which reduces
the core frequency when the core temperature is too high.
The BIOS enables this support when starting the OS, so after
the node has joined the cluster, the BIOS will asynchronously
notify the OS if this event occurs but not take it out of the
cluster. At the same time, the Pentium 4 processor will
automatically reduce its clock speed so as to generate less
heat and not reach a shutdown temperature. This message is
therefore not fatal on P4 CPUs.
Resolution: A) Cycle power on the node. If it is a
temperature
related problem, verify the system is getting
adequate ventilation.
B) Replace the node motherboard.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[2]: PCI0_PERR_L
This error indicates either the PLX #0 PCIe-PCIX bridge or
the Intel 31154 PCIX-PCIX bridge #0 detected a parity error.
These components manage PCI slots 4 and 5 on T-Series and
Slot 2 on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[2]: PEX0_FATAL_ERROR
This error indicates that the PLX #0 PCIe-PCIe bridge
detected a fatal error. These components manage PCI slots 6,
7, and 8; Harriers 1 and 2 LPC2.
See Code 37, sub-code 0x1 for resolution information.
Chimera nodes:
*** Error: GPE[2]: PEX0_FATAL_ERROR
This error indicates that the PLX 8796 #0 or #1 PCIe-PCIe
bridge detected a fatal error. These components manage PCI
slots 0, 1, 5 and 6; Harrier 0, LPC0 and LPC2; and Harrier 1,
LPC0 and LPC2.
See Code 37, sub-code 0x1 for resolution information.
Eos and Tornado nodes:
*** Error: GPE[2]: PEX_FATAL_ERROR
This error indicates that the PLX PCIe-PCIe bridge detected
a fatal error. These components manage PCI slots 0, 1, and 2;
Harrier LPC0 and LPC2.
See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x8 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
--- SMI: smm_inb(0x39) == 0x08
GPE 3 triggered
THERMT_L1_OSB (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt.
See Code 37, sub-code 0x2 for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x08
GPE 3 triggered
P1_PROC_HOT (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt.
See Code 37, sub-code 0x2 for resolution information.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[3]: PCI0_SERR_L
This error indicates either the PLX #0 PCIe-PCIX bridge or
the Intel 31154 PCIX-PCIX bridge #0 detected a fatal error
(SERR). These components manage PCI slots 4 and 5 on
T-Series and Slot 2 on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[3]: PEX1_FATAL_ERROR
This error indicates that the PLX #1 PCIe-PCIe bridge
detected a fatal error. These components manage PCI slots 3,
4, and 5; Harrier 1 and 2 LPC1.
See Code 37, sub-code 0x1 for resolution information.
Chimera nodes:
*** Error: GPE[3]: PEX1_FATAL_ERROR
This error indicates that the PLX 8750 PCIe-PCIe bridge
detected a fatal error. This component manages PCI slots 2,
3, and 4; Harrier 0, LPC1; and Harrier 1 LPC1.
See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x10 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
GPE 4 triggered
MIRQ (GEVENT4)
This error indicates the memory controller (CNB20HE)
triggered
an interrupt. The CNB20HE documentation lists possible
sources
as correctable ECC error on Memory data bus and Processor
data bus.
See below (P4) for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x10
GPE 4 triggered
P0_IERR (GEVENT4)
This error indicates that P4 CPU 0 has asserted IERR#,
which is
used to indicate a processor internal error event occurred.
The Intel documentation indicates one cause of this error
is a
machine check exception when exceptions have not yet been
enabled. From our experience in the field, the problem is
possibly a CPU or node motherboard failure.
Resolution: A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.
Diagnostic: A) Replace CPUs.
B) Replace CPU VRMs.
C) Check DIMM termination (R5836 etc on P4-Eagle nodes).
T-Series and F-Series (5000P) nodes:
*** Error: GPE[4]: PCI1_PERR_L
This error indicates either the PLX #1 PCIe-PCIX bridge or
the Intel 31154 PCIX-PCIX bridge #1 detected a parity error.
These components manage PCI slots 2 and 3 on T-Series and
Slot 1 on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[4]: FPGA_LPC_IRQ0_L
This error indicates an internal error. This should not
occur
in a V-Series system.
See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x20 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x20
GPE 5 triggered
P1_IERR (GEVENT5)
This error indicates that P4 CPU 1 has asserted IERR#.
See Code 37, sub-code 0x10 (P4) for resolution information.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[5]: PCI1_SERR_L
This error indicates either the PLX #1 PCIe-PCIX bridge or
the Intel 31154 PCIX-PCIX bridge #1 detected a fatal error
(SERR). These components manage PCI slots 2 and 3 on
T-Series and Slot 1 on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P), Eos, Tornado and Chimera
nodes:
*** Error: GPE[4]: FPGA_LPC_IRQ1_L
This error indicates that NEMOE raised the FPGA SMI
interrupt
and it was not handled properly.
See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x40 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x40
GPE 6 triggered
P_SERR (GEVENT6)
This error indicates that one or more of the system's chipset
devices are asserting P_SERR (primary side system error).
Output is usually followed by outstanding PCI errors as
indicated by chipset devices.
Resolution: A) Identify and replace failing PCI card based
on
error output. It may be necessary to contact
hardware engineering with BIOS output to determine
which PCI slot is at fault.
B) Remove all PCI cards.
C) Replace the node motherboard.
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[6]: MCH Uncorrectable Error
This error indicates the MCH (North Bridge) has detected an
uncorrectable error. Most likely there are other error
messages present in the idelog to help pinpoint the issue.
Since the MCH is the top of the root complex, it's very
common to see the MCH indicating Uncorrectable error on
nearly all failures.
Resolution: A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.
Fatal error: Code 37, subcode 0x80 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x80
GPE 7 triggered
P_PERR (GEVENT7)
This error indicates that one or more of the system's chipset
devices are asserting P_PERR (primary side parity error).
See Code 37, sub-code 0x40 for resolution information.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[7]: PCI2_SERR_L
This error indicates either the PLX #2 PCIe-PCIX bridge or
the Intel 31154 PCIX-PCIX bridge #2 detected a fatal error
(SERR). These components manage PCI slots 0 and 1 on
T-Series and Slot 0 on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[7]: Not connected
This error indicates an internal error. This should not
occur
in a V-Series system.
See Code 37, sub-code 0x1 for resolution information.
Eos, Tornado, and Chimera nodes:
*** Error: GPE[7]: MCH Fatal Error
This error indicates the MCH (North Bridge) has detected a
fatal condition. Most likely there are other error messages
present in the idelog to help pinpoint the issue. Since the
MCH is the top of the root complex, it's very common to see
the MCH indicating Fatal error on nearly all failures.
Resolution: A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.
Fatal error: Code 37, subcode 0x100 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
--- SMI: smm_inb(0x3a) == 0x01
GPE 8 triggered
CPU_TEMP_INTR (GEVENT8)
This indicates a CPU temperature event triggered a GPIO
interrupt.
See Code 37, sub-code 0x2 for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x3a) == 0x01
GPE 8 triggered
S_SERR (GEVENT8)
This error indicates that one or more of the system's chipset
devices are asserting S_SERR (secondary side system error).
See Code 37, sub-code 0x40 for resolution information.
T-Series and F-Series (5000P) nodes:
--- SMI request via EXT_SMI
This error indicates another node in the cluster has forced
this node to handle an SMI. Most likely the other node is
attempting to force a panic dump because the local node has
stopped responding.
Resolution: A) Inspect the core dump to determine if the
cause was a software or hardware failure.
B) Replace the node motherboard if the issue
recurs and cannot be identified as a software
failure.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[7]: Not connected
This error indicates an internal error. This should not
occur
in a V-Series system.
See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x200
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
This error indicates that one or more of the system's chipset
devices are asserting SERR (system error). Output is followed
by the PCI scan results, which display outstanding PCI errors
of all PCI bus devices.
See below (P4) for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x3a) == 0x02
GPE 9 triggered
S_PERR (GEVENT8)
This error indicates that one or more of the system's chipset
devices are asserting S_PERR (secondary side parity error).
Resolution: A) Identify and replace failing PCI card based
on
error output. It may be necessary to contact
hardware engineering with BIOS output to determine
which PCI slot is at fault.
B) Remove all PCI cards.
C) Replace the node motherboard.
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[9]: CPU0 IERR_L
This error indicates that CPU 0 has asserted IERR#, which is
used to indicate a processor internal error event occurred.
The Intel documentation indicates one cause of this error
is a
machine check exception when exceptions have not yet been
enabled. From our experience in the field, the problem is
possibly a CPU or node motherboard failure.
Resolution: A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove all PCI cards.
D) Replace the node motherboard.
Diagnostic: A) Replace CPUs.
B) Replace CPU VRMs.
Fatal error: Code 37, subcode 0x400 (0)
GEVENT_TRIGGERED "GEvent Triggered"
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[10]: CPU1 IERR_L
This error indicates that CPU 1 has asserted IERR#, which is
used to indicate a processor internal error event occurred.
See Code 37, sub-code 0x200 for resolution information.
Chimera nodes:
*** Error: GPE[10]: CPU1_THERMTRIP_L
This indicates that a thermal event on CPU1 triggered a GPIO
interrupt.
It is a fatal condition, and the node is immediately
taken out of the cluster with this fatal error.
Resolution: A) Cycle power on the node. If it is a
temperature-related problem, verify the system is getting
adequate ventilation.
B) Replace the node motherboard.
Fatal error: Code 37, subcode 0x800 (0)
GEVENT_TRIGGERED "GEvent Triggered"
Chimera nodes:
*** Error: GPE[11]: CPU0_THERMTRIP_L
This indicates that a thermal event on CPU0 triggered a GPIO
interrupt.
It is a fatal condition, and the node is immediately
taken out of the cluster with this fatal error.
Resolution: A) Cycle power on the node. If it is a
temperature-related problem, verify the system is getting
adequate ventilation.
B) Replace the node motherboard.
Eos and Tornado nodes:
*** Error: GPE[11]: THERMTRIP_L
This indicates a thermal event on the CPU triggered a GPIO
interrupt.
See the above information regarding Chimera nodes for
resolution.
Fatal error: Code 37, subcode 0x2000 (0)
GEVENT_TRIGGERED "GEvent Triggered"
Eos, Tornado and Chimera nodes:
*** Error: GPE[13]: CAT_ERR_L
This error indicates that a CPU has asserted IERR#, which
signals that a processor internal error event occurred.
The Intel documentation indicates that one cause of this error
is a machine check exception when exceptions have not yet been
enabled. In field experience, the cause may be a CPU or
node motherboard failure.
Resolution: A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove all PCI cards.
D) Replace the node motherboard.
Diagnostic: A) Replace CPUs.
B) Replace CPU VRMs.
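The Code 37 sub-codes in this section (0x200, 0x400, 0x800, 0x2000) are paired with GPE[9], GPE[10], GPE[11], and GPE[13] in the quoted BIOS output, which suggests that bit n of the sub-code is set when GPE[n] fired. The following Python sketch relies on that inference; it is not a documented rule, and the function name is illustrative rather than part of any HPE tool.

```python
def gpe_numbers(subcode: int) -> list[int]:
    """Return the positions of the bits set in a Code 37 sub-code.

    Assumption (inferred from the examples in this guide, not stated
    by it): bit n of the sub-code corresponds to GPE[n].
    """
    return [bit for bit in range(subcode.bit_length()) if (subcode >> bit) & 1]
```

For example, gpe_numbers(0x2000) returns [13], matching the CAT_ERR_L entry above.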
Non-fatal error: Code 38, sub-code 0x0 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power Supply xx indicates invalid battery configuration: y
batteries
Verify battery connection and individual battery units.
The maximum number of batteries in a string that software
supports is 3. Any greater number results in this
non-fatal error.
The data value may be decoded to determine the power supply
and the battery count. The high 8 bits are a bitmask of the
power supply. The lower 16 bits are the number of batteries
counted. Thus, a data value of 100000c indicates PS1 had a
battery count of 12. A data value of 4 indicates PS0 had a
battery count of 4.
Resolution: A) Verify no more than 3 batteries in a string
are connected to any one power supply.
B) Cycle power on the node.
C) Remove batteries one at a time to determine
if there is a faulty connection or battery.
Replace the faulty cable or battery.
D) Replace the power supply.
E) Replace the node motherboard.
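The decoding described for Code 38, sub-code 0x0 can be sketched in Python as follows. The sketch assumes a 32-bit data value, and it reads the high byte as a power supply number (0 for PS0, 1 for PS1) because that is how the guide's two examples behave, even though the text calls the field a bitmask. The function name is hypothetical.

```python
def decode_battery_config(data: int) -> tuple[int, int]:
    """Split a Code 38 sub-code 0x0 data value into (power_supply, battery_count).

    Assumptions: the value is 32 bits wide; the high 8 bits are treated
    as a power supply number, which matches the guide's examples.
    """
    power_supply = (data >> 24) & 0xFF   # high 8 bits
    battery_count = data & 0xFFFF        # low 16 bits
    return power_supply, battery_count
```

With the guide's examples, decode_battery_config(0x100000C) gives (1, 12) and decode_battery_config(0x4) gives (0, 4).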
Non-fatal error: Code 38, sub-code 0x1 (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"
RTC / NVRAM Battery Failure - Replace battery.
The built-in monitoring circuit of the RTC (TOD clock) found
that the RTC / NVRAM battery has a low voltage.
Resolution: A) Replace the lithium-ion cell battery on the
node.
B) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x3 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
No batteries present on power supply xx
This error indicates no batteries were found on a node
power supply.
This warning may be enabled by setting "warn_nobat" in
NVRAM.
The data value may be decoded to determine which power
supply triggered this error. The high 8 bits are a bitmask of
the power supply. Thus, a data value of 0 indicates PS0 is not
present. A data value of 1000000 indicates PS1 is not
present.
Resolution: A) Verify there is at least one battery
connected.
B) Cycle power on the node.
C) Exchange cables and batteries.
D) Replace the power supply.
E) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x4 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply missing: node power configuration is not
redundant
This error indicates one of the two power supplies for
a node is not present. This warning may be enabled by
setting "warn_ps" in NVRAM.
The data value may be decoded to determine which power
supply triggered this error. The high 8 bits are a bitmask of
the power supply. Thus, a data value of 0 indicates PS0 is not
present. A data value of 1000000 indicates PS1 is not
present.
Resolution: A) Verify both power supplies are present and
powered on.
B) Power off the missing supply, remove it, and
re-insert it in the chassis.
C) Replace the power supply.
D) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x5 (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Battery failure on Power Supply
This error indicates that a battery on the power supply
has reported a hardware error. The status light on the back
of the failed battery will be amber.
Resolution: A) Verify both power supplies are present and
powered on. Verify batteries are present and
powered on.
B) Power off the failed battery, remove the cable, and
re-insert it in the power supply. Turn the battery back on.
If that does not reset the FAILED condition,
replace the battery.
C) Replace the power supply.
D) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x6 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Powering off PSxx because it is on battery power.
This will shut down the node until AC is restored.
This message indicates that a power supply lost AC input
power and that the BIOS powered down the node to avoid
draining the battery.
The data value may be decoded to determine which power
supply triggered this error. The low 2 bits are a bitmask of
the DC power status. Bit 0 represents power supply 0 and bit 1
represents power supply 1. If a bit is 1, the DC output from
that power supply was good when the system shut down.
Resolution: A) Apply AC power to the node.
B) Replace the power supply.
Non-fatal error: Code 38, sub-code 0x7 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: Fan Bad
or
Power supply xx failure: Fan 0 Bad
or
Power supply xx failure: Fan 1 Bad
This error indicates there is a hardware problem in one of
the node power supplies. One or more of the fans may have
failed.
The data value may be decoded to determine which power
supply (and fan) triggered this error. The low 2 bits are a
bitmask of the fan status for Power Supply 0. The next 2 bits
are a bitmask of the fan status for Power Supply 1. Thus:
1: PS0 had a Fan 0 failure
2: PS0 had a Fan 1 failure
3: PS0 had a double fan failure
4: PS1 had a Fan 0 failure
8: PS1 had a Fan 1 failure
c: PS1 had a double fan failure
Resolution: A) Replace the power supply.
B) Replace the node motherboard.
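The fan bitmask for Code 38, sub-code 0x7 (bits 0-1 for Power Supply 0, bits 2-3 for Power Supply 1) can be unpacked with a short Python sketch such as the one below. The function name and the output strings are illustrative only.

```python
def decode_fan_failures(data: int) -> list[str]:
    """List the failed fans encoded in a Code 38 sub-code 0x7 data value.

    Bit layout taken from the guide: bit 0 = PS0 Fan 0, bit 1 = PS0 Fan 1,
    bit 2 = PS1 Fan 0, bit 3 = PS1 Fan 1.
    """
    failures = []
    for ps in (0, 1):
        for fan in (0, 1):
            if (data >> (2 * ps + fan)) & 1:
                failures.append(f"PS{ps} Fan{fan}")
    return failures
```

A data value of c, for example, decodes to both fans on PS1, which agrees with the "PS1 had a double fan failure" case above.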
Non-fatal error: Code 38, sub-code 0x8 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: Charger Overload
This error indicates there is a hardware problem in one of
the node power supplies, specifically that the charger cannot
handle the battery charge current draw. If you need to override
this error so the node continues, you can set
"ignore_chargefail" in NVRAM.
The data value may be decoded to determine which power
supply triggered this error. The low 2 bits are a bitmask of
the charger status for the two power supplies. Thus, a value
of 1 indicates PS0 had a charger overload. A value of 2
indicates PS1 had a charger overload. A value of 3 indicates
PS0 and PS1 both had a charger overload.
Resolution: A) Check battery connection.
B) Exchange cables and batteries.
C) Replace the power supply.
D) Replace the node motherboard.
Fatal error: Code 38, subcode 0x9 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Both Power Supplies failed: DC Output Bad
This error indicates there is a hardware problem in one of
the node power supplies. A transient failure can also be
caused by turning the power supply off and then on, or by
a brief AC loss followed by AC being restored. If both power
supplies fail simultaneously (not likely), this is a fatal
error.
The data value may be decoded to determine which power
supply triggered this error. The low 2 bits are a bitmask of
the DC Output status for the two power supplies.
As a fatal error, the value will be 3, indicating PS0 and
PS1 both had a DC Output Bad.
Resolution: A) Ensure a service operation was not taking
place at the time, and that AC had not also failed.
B) Replace the power supply.
C) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0xa (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: AC Input Bad
This error indicates that AC input power is not being
supplied to one or more power supplies. The likely cause is
either a real AC failure or that the power supply has been
switched to the off position. In the case of an AC failure,
the power supply will be automatically shut down to preserve
batteries (if "ignore_acfail" is set, then the power supply
will not be shut down).
The lower 2 bits of the data value may be decoded to determine
which power supply lost AC power. A value of 1 indicates PS0.
A value of 2 indicates PS1. A value of 3 indicates both power
supplies lost AC power.
Resolution: A) Verify AC power is present and the power
supply switch is turned on.
B) Check the Power Distribution Unit (PDU) breaker.
C) Replace the power supply.
D) Replace the node motherboard.
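Several Code 38 sub-codes in this table (for example 0x8, 0x9, 0xa, and 0xf) use the same convention: bit 0 of the data value refers to PS0 and bit 1 refers to PS1. A minimal Python sketch of that shared decoding, with a hypothetical helper name, is:

```python
def power_supplies_from_mask(data: int) -> list[str]:
    """Return the power supplies flagged in the low 2 bits of a data value.

    Bit 0 = PS0 and bit 1 = PS1, as the guide describes for these
    sub-codes. Higher bits are ignored in this sketch.
    """
    return [f"PS{n}" for n in (0, 1) if (data >> n) & 1]
```

A value of 3 therefore reports both PS0 and PS1, as in the AC Input Bad description above.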
Non-fatal error: Code 38, sub-code 0xb (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"
**** Power Supplies mismatch ****
Power Supply 0: I2C accessible
Power Supply 1: I2C inaccessible
This error indicates one of the power supplies is a new
style (I2C interface) and the other power supply is not
responding using I2C, but has been detected as present. This
is not a supported configuration. If you need to override this
error, set "ignore_psdiff" in NVRAM.
Resolution: A) Pull and re-insert the inaccessible power
supply.
B) Check the Power Distribution Unit (PDU) breaker
for the inaccessible power supply.
C) Replace the power supply.
D) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0xc (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
This error indicates Power Supply 0 reported a limit was
exceeded while performing the power supply status test. Each
power supply has integrated monitors for temperature, voltage,
and current draw. The BIOS reads these sensors as part of
initialization to determine if the power supply is operating
within specifications.
The data value may be decoded to determine the particular
cause of the limit failure. Each bit represents a unique
sensor. Data values may be decoded as follows:
00000001 - Temperature
00000004 - 3.3V
00000008 - 3.3V Current
00000010 - 5V
00000020 - 5V Current
00000040 - 12V
00000080 - 12V Current
00000100 - 24V
00000200 - 24V Current
00000400 - 48V
00000800 - 48V Current
00001000 - Bat0 48V
00002000 - Bat1 48V
00004000 - Bat2 48V
00008000 - Bat0 12V
00010000 - Undefined ... to ...
00400000 - Undefined
00800000 - Battery LED is Amber
01000000 - Battery Relay is Off
02000000 - PS LED is Amber
04000000 - Fan Fail
08000000 - DC Fail
10000000 - AC Fail
20000000 - Power Supply is Disabled
40000000 - Power Supply Switch is Off
80000000 - Low Limit exceeded (combined with bits above)
Resolution: Contact 3PAR technical support.
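Because each bit of the sub-code 0xc data value stands for one sensor or status flag, a combined value can be broken down with a lookup table. The Python sketch below copies the bit meanings from the list above; the names LIMIT_BITS and decode_limit_flags are illustrative, and bits that the guide does not list (or marks as Undefined) are reported as undefined rather than interpreted.

```python
# Bit meanings copied from the sub-code 0xc list in this guide.
LIMIT_BITS = {
    0x00000001: "Temperature",
    0x00000004: "3.3V",
    0x00000008: "3.3V Current",
    0x00000010: "5V",
    0x00000020: "5V Current",
    0x00000040: "12V",
    0x00000080: "12V Current",
    0x00000100: "24V",
    0x00000200: "24V Current",
    0x00000400: "48V",
    0x00000800: "48V Current",
    0x00001000: "Bat0 48V",
    0x00002000: "Bat1 48V",
    0x00004000: "Bat2 48V",
    0x00008000: "Bat0 12V",
    0x00800000: "Battery LED is Amber",
    0x01000000: "Battery Relay is Off",
    0x02000000: "PS LED is Amber",
    0x04000000: "Fan Fail",
    0x08000000: "DC Fail",
    0x10000000: "AC Fail",
    0x20000000: "Power Supply is Disabled",
    0x40000000: "Power Supply Switch is Off",
    0x80000000: "Low Limit exceeded",
}


def decode_limit_flags(data: int) -> list[str]:
    """Name every listed bit set in a sub-code 0xc/0xd data value."""
    names = [name for mask, name in LIMIT_BITS.items() if data & mask]
    # Every key is a distinct single bit, so their sum equals their OR.
    unlisted = data & ~sum(LIMIT_BITS) & 0xFFFFFFFF
    if unlisted:
        names.append(f"Undefined bits 0x{unlisted:08x}")
    return names
```

For instance, a data value of 80000040 decodes to "12V" together with "Low Limit exceeded", that is, the 12V reading fell below its lower limit.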
Non-fatal error: Code 38, sub-code 0xd (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
This error indicates Power Supply 1 reported a limit was
exceeded while performing the power supply status test.
See Code 38, sub-code 0xc for resolution information.
Non-fatal error: Code 38, sub-code 0xe (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Each newer generation (Magnetek) power supply and battery
has an I2C interface which allows the node to acquire power
supply internal temperature, voltages, and current loads. The
BIOS verifies these readings are within acceptable limits as
part of normal initialization.
This failure code indicates a limit has been exceeded on a
battery attached to a power supply on the node. The data value
may be decoded to determine which power supply and battery. The
lower 2 bits are a bitmask of the power supply. The upper 16
bits are a bitmask of the failing battery. Thus, a data value of
10002 indicates PS1 Bat0 has exceeded a limit. A data value of
40001 indicates PS0 Bat2 has exceeded a limit.
Resolution: A) Check battery expiration date and replace as
necessary.
B) Power cycle the failing battery.
C) Replace battery cable.
Diagnostic: A) Use the Whack "bat status" command to display
power supply and battery temperatures and
voltages to determine the particular failure.
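The sub-code 0xe data value combines two bitmasks: the power supply in the low 2 bits and the failing battery in the upper 16 bits. The Python sketch below assumes a 32-bit value, so the battery mask starts at bit 16, which is consistent with the examples 10002 and 40001. The function name is hypothetical.

```python
def decode_battery_limit(data: int) -> list[str]:
    """Return "PSx Baty" labels for a Code 38 sub-code 0xe data value.

    Assumption: the value is 32 bits wide, so the "upper 16 bits"
    battery mask occupies bits 16-31.
    """
    supplies = [ps for ps in (0, 1) if (data >> ps) & 1]
    batteries = [bat for bat in range(16) if (data >> (16 + bat)) & 1]
    return [f"PS{ps} Bat{bat}" for ps in supplies for bat in batteries]
```

If both power supply bits are set, this sketch pairs every flagged battery with both supplies, because the value alone does not say which battery belongs to which supply.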
Non-fatal error: Code 38, sub-code 0xf (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
I2C errors prevented completion of the power test.
Each newer generation (Magnetek) power supply and battery
has an I2C interface which allows the node to acquire power
supply status.
This failure code indicates the BIOS was unable to read
one of the power supply or battery status registers.
The lower 2 bits of the data value may be decoded to determine
which power supply failed. A value of 1 indicates PS0. A value
of 2 indicates PS1. A value of 3 indicates both power supplies
failed.
Resolution: A) Power cycle the indicated power supply.
B) Replace the power supply.
C) Replace all batteries attached to the power supply.
D) Replace the node motherboard.
Non-fatal error: Code 38, sub-code 0x10 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
PSwwww Batxxxx Switch Off
This failure code indicates a battery has its power switch
in the off position and is thus unable to supply backup power
to the node in the case of an AC failure. The data value may be
decoded to determine which power supply and battery. See
Code 38, sub-code 0xd for decoding information.
Resolution: A) Turn battery on.
B) Power cycle the indicated battery.
C) Replace battery cable.
D) Replace power supply.
Fatal error: Code 38, subcode 0x11 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
PS x has down-rev firmware (x)
This failure code indicates the power supply firmware
revision is not up to date and is therefore not supported.
Resolution: Replace power supply.
Fatal error: Code 38, subcode 0x12 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
PS x Battery has down-rev firmware (rev)
This failure code indicates the battery attached to the
indicated power supply has firmware that is not up to date
and is therefore not supported.
Resolution: Replace battery.
Fatal error: Code 39, subcode 0x1 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for no successful OS boot (xxxx) exceeded.
Type "unset cnt_no_os_boot" to clear this error
This error indicates that the BIOS has detected that the
node has not successfully booted the OS and will now
prohibit boots until operator intervention clears this
error.
Resolution: A) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not suit your application.
To do this, type "unset max_no_os_boot" at
a Whack prompt.
B) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
C) Replace the IDE drive.
Fatal error: Code 39, subcode 0x2 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS boot with no cluster (xxxx) exceeded.
Type "unset cnt_no_cluster" to clear this error
This error indicates that the BIOS has detected that the
node has booted, but the cluster has failed to form
several times. The BIOS will prohibit boots until operator
intervention clears this error. This is to prevent cyclic
node up/down caused by a hardware or software failure.
This increases the reliability of the cluster by preventing
the node from continuously attempting to join the cluster.
Resolution: A) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not suit your application.
To do this, type "unset max_no_cluster" at
a Whack prompt.
B) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
C) Replace the IDE drive.
Fatal error: Code 39, subcode 0x3 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS panic (xxxx) exceeded.
Type "unset cnt_os_panic" to clear this error
This error indicates that the BIOS has detected that the
node has booted and then panicked several times.
When the OS panics, it notifies the BIOS of this
event so the BIOS can track problems. Once a limit is
exceeded, the BIOS will prohibit boots until operator
intervention clears this error. This is to prevent cyclic
node up/down caused by a hardware or software failure.
This increases the reliability of the cluster by preventing
the node from continuously attempting to join the cluster.
Resolution: A) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not suit your application.
To do this, type "unset max_os_panic" at
a Whack prompt.
B) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
C) Replace the IDE drive.
Fatal error: Code 39, subcode 0x4 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS cluster without shutdown (xxxx)
exceeded.
Type "unset cnt_no_shutdown" to clear this error
This error indicates that the BIOS has detected that the
node has booted, but has not been shut down properly
several times. The BIOS will prohibit boots until operator
intervention clears this error. This is to prevent cyclic
node up/down caused by a hardware or software failure.
This increases the reliability of the cluster by preventing
the node from continuously attempting to join the cluster.
Resolution: A) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not suit your application.
To do this, type "unset max_no_shutdown" at
a Whack prompt.
B) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
C) Replace the IDE drive.
Fatal error: Code 39, subcode 0x5 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for same fatal error (xxxx) exceeded.
Type "unset cnt_same_fatal" to clear this error
This error indicates that the BIOS has detected that the
same fatal or non-fatal error has occurred repeatedly.
The BIOS will prohibit boots until operator intervention
clears this error. This is to prevent cyclic node up/down
caused by a hardware or software failure. This increases
the reliability of the cluster by preventing the node from
continuously attempting to join the cluster.
Resolution: A) Observe other errors present in the PROM log
to determine the cause of this error.
B) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not suit your application.
To do this, type "unset max_same_fatal" at
a Whack prompt.
C) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
D) Replace the IDE drive.
Fatal error: Code 39, subcode 0x6 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for errors logged (xxxx) exceeded.
Type "unset cnt_log_error" to clear this error
This error indicates that the BIOS has detected that it
has recorded too many fatal or non-fatal errors in the board
serial PROM and that it should prohibit further boots until
operator intervention clears this error. This is to prevent
cyclic node up/down caused by a hardware or software
failure.
This increases the reliability of the cluster by preventing
the node from continuously attempting to join the cluster.
Resolution: A) Observe other errors present in the PROM log
to determine the cause of this error.
B) Clear this error as suggested in the error
text. You may also turn off this checking
mechanism if it does not suit your application.
To do this, type "unset max_log_error" at
a Whack prompt.
C) Verify that a valid operating system image is
installed on the node's internal disk. Reinstall
the operating system if defective.
D) Replace the IDE drive.
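Code 39 sub-codes 0x1 through 0x6 each name one counter variable to clear and one limit variable that disables the check. The Python sketch below collects those pairs, as quoted in the entries above, into a lookup table. The table and function names are illustrative; the sketch only builds the command strings to type at a Whack prompt and does not talk to a node.

```python
# (counter to clear, limit to unset) per Code 39 sub-code,
# copied from the error text and resolution steps in this guide.
BOOT_COUNTERS = {
    0x1: ("cnt_no_os_boot", "max_no_os_boot"),
    0x2: ("cnt_no_cluster", "max_no_cluster"),
    0x3: ("cnt_os_panic", "max_os_panic"),
    0x4: ("cnt_no_shutdown", "max_no_shutdown"),
    0x5: ("cnt_same_fatal", "max_same_fatal"),
    0x6: ("cnt_log_error", "max_log_error"),
}


def whack_commands(subcode: int) -> dict[str, str]:
    """Return the Whack commands quoted in the guide for a Code 39 sub-code."""
    counter, limit = BOOT_COUNTERS[subcode]
    return {
        "clear_error": f"unset {counter}",
        "disable_check": f"unset {limit}",
    }
```

For sub-code 0x3, this yields "unset cnt_os_panic" to clear the error and "unset max_os_panic" to turn the check off.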
Fatal error: Code 39, subcode 0x7 (0)
OS_STARTUP_FAILURE "OS Startup Error"
This node hit the Harrier Mismatch Error, failed ECC
This error indicates that the BIOS has detected that the
Harrier ASIC's ECC logic has hit an error. It should have
triggered an ECC error, but failed to do so.
Resolution: Replace the node motherboard.
Fatal error: Code 39, subcode 0x10 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Invalid boot sector. Use "boot net install" to correct
this.
The IDE disk is used for booting the operating system.
This error indicates the boot sector which has been
loaded from the disk does not have a valid signature.
The most likely cause of this error is that a fresh
IDE drive has been installed in the node and it needs
to be field net installed.
Disk MBR does not have a valid partition table
You may also see the above line immediately following
the fatal error. This message indicates the partition
table in the boot sector (Master Boot Record) was
also invalid, and that an "ide log" entry could not be
written.
Resolution: A) If no hardware has been replaced, first
try cycling power on the node.
B) Perform a field IDE net install on the
drive, or use "boot net install".
C) Use the "ide smart status" command to acquire the drive
SMART status. Replace the IDE drive if a
failure is reported.
D) Replace the IDE cable.
E) Replace the IDE drive.
F) Replace the node motherboard.
Non-fatal error: Code 40, sub-code 0x1 (0)
CBIOS_OS_TIMEOUT "CBIOS to OS timeout"
*** Error: CBIOS to OS message communication timeout
During CPU SMI initialization, the queue facility to send
messages between the BIOS and TPD is tested. If there is
a problem triggering an SMI, or some other error which
causes message corruption, this error will result. This
error is recoverable because the OS can still come up and
function at a degraded level even if the communication
between the OS and BIOS is not functioning.
Resolution: A) View the PROM log to see if this is repeatable.
If not, ignore a single occurrence.
B) Cycle power on the node.
C) Replace the bootstrap CPU.
D) Replace the node motherboard.
Fatal error: Code 41, subcode 0x0 (0)
CPU_BUS_SPEED_BAD "CPU Bus Speed Bad"
*** Error: CPU speed is too slow.
The computed CPU speed is lower than the expected minimum
supported in a 3PAR node. Most likely this is due to a
hardware failure. Since the CPU speed computation depends
upon access to the RTC, it is most likely there is a
communication problem with the SuperIO containing the RTC.
If you need to run with a reduced CPU speed, enter the
following command on the node:
Whack> set perm cpu_slow_ok
See Code 41, sub-code 0x1 for resolution information.
Fatal error: Code 41, subcode 0x1 (0)
CPU_BUS_SPEED_BAD "CPU Bus Speed Bad"
*** Error: Memory speed is too slow.
After the CPU speed is computed, the memory bus (FSB)
speed is computed. It is derived from the CPU
speed and the bus speed multiplier as reported by the CPU.
If you need to run with a reduced Memory bus speed, enter
the following command on the node:
Whack> set perm mem_slow_ok
Resolution: A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.
Diagnostic: A) Resume past fatal error and look for
additional problems such as RTC failure.
Fatal error: Code 42, subcode 0x1 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed CP PROM ww.xx.yy.zz read
Centerpanel access using Manufacturing PROM: FAILURE
The centerpanel is used by the 3PAR cluster for the nodes to
communicate. The CM links and backup serial links serve
this purpose. There is also a diagnostic I2C bus present in
the centerpanel which is used by nodes to diagnose error
conditions and reset other nodes in the cluster.
As part of the manufacturing process, this bus is tested by
accessing the serial PROM which is present on a manufacturing
centerpanel. If this test fails, it is likely the node will
have a problem accessing the centerpanel I2C bus.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Use the Whack "i2c" command to access devices
such as the board register directly.
Fatal error: Code 42, subcode 0x2 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed CP PROM ww.xx.yy.zz write
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 42, subcode 0x3 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
CP PROM node data does not match what is written: Addr xxxx
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 42, subcode 0x4 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
CP PROM pattern data read is incorrect
Addr xx Expected yy
Read zz
...
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 42, subcode 0x5 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed I2C access to board register x.y.z
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 42, subcode 0x6 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed I2C access to board register x.y.z
Centerpanel access using Manufacturing PROM: FAILURE
Titan specific. The BIOS performs a read accessibility check
for extra I2C addresses while testing CP PROM 0.a0 and fails
with a fatal error message if an address is not accessible.
Note that if the failure is not related to CP PROM 0.a0, the
BIOS does not print the "CP PROM at 0.a0:" message and prints
only "Failed I2C access to board register x.yy".
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 43, subcode 0x0 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Voltage ID indicates CPUxx present but TEMP sensor
disagrees.
This error indicates either a CPU failure or onboard sensors
reading incorrect values for the specified CPU.
The VID (voltage ID sense) lines are attached to each
physical CPU and indicate to the VRMs (voltage regulator
modules) the voltage level expected by the CPU. These lines are
also connected to the LM87, which uses them to determine the
correct voltage which should be delivered to the CPU.
The TEMP (temperature) sensor is connected to an on-die CPU
thermal diode. If its reading is out of acceptable range,
the BIOS determines the sensor is not reliably connected to
a CPU, or a CPU is not present.
Bits 0-1 of data indicate CPU non-presence as determined
by the VID sense lines. Bits 8-9 of data indicate CPU
non-presence as determined by connection to the thermal diode.
Data Value  Failure
------------------------------------------------------------
1           CPU0 does not respond to startup
2           CPU1 does not respond to startup
10          CPU0 thermal sensor/voltage ID indicates not present
20          CPU1 thermal sensor/voltage ID indicates not present
Resolution: A) Cycle power on the node.
B) Remove physical CPU from specific socket and
test with no CPU present.
B1) If error persists, replace node motherboard.
B2) If error clears, replace CPU.
C) Replace the node motherboard.
Diagnostic: A) Use "i2c env" command to determine whether
the temperature or voltage is at fault.
B) If CPU temperature shows out of range, and
CPU is still functional, suspect thermal diode
connection to LM87. Try swapping CPUs to see
if problem moves with CPU.
C) If CPU voltage shows high or low, but VRM is
emitting correct voltage by the voltage sensor,
then suspect the VID lines to the LM87.
Fatal error: Code 43, subcode 0x1 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Voltage ID indicates CPUxx not present but TEMP sensor
disagrees.
See Code 43, sub-code 0x0 for resolution information.
Fatal error: Code 43, subcode 0x2 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Physical CPUxx active, but thermal sensor disagrees
Bits 0-1 of data indicate CPU non-presence as determined
by the running CPU APIC addresses. Bits 8-9 of data
indicate CPU non-presence as determined by connection to
the thermal diode.
See Code 43, sub-code 0x0 for resolution information.
Fatal error: Code 43, subcode 0x3 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Physical CPUxx not active, but thermal sensor disagrees
Bits 0-1 of data indicate CPU non-presence as determined
by the running CPU APIC addresses. Bits 8-9 of data
indicate CPU non-presence as determined by connection to
the thermal diode.
See Code 43, sub-code 0x0 for resolution information.
Fatal error: Code 43, subcode 0x4 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Not all hyper-threads started on physical CPUxx
Bits 0-1 of data indicate logical CPU non-presence in
physical CPU0 as determined by the running CPU APIC
addresses.
Bits 2-3 of data indicate logical CPU non-presence in
physical CPU1 as determined by the running CPU APIC
addresses.
See Code 43, sub-code 0x0 for resolution information.
Fatal error: Code 43, subcode 0x5 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Not all cores started on physical CPUxx
Bits 0-3 of data indicate logical CPU non-presence in
physical CPU0 as determined by the running CPU APIC
addresses.
Bits 4-7 of data indicate logical CPU non-presence in
physical CPU1 as determined by the running CPU APIC
addresses.
See Code 43, sub-code 0x0 for resolution information.
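For Code 43 sub-codes 0x4 and 0x5, the data value holds one bit field per physical CPU: 2 bits each for hyper-threads (sub-code 0x4) and 4 bits each for cores (sub-code 0x5), with CPU0 in the low field. The Python sketch below assumes that a set bit marks a logical CPU that did not start, which is one reading of "indicate logical CPU non-presence"; the function name and the bits_per_cpu parameter are hypothetical.

```python
def missing_logical_cpus(data: int, bits_per_cpu: int) -> dict[str, list[int]]:
    """Map each physical CPU to the logical CPUs flagged as not present.

    bits_per_cpu is 2 for sub-code 0x4 (hyper-threads) and 4 for
    sub-code 0x5 (cores). Assumption: a set bit means "not present".
    """
    mask = (1 << bits_per_cpu) - 1
    result = {}
    for cpu in (0, 1):
        field = (data >> (cpu * bits_per_cpu)) & mask
        result[f"CPU{cpu}"] = [i for i in range(bits_per_cpu) if (field >> i) & 1]
    return result
```

Under that assumption, a sub-code 0x5 data value of 24 (hex) would flag logical CPU 2 on physical CPU0 and logical CPU 1 on physical CPU1.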
Fatal error: Code 43, subcode 0x10 (xx)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
CMIC heatsink disconnected: yy
The GPIOs reporting proper connection of the CMIC (North
Bridge) heatsink report a loss of connection. This is a
board failure which requires a lab technician to reattach
the heatsink.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic: A) Visually inspect the CMIC heatsink posts to
determine if it needs to be reattached.
B) Observe the reported xx value to see if it is
one or both GPIOs which report the failure.
These lines may be traced from VSC055 GPIOs
P3.1 (J1300) and P3.2 (J1301).
C) The BIOS flag "ignore_hsfail" may be set to
override checking the CMIC heatsink.
Non-fatal error: Code 44, sub-code 0x00 (xx)
NODE_FAN_FAILURE "System Fan Failure"
*** Error: One of the node fans is not present, failed,
or is unintentionally running at a slower speed than
expected.
The VSC055 reports tachometer inputs for both node fans,
0 and 1. This is a single node fan failure which requires
the fan to be replaced.
Resolution: A) Cycle power on the node.
B) Replace the node fan.
Diagnostic: A) Visually inspect the node fan.
B) Observe the fan is present and connected properly.
C) If it was misconnected, correct the connection.
Otherwise, the fan needs to be replaced.
Fatal error: Code 44, subcode 0x01 (xx)
NODE_FAN_FAILURE "System Fan Failure"
*** Error: Both of the node fans are not present, failed,
or are unintentionally running at speeds slower than
expected.
The VSC055 reports tachometer inputs for both node fans,
0 and 1. This is a dual node fan failure which requires
both of the fans to be replaced. The system may overheat.
Resolution: A) Cycle power on both nodes.
B) Replace both node fans.
Diagnostic: A) Visually inspect the node fans.
B) Observe the fans are present and connected properly.
C) If they are misconnected, correct the connections.
Otherwise, the fans need to be replaced.
Fatal error: Code 45, subcode 0x00 (data)
QLIS_ISCSI_FAILURE "QLogic iSCSI Failure"
*** Error: QLogic iSCSI Failure
This error code indicates an error while running the QLogic
iSCSI POST.
Failed Test (bits 8-15), Slot (bits 4-7) and Port (bits 0-3) are
packed into data.
Failed Test is one of the following:
<QLogic internal card diagnostics>
2          Test Local RAM Size
3          Test Local RAM R/W
4          Test RISC RAM
5          Test NVRAM
6          Test Flash ROM
7          Test Network Internal Loopback
8          Test Network External Loopback
9          Test DMA Transfer
240 (0xf0) Test NOP
241 (0xf1) Test Registers
242 (0xf2) Test DMA Transfer to CPU memory
243 (0xf3) Test DMA Transfer to Cluster memory
244 (0xf4) Card Initialization
Resolution: A) Cycle power on failing node.
B) Re-seat failing iSCSI card
C) Replace failing iSCSI card
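As an illustration of the packing described for Code 45, the sketch below unpacks the Failed Test, Slot, and Port fields from the data value. The function and dictionary names are hypothetical; the test names are taken from the list in this entry, and any value not in that list is reported as unknown rather than guessed.

```python
QLOGIC_TESTS = {
    2: "Test Local RAM Size",
    3: "Test Local RAM R/W",
    4: "Test RISC RAM",
    5: "Test NVRAM",
    6: "Test Flash ROM",
    7: "Test Network Internal Loopback",
    8: "Test Network External Loopback",
    9: "Test DMA Transfer",
    0xF0: "Test NOP",
    0xF1: "Test Registers",
    0xF2: "Test DMA Transfer to CPU memory",
    0xF3: "Test DMA Transfer to Cluster memory",
    0xF4: "Card Initialization",
}


def decode_qlogic_iscsi(data: int) -> dict:
    """Unpack the Code 45 data value into failed test, slot, and port."""
    test = (data >> 8) & 0xFF  # bits 8-15
    slot = (data >> 4) & 0xF   # bits 4-7
    port = data & 0xF          # bits 0-3
    name = QLOGIC_TESTS.get(test, f"unknown test {test}")
    return {"test": name, "slot": slot, "port": port}
```

For example, a data value of 0xF231 decodes to the "Test DMA Transfer to CPU memory" test on slot 3, port 1.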
Fatal error: Code 46, subcode 0x1 (0)
BAD_OR_UNKNOWN_CHIPSET "Bad or Unknown Chipset"
*** Error: Unrecognized chipset (0xXXXXXXXX).
This error code indicates CBIOS does not recognize
the chipset installed on the node's motherboard.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Non-fatal error: Code 46, sub-code 0x2 (0)
BAD_OR_UNKNOWN_CHIPSET "Bad or Unknown Chipset"
*** ME not in operational mode. IPMI data unavailable.
This error code indicates that the PCH Management Engine is not
in the desired operational mode in the PCH chipset. IPMI temperature
data is not available in this mode, so the system fans may not
run at the proper speed and may not cool the enclosure.
Resolution: Contact engineering with data.
Fatal error: Code 47, subcode 0x00 (0)
SDRAM_UNAVAILABLE "Control Cache Unavailable"
No CPU SDRAM is available.
This error indicates that CBIOS has no working
CPU memory available for it to continue with POST and
ultimately boot the node.
Resolution: A) Cycle power on the node.
B) Replace CPU DIMMs.
C) Replace the node motherboard.
Fatal error: Code 48, subcode 0x0 (XXXXXXXX)
UNKNOWN_BOARD "Unknown Board"
*** Error: Unrecognized board identifier (0xXXXXXXXX).
This error code indicates CBIOS does not recognize the board type
for the chipset installed on the node's motherboard.
Resolution: A) Cycle power on the node.
B) Replace the node motherboard.
Fatal error: Code 49, subcode 0x1 (data)
USB_FAILURE "USB Flash Media Failure"
Failed to find USB device handle
or
Inquiry Request Failed rc = xxxx
The USB controller failed to perform a self test.
A data value of 0 indicates the BIOS failed to find a USB handle.
Resolution: A) If a USB Flash drive is not expected to be present,
set the "usb_nodevice_ok" NVRAM variable to override
BIOS requiring a USB Flash drive be found.
B) Replace the USB Flash drive.
C) Replace the node motherboard.
Diagnostic: A) Whack "usb test" commands may be used to
individually execute USB tests.
Fatal error: Code 49, subcode 0x4 (0)
USB_FAILURE "USB Flash Media Failure"
There was a USB failure in data requested by the operating
system bootstrap. It is possible that data on the disk has
become corrupt to the point the operating system will not
successfully load.
Resolution: Reinstall the operating system bootstrap with
the "boot net install" command.
Fatal error: Code 49, subcode 0x6 (0)
USB_FAILURE "USB Flash Media Failure"
USB reported a failure in the read verify command.
See Code 49, sub-code 0x1 for resolution information.
Fatal error: Code 49, subcode 0x7 (0)
USB_FAILURE "USB Flash Media Failure"
USB reported a failure in the write verify command.
See Code 49, sub-code 0x1 for resolution information.
Non-fatal error: Code 49, sub-code 0x17 (0)
USB_FAILURE "USB Flash Media Failure"
No USB device was found.
Resolution: Install or replace the USB Flash drive.
Fatal error: Code 50, subcode 0x1 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Invalid control cache setup.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0x2 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Incompatible FB-DIMM installed.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0x3 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Electrically isolated FB-DIMM.
Resolution: A) Replace DIMM.
B) Replace node.
Fatal error: Code 50, subcode 0x4 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Incompatible module installed.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0x5 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Mismatched DIMM pair.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0x6 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Odd rank disabled.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0x7 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM branch failed to train and lockstep mode has been
disabled.
Resolution: A) Replace all DIMMs.
B) Replace node.
Fatal error: Code 50, subcode 0x9 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM northbound merge has been disabled.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0xa (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM disabled due to lockstep skew.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0xb (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM rank disabled due to Built-in Self Test failure.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0xe (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Memory interleave range limit invalid.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0xf (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
High temp disabled.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0x10 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Logical rank with CECC detected.
Resolution: Replace DIMM.
Fatal error: Code 50, subcode 0x12 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Sub-optimal FB-DIMM channel population detected.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0x13 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Mismatched AMB pair.
Resolution: Replace all DIMMs.
Fatal error: Code 50, subcode 0x14 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM branch disabled.
Resolution: A) Replace all DIMMs.
B) Replace node.
Fatal error: Code 50, subcode 0x15 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM thermal throttling has been disabled.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0x16 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Last FB-DIMM AMB has been disabled.
Resolution: Contact 3PAR technical support.
Fatal error: Code 50, subcode 0x17 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
The FB-DIMM memory branches do not match in size.
Resolution: Contact 3PAR technical support.
Fatal error: Code 51, subcode 0x1 (Data)
CMA_BIST_FAILURE "CM ASIC Cache BIST Failure"
The BIST (Built-in Self Test) in Harrier reported either a BAD
value or a different value from what was recorded in the node PROM
during MFG board assembly. (Data = Harrier BIST result)
Resolution: Replace the node. Note for OPS that Harrier BIST
failed, and that the PROM should not be wiped.
Non-fatal error: Code 51, sub-code 0x2 (Data)
CMA_BIST_FAILURE "CM ASIC BIST Failure"
During Harrier initialization, the CMA BIST test failed, but for
some other reason (e.g. an I2C I/O error). This error code indicates
that the BIST test itself did not fail, but that an error occurred
either during book-keeping (PROM0 read/write) or because the test was
not performed at all after a failure to read a Harrier register.
(Data = 0x2f)
Resolution: Monitor and replace the node if the issue recurs.
If the node is replaced, note for OPS that they should
verify I2C to the node PROM is functional.
Non-fatal error: Code 52, sub-code 0x0
CPU_PM_FAILURE "CPU Power Management Failure"
One or more bits in CPU's 2 General Power Management registers
were set due to abnormal power reset. The set bits are printed
describing the cause.
Resolution: Contact engineering with data.
Non-fatal error: Code 52, sub-code 0x1
CPU_PM_FAILURE "CPU TSC is over 48-bits"
The CPU timestamp counter is too big after reset. There could be a
CPU reset issue.
Resolution: Contact engineering with data.
Fatal error: Code 53, subcode 0x0 (xxxx)
FPGA_FAILURE "FPGA Failure"
The CPU was unable to communicate with the FPGA.
Resolution: Replace node motherboard.
Fatal error: Code 53, subcode 0x1
FPGA_FAILURE "FPGA Failure"
FPGA revision in EOS node is old. FPGA upgrade is required.
Resolution: Upgrade FPGA to the latest revision.
Fatal error: Code 54, subcode 0x0 (xxyy)
VRM_FAILURE "VRM Failure"
A CPU VRM is missing.
or
A CPU VRM is not providing power.
Resolution: A) Replace CPU VRM yy.
B) Replace node motherboard.
Fatal error: Code 55, subcode 0xzzzzzzzz (yyy)
UEFI_PEI_FAILURE "UEFI Failure: PEI"
UEFI failed to boot, failed during PEI due to assert.
Look-up zzzzzzzz in doc/udk_hash_index.csv of udk2010_up3 tree
to determine filename of assert. yyy specifies line number (in hex).
Resolution: Contact 3PAR technical support.
Fatal error: Code 56, subcode 0xaabb (yyy)
UEFI_MRC_FAILURE "UEFI Failure: Memory Training"
UEFI failed to boot, failed during Intel MRC memory training code.
aa specifies the major code, bb specifies the minor code
* Major Code Table
0x30 Correctable error during MRC memory training
0x31 Uncorrectable error during MRC memory training
0xE8 ERR_NO_MEMORY
0xE9 ERR_LT_LOCK
0xEA ERR_DDR_INIT
0xEB ERR_MEM_TEST
0xEC ERR_VENDOR_SPECIFIC
0xED ERR_DIMM_COMPAT
0xEE ERR_MRC_COMPATIBILITY
0xEF ERR_MRC_STRUCT
Resolution: Contact 3PAR technical support.
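A small sketch of how the Code 56 sub-code (0xaabb) might be split into its major and minor codes when reading a log. The names below are hypothetical; the major code strings are copied from the Major Code Table in this entry, and minor codes are returned as-is because this guide does not list them.

```python
MRC_MAJOR_CODES = {
    0x30: "Correctable error during MRC memory training",
    0x31: "Uncorrectable error during MRC memory training",
    0xE8: "ERR_NO_MEMORY",
    0xE9: "ERR_LT_LOCK",
    0xEA: "ERR_DDR_INIT",
    0xEB: "ERR_MEM_TEST",
    0xEC: "ERR_VENDOR_SPECIFIC",
    0xED: "ERR_DIMM_COMPAT",
    0xEE: "ERR_MRC_COMPATIBILITY",
    0xEF: "ERR_MRC_STRUCT",
}


def decode_mrc_subcode(subcode: int) -> tuple:
    """Return (major, minor, major description) for a Code 56 sub-code."""
    major = (subcode >> 8) & 0xFF  # aa
    minor = subcode & 0xFF         # bb
    return major, minor, MRC_MAJOR_CODES.get(major, "unknown major code")
```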
Fatal error: Code 57, subcode 0xzzzzzzzz (yyy)
UEFI_DXE_FAILURE "UEFI Failure: DXE"
UEFI failed to boot, failed during DXE due to assert.
Look-up zzzzzzzz in doc/udk_hash_index.csv of udk2010_up3 tree
to determine filename of assert. yyy specifies line number (in hex).
Resolution: Contact 3PAR technical support.
Non-fatal error: Code 58, sub-code 0x0
HECI_FAILURE "HECI Interface Failure"
CBIOS failed to obtain the ME firmware flash unlock code
through the HECI interface. This could prevent flash commands
from functioning.
Resolution: Try rebooting the node.
Fatal error: Code 59, subcode 0x00 (0)
FAILSAFE_BIOS_BOOT "Failsafe Boot Halt"
The EOS Failsafe BIOS has booted without detecting a CRC error in
the Main BIOS, indicating a HW initialization failure preventing the
node from booting. The Failsafe BIOS has also detected five or
more non-CRC failures causing boots to failsafe within the past
two hours and has stopped attempting to recover automatically.
Non-fatal error: Code 59, sub-code 0x01 (bbxxyyzz)
FAILSAFE_BIOS_BOOT "Failsafe Boot Mode"
The EOS Failsafe BIOS has booted. This non-fatal entry is logged
to mark the switch over from the Main BIOS to the Failsafe BIOS
which may be a different version.
The data field contains the build (bb) and version (xx.yy.zz) of
the Failsafe BIOS that is booting.
Non-fatal error: Code 59, sub-code 0x02 (flags)
FAILSAFE_BIOS_BOOT "Failsafe Boot Mode"
The EOS Failsafe BIOS has booted. This non-fatal entry is logged
to mark the switch over from the Main BIOS to the Failsafe BIOS
which may be a different version.
The data field contains FPGA flags at the time of the boot.
Bits 0..7   - FSBC_STAT register from the FPGA.
              See FPGA design documentation for details.
Bits 8..15  - FPGA Revision register.
Bits 16..23 - FPGA ID register. (=4 for EOS)
Bit 24      - Flag indicating state of env var qa_force_bios_to
Bit 25      - Flag indicating state of env var qa_force_fs_to
              Flag: 1=var is set, 0=var is not set.
Bits 26..31 - Reserved, =0
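The flags layout above maps directly onto shifts and masks. This is a minimal sketch with hypothetical names, intended only to show how the fields line up; the meaning of the individual FSBC_STAT bits is left to the FPGA design documentation.

```python
def decode_failsafe_flags(flags: int) -> dict:
    """Split the Code 59, sub-code 0x02 data field into its documented fields."""
    return {
        "fsbc_stat": flags & 0xFF,                        # bits 0..7
        "fpga_revision": (flags >> 8) & 0xFF,             # bits 8..15
        "fpga_id": (flags >> 16) & 0xFF,                  # bits 16..23 (4 for EOS)
        "qa_force_bios_to_set": bool((flags >> 24) & 1),  # bit 24
        "qa_force_fs_to_set": bool((flags >> 25) & 1),    # bit 25
        "reserved": (flags >> 26) & 0x3F,                 # bits 26..31, expected 0
    }
```

A non-zero "reserved" value, or an "fpga_id" other than 4 on an EOS node, suggests the data field was read or transcribed incorrectly.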
Non-fatal error: Code 60, sub-code 0x00 (0)
NEMOE_FAILURE "Nemoe Failure"
The OKI Nemoe MCU has failed to boot within the specified timeout
and the UEFI BIOS has reset the chip. This non-fatal entry is
logged to record the boot failure and attempted restart by BIOS.
No recovery action is required for this subcode.
Fatal error: Code 60, subcode 0x01 (0)
NEMOE_FAILURE "Nemoe Failure"
The OKI Nemoe MCU has failed to boot within the specified timeout.
The BIOS had reset the part, and it still failed to complete its
boot initialization before a timeout. The only corrective action
is to replace the node.
Non-fatal error: Code 61, sub-code 0x00 (data)
AC_POWER_LOSS "AC Power Loss"
Turning off BBU because the node is on battery power.
This will shut down the node until AC is restored.
This message indicates that all power supplies lost input AC power
and that the BIOS powered down the node to avoid draining the
battery.
The data value provides a mask of power supplies which have AC good
input but failed DC output.
Resolution: A) Apply AC power to the node.
B) Replace the power supplies.
Fatal error: Code 62, subcode 0x00 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x00 indicates a general timeout or exhaustion of available
retries during overall Write/Read/Gate leveling.
The "path" value encodes the cma number, channel number, and
chip-select map, according to the following bit range mapping:
|31        24|23            16|15      8|7               0|
| cma number | channel number |         | chip select map |
Resolution: A) Cycle power on the node.
B) Reseat CM memory riser card.
C) Reseat the failing Cluster memory DIMM.
D) Replace the failing Cluster memory DIMM.
E) Replace the node motherboard.
Fatal error: Code 62, subcode 0x01 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x01 indicates a timeout during write leveling, or an
exhaustion of available retries during write leveling.
The "path" value encodes the cma number, channel number, and
chip-select number, according to the following bit range mapping:
|31        24|23            16|15      8|7           0|
| cma number | channel number |         | chip select |
See Code 62, sub-code 0x00 for resolution information.
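The "path" value for Code 62 sub-codes 0x00 through 0x02 can be unpacked as sketched below. The helper names are hypothetical. The sketch assumes the chip-select field occupies bits 7-0 and that bits 15-8 are unused, which is how the bit range diagram reads but is not stated in words.

```python
def decode_level_path(path: int) -> dict:
    """Unpack the Code 62 (sub-codes 0x00-0x02) path value."""
    return {
        "cma": (path >> 24) & 0xFF,      # bits 31-24
        "channel": (path >> 16) & 0xFF,  # bits 23-16
        # Assumption: bits 15-8 are unused; chip select is in bits 7-0.
        "chip_select": path & 0xFF,
    }


def chip_selects_from_map(cs_map: int) -> list:
    """For sub-code 0x00, list the chip selects whose bit is set in the map."""
    return [bit for bit in range(8) if cs_map & (1 << bit)]
```

For sub-code 0x00 the chip-select field is a bit map, so pass it through chip_selects_from_map(); for sub-codes 0x01 and 0x02 it is a single chip-select number.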
Fatal error: Code 62, subcode 0x02 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x02 indicates a timeout during read leveling, or an
exhaustion of available retries during read leveling.
The "path" value encodes the cma number, channel number, and
chip-select number, according to the following bit range mapping:
|31        24|23            16|15      8|7           0|
| cma number | channel number |         | chip select |
See Code 62, sub-code 0x00 for resolution information.
Fatal error: Code 62, subcode 0x03 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x03 indicates a timeout writing to a Mosys PHY register.
The "path" value encodes the channel number and the PHY CSR address,
according to the following bit range mapping:
|31            16|15          0|
| channel number | CSR address |
See Code 62, sub-code 0x00 for resolution information.
Fatal error: Code 62, subcode 0x04 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x04 indicates a timeout reading from a Mosys PHY register.
The "path" value encodes the channel number and the PHY CSR address,
according to the following bit range mapping:
|31            16|15          0|
| channel number | CSR address |
See Code 62, sub-code 0x00 for resolution information.
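For sub-codes 0x03 and 0x04 the "path" value splits into two 16-bit halves. A minimal sketch, with a hypothetical function name:

```python
def decode_phy_path(path: int) -> tuple:
    """Return (channel number, PHY CSR address) for Code 62 sub-codes 0x03/0x04."""
    channel = (path >> 16) & 0xFFFF  # bits 31-16
    csr_address = path & 0xFFFF      # bits 15-0
    return channel, csr_address
```

Printing the CSR address in hex, for example with format(csr_address, "#06x"), makes it easier to match against register documentation.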
Fatal error: Code 63, subcode 0x00 (xxyy)
LRDIMM_COMM_FAILURE "LRDIMM Communication failure"
xx = SMBUS address
yy = SMBUS read failure status
This error indicates a failure during a SMBUS read of an
LRDIMM iMB register.
Resolution: A) Use the Whack command line to re-run the CM
initialization test (cma init).
B) Use the Whack command line to Reset node.
C) Cycle power on the node.
D) Reseat appropriate Cluster Memory DIMM.
E) Replace appropriate Cluster Memory DIMM.
F) Replace the node motherboard.
Fatal error: Code 63, subcode 0x01 (xxyy)
LRDIMM_COMM_FAILURE "LRDIMM Communication failure"
xx = SMBUS address
yy = SMBUS write failure status
This error indicates a failure during a SMBUS write to an
LRDIMM iMB register.
Resolution: See Code 63 sub-code 0x00.
Fatal error: Code 63, subcode 0x02 (xxxxyyzz)
LRDIMM_COMM_FAILURE "LRDIMM iMB data mis-compare"
xxxx = iMB register
yy = Expected contents of register
zz = Actual contents of register
This is a Data MisCompare while verifying LRDIMM iMB
register initial values.
Resolution: See Code 63 sub-code 0x00.
Fatal error: Code 64, subcode 0x00 (FFFF)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_USER_ERROR"
Could not find the variable string in the environment
variable table.
Resolution: A) User needs to fix illegal name in the table
or in call to table.
Fatal error: Code 64, subcode 0x00 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_USER_ERROR"
PortNum specifies the invalid port number.
The user entered an incorrect RPC or LPC port number in the
"CMA PPHY..." command.
Resolution: A) User needs to enter the correct port number
for the "CMA PPHY..." command.
Non-fatal error: Code 64, sub-code 0x01 (ack)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_RD_ACK_OUT_TIMEOUT"
ack is the current state of the PPHY control register acknowledge bit.
If ack is 0, the timeout occurred waiting for the ack bit to assert.
If ack is 1, the timeout occurred waiting for the ack bit to deassert.
Resolution:
A) The "CMA PPHY..." commands are a series of commands that allow the
user to modify settings on PPHYs or run various tests. Therefore, the
user should know what has changed and whether any failures are real
or a result of the changes that were made. If you believe the
hardware is bad, continue with the following steps.
B) Use the Whack command line to Reset node.
C) Cycle power on the node.
D) Replace the node motherboard.
Non-fatal error: Code 64, sub-code 0x02 (data)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_RD_MISMATCH"
data is the actual data value from the PPHY register that
did not match the expected value.
Resolution: A) See Code 64, sub-code 0x01 resolution.
Fatal error: Code 64, subcode 0x03 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_LBERT_ERROR"
PortNum is the RPC port being tested.
BERT is used to generate a pattern for the voltage margin test and
the test is expected to generate a BERT error. This failure indicates
the expected error occurred but did not clear, or the expected error
did not occur.
Non-fatal error: Code 64, sub-code 0x04 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_PHASE_ERROR Total Eye Margin value is over 1 UI"
PortNum is the RPC or LPC port being tested. UI is Unit Interval.
Resolution: A) See Code 64, sub-code 0x01 resolution.
Non-fatal error: Code 65, sub-code 0x00 (xxyy)
BOOT_DISK_WARNING "Boot disk warnings"
Booting with the default boot disk (xx) failed and the default boot
disk was changed to next available boot disk (yy).
Resolution: Check boot disk (xx) for any disk failure.
Non-fatal error: Code 65, sub-code 0x01 (xxyy)
BOOT_DISK_WARNING "Boot disk warnings"
Booting with the default boot disk (xx) failed and next available
boot disk (yy) has failed booting.
Resolution: Check boot disk (xx) and (yy) for any disk failure.
Non-fatal error: Code 65, sub-code 0x02 (0)
BOOT_DISK_WARNING "Boot disk warnings"
Reading boot disk info from PROM failed and dual boot disk
configuration was skipped.
Resolution: Check PROM for any access failure.
Status: Code 127 (STAT_BIOS_DIAG) "BIOS Diag"
This code is not an error. It is a BIOS diagnostic failure which
was forced by the Whack "fatal" command. It is used to test the
error logging and reporting mechanisms of the BIOS and TPD software.
Status: Code 128 (STAT_BIOS_UPDATE) "BIOS Update"
This code is not an error. It indicates the BIOS determined
that it had been updated. During CBIOS initialization, it
looks at a value stored in NVRAM to determine if the current
version is newer than the version previously booted. If so,
the BIOS logs this update. The sub-code is the new BIOS
version and the minor code is the old BIOS version. Example:
Code 128 (BIOS update) - Subcode 0x10204 (10201)
The above indicates CBIOS was updated from version 1.2.1 to 1.2.4.
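The Code 128 example can be reproduced with a short helper. This sketch assumes each version component occupies one byte of the hexadecimal value (major in bits 16-23, minor in bits 8-15, patch in bits 0-7); that layout is inferred from the single example above, and the function name is hypothetical.

```python
def format_bios_version(value: int) -> str:
    """Format a Code 128 version value such as 0x10204 as a dotted string."""
    # Assumption: one byte per component, inferred from the 0x10204 example.
    major = (value >> 16) & 0xFF
    minor = (value >> 8) & 0xFF
    patch = value & 0xFF
    return f"{major}.{minor}.{patch}"


new_version = format_bios_version(0x10204)  # sub-code: new BIOS version
old_version = format_bios_version(0x10201)  # data: old BIOS version
```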
HPE 3PAR OS fatal error codes and error resolution—HPE
3PAR OS 3.2.2
Error codes above 255 are in the domain of the OS.
Code
Description
Fatal error: Code 257, sub-code yyyyyyyy (xx)
PROM_EA_MEM_UERR "Uncorrectable Cluster memory"
S-Series (PIII and P4) and E-Series (P4) nodes:
Log:
Uncorrectable Error in Cluster Memory
Text: UERR wwwwwwww at cluster DIMM xx, addr yyyyyyyy,
syn zzzzzzzz
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Eagle) has detected
an uncorrectable memory error in one or more cluster memory
DIMMs (xx) at address (yyyyyyyy). The node is taken out of the
cluster in response to this error.
T-Series, F-Series, and V-Series (5000P) nodes:
Log:
Uncorrectable Error in Cluster Memory
Text: CM UECC Error Status [wwwwwwww]:
osp: UECC: address=yy:yyyyyyyy chnl 0xww seg 0xqq synd
0xrr
bank=0xss col=0xtttt row=0xuuuu DIMMww.vv Multibit
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Osprey) has detected
an uncorrectable memory error in one or more cluster memory
DIMMs (xx) at address (yy:yyyyyyyy). The node is taken out of the
cluster in response to this error. If only the xx value is
available, the DIMM number may be computed as
DIMM(xx % 3).(xx / 3), using integer division.
For example: if xx is 2, this would refer to DIMM2.0.
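The DIMM computation above is simple enough to check by hand, but a sketch may help when scanning many log entries. The function name is hypothetical, and the division is assumed to be integer division, which is consistent with the xx = 2 example.

```python
def osprey_dimm_name(xx: int) -> str:
    """Compute the DIMM label from the xx value as DIMM(xx % 3).(xx / 3)."""
    if xx < 0:
        raise ValueError("xx must be non-negative")
    # Assumption: "/" in the formula means integer division.
    return f"DIMM{xx % 3}.{xx // 3}"
```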
Series based on the Harrier ASIC:
Log:
Uncorrectable Error in Cluster Memory
Text: HAR0|1 MemCore0|1 MUERR|UERR IntStatus=wwwwwwww
data xxxxxxxx:xxxxxxxx
denali channel addr y:yyyyyyyyy syndrome z
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: harrier_err_interrupt() of harrierint.c
This error indicates the Cluster Manager ASIC (Harrier) has detected
an uncorrectable memory error in one or more cluster memory
DIMMs (xx) at address (yyyyyyyy). The node is taken out of the
cluster in response to this error.
DIMM callout (xx):
0 = DIMM0.0.0
1 = DIMM0.1.0
2 = DIMM0.0.1
3 = DIMM0.1.1
4 = DIMM1.0.0
5 = DIMM1.1.0
6 = DIMM1.0.1
7 = DIMM1.1.1
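The Harrier DIMM callout list follows a regular bit pattern: the first digit tracks bit 2 of xx, the second digit tracks bit 0, and the third digit tracks bit 1. The sketch below relies on that observed pattern only; the guide does not say what each digit of the DIMM label represents, and the function name is hypothetical.

```python
def harrier_dimm_name(xx: int) -> str:
    """Map a Harrier DIMM callout value (0-7) to its DIMM label."""
    if not 0 <= xx <= 7:
        raise ValueError("DIMM callout must be in the range 0-7")
    first = (xx >> 2) & 1   # bit 2 of xx
    second = xx & 1         # bit 0 of xx
    third = (xx >> 1) & 1   # bit 1 of xx
    return f"DIMM{first}.{second}.{third}"
```

When in doubt, the callout list in this entry takes precedence over the computed label.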
Series based on the Harrier2 ASIC:
Log:
Uncorrectable Error in Cluster Memory
Text: HAR2 0|1 MemCore0|1 MUERR|UERR
IntStatus=wwwwwwww data xxxxxxxx:xxxxxxxx
DDR3 addr yy:yyyyyyyy syndrome z
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: harrier2_err_interrupt() of harrier2int.c
This error indicates the Cluster Manager ASIC (Harrier2) has detected
an uncorrectable memory error in one or more cluster memory
DIMMs (xx) at address (yyyyyyyy). The node is taken out of the
cluster in response to this error.
See DIMM callout for Harrier ASIC.
This event is usually followed by a core dump on disk.
The kernel log text in the core dump usually contains some
easy-to-interpret text which identifies which DIMM has failed.
Resolution: A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM(s).
D) Replace the failing Cluster Memory DIMM(s).
E) Replace the node motherboard.
Diagnostic: A) Ensure BIOS tests are enabled using the
"table skip none" command at a Whack prompt.
B) Use "mem test cm" command to test cluster memory.
C) wwwwwwww is the CM Error interrupt status register
and the syndrome is zzzzzzzz. These may be decoded
using scaffold documentation.
Fatal error: Code 258, sub-code xx (yy)
PROM_EA_MEM_CERR "Correctable Cluster memory"
This error is not currently generated by a node. It is a placeholder
should it be necessary to record correctable cluster memory ECC errors
in the node PROM.
Fatal error: Code 259, sub-code xx (yy)
PROM_EA_XCB_ERR "Error in the XCB engine"
This error is not currently generated by a node. It is a placeholder
for CM XCB engine hardware errors.
Fatal error: Code 260, sub-code xx (yy)
PROM_EA_MEM_MUERR "Multiple uncorrectable memory"
Log:
Multiple Uncorrectable Error in Cluster Memory
Text: MUERR wwwwwwww at cluster DIMM xx, addr
yyyyyyyy, syn zzzzzzzz
Event: Multiple Uncorrectable Memory Error
Panic: Panic due to Multiple Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Eagle or Osprey) has
detected multiple uncorrectable memory errors in cluster memory
DIMM (xx) at address (yyyyyyyy). The node is taken out of the
cluster in response to this error.
See Code 257 for error resolution information.
Fatal error: Code 261, sub-code xx (yy)
PROM_EA_HW_ERR "Cluster Manager HW error"
This error is not currently generated by a node. It is a placeholder
for Cluster Manager (Eagle or Osprey) internal hardware errors.
Fatal error: Code 262, sub-code xxxxxxxx (yy)
PROM_EA_PCI_ERR "Cluster Manager PCI error"
Log:
Cluster Manager PCI Error
Text: ea_pci_err: bus yy, status xxxxxxxx
Call CBIOS to analyze error.
...
Event: PCI bus yy error xxxxxxxx
Panic: Panic due to Eagle PCI error: bus yy, status xxxxxxxx
Where: ea_pci_err() of eaint_hdler.c
This error indicates the Cluster Manager ASIC (Eagle or Osprey) has
detected a PCI bus error while communicating with either a CPU or
one of the PCI slot devices. This error is most likely caused by a
card which has failed in one of the PCI slots. You may need to
observe BIOS output which would be recorded in the crash dump
in order to determine the true cause.
Resolution: A) Cycle power on the node.
B) Read BIOS output to determine if a specific PCI
card is implicated by the slot bridges. If so,
replace the card.
C) Replace the node motherboard.
Diagnostic: A) If BIOS messages indicate no other device is at
fault, then manual BIOS tests may be performed to
determine if the cause is CIOB. You may use the
Whack "mem test" command with a CM memory range
to generate accesses. Use "eagle status" and
"eagle clear" to get and clear errors.
B) The "fibre test cluster" command is good to test
access from a fibre channel card to the CM.
Websites
General websites
Hewlett Packard Enterprise Information Library
www.hpe.com/info/EIL
Single Point of Connectivity Knowledge (SPOCK) Storage compatibility matrix
www.hpe.com/storage/spock
Storage white papers and analyst reports
www.hpe.com/storage/whitepapers
For additional websites, see Support and other resources.
Support and other resources
Accessing Hewlett Packard Enterprise Support
• For live assistance, go to the Contact Hewlett Packard Enterprise Worldwide website:
http://www.hpe.com/assistance
• To access documentation and support services, go to the Hewlett Packard Enterprise Support Center
website:
http://www.hpe.com/support/hpesc
Information to collect
• Technical support registration number (if applicable)
• Product name, model or version, and serial number
• Operating system name and version
• Firmware version
• Error messages
• Product-specific reports and logs
• Add-on products or components
• Third-party products or components
Accessing updates
• Some software products provide a mechanism for accessing software updates through the product
interface. Review your product documentation to identify the recommended software update method.
• To download product updates:
Hewlett Packard Enterprise Support Center
www.hpe.com/support/hpesc
Hewlett Packard Enterprise Support Center: Software downloads
www.hpe.com/support/downloads
Software Depot
www.hpe.com/support/softwaredepot
• To subscribe to eNewsletters and alerts:
www.hpe.com/support/e-updates
• To view and update your entitlements, and to link your contracts and warranties with your profile, go to
the Hewlett Packard Enterprise Support Center More Information on Access to Support Materials
page:
www.hpe.com/support/AccessToSupportMaterials
IMPORTANT:
Access to some updates might require product entitlement when accessed through the Hewlett
Packard Enterprise Support Center. You must have an HPE Passport set up with relevant
entitlements.
Customer self repair
Hewlett Packard Enterprise customer self repair (CSR) programs allow you to repair your product. If a
CSR part needs to be replaced, it will be shipped directly to you so that you can install it at your
convenience. Some parts do not qualify for CSR. Your Hewlett Packard Enterprise authorized service
provider will determine whether a repair can be accomplished by CSR.
For more information about CSR, contact your local service provider or go to the CSR website:
http://www.hpe.com/support/selfrepair
Remote support
Remote support is available with supported devices as part of your warranty or contractual support
agreement. It provides intelligent event diagnosis, and automatic, secure submission of hardware event
notifications to Hewlett Packard Enterprise, which will initiate a fast and accurate resolution based on your
product's service level. Hewlett Packard Enterprise strongly recommends that you register your device for
remote support.
If your product includes additional remote support details, use search to locate that information.
Remote support and Proactive Care information
HPE Get Connected
www.hpe.com/services/getconnected
HPE Proactive Care services
www.hpe.com/services/proactivecare
HPE Proactive Care service: Supported products list
www.hpe.com/services/proactivecaresupportedproducts
HPE Proactive Care advanced service: Supported products list
www.hpe.com/services/proactivecareadvancedsupportedproducts
Proactive Care customer information
Proactive Care central
www.hpe.com/services/proactivecarecentral
Proactive Care service activation
www.hpe.com/services/proactivecarecentralgetstarted
Warranty information
To view the warranty for your product or to view the Safety and Compliance Information for Server,
Storage, Power, Networking, and Rack Products reference document, go to the Enterprise Safety and
Compliance website:
www.hpe.com/support/Safety-Compliance-EnterpriseProducts
Additional warranty information
HPE ProLiant and x86 Servers and Options
www.hpe.com/support/ProLiantServers-Warranties
HPE Enterprise Servers
www.hpe.com/support/EnterpriseServers-Warranties
HPE Storage Products
www.hpe.com/support/Storage-Warranties
HPE Networking Products
www.hpe.com/support/Networking-Warranties
Regulatory information
To view the regulatory information for your product, view the Safety and Compliance Information for
Server, Storage, Power, Networking, and Rack Products, available at the Hewlett Packard Enterprise
Support Center:
www.hpe.com/support/Safety-Compliance-EnterpriseProducts
Additional regulatory information
Hewlett Packard Enterprise is committed to providing our customers with information about the chemical
substances in our products as needed to comply with legal requirements such as REACH (Regulation EC
No 1907/2006 of the European Parliament and the Council). A chemical information report for this product
can be found at:
www.hpe.com/info/reach
For Hewlett Packard Enterprise product environmental and safety information and compliance data,
including RoHS and REACH, see:
www.hpe.com/info/ecodata
For Hewlett Packard Enterprise environmental information, including company programs, product
recycling, and energy efficiency, see:
www.hpe.com/info/environment
Documentation feedback
Hewlett Packard Enterprise is committed to providing documentation that meets your needs. To help us
improve the documentation, send any errors, suggestions, or comments to Documentation Feedback
(docsfeedback@hpe.com). When submitting your feedback, include the document title, part number,
edition, and publication date located on the front cover of the document. For online help content, include
the product name, product version, help edition, and publication date located on the legal notices page.