HPE 3PAR BIOS Error Codes Reference Guide

Abstract
This Hewlett Packard Enterprise (HPE) guide provides authorized service technicians information about the HPE 3PAR BIOS error codes. The error code information is provided in separate chapters for either HPE 3PAR OS 3.3.1 or HPE 3PAR OS 3.2.2. This document is for Hewlett Packard Enterprise INTERNAL USE ONLY.

Part Number: QL226-99685
Published: August 2017

© Copyright 2017 Hewlett Packard Enterprise Development LP

Notices
The information contained herein is subject to change without notice. The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions contained herein.

Confidential computer software. Valid license from Hewlett Packard Enterprise required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Links to third-party websites take you outside the Hewlett Packard Enterprise website. Hewlett Packard Enterprise has no control over and is not responsible for information outside the Hewlett Packard Enterprise website.

Acknowledgments
Intel®, Itanium®, Pentium®, Intel Inside®, and the Intel Inside logo are trademarks of Intel Corporation in the United States and other countries.
Microsoft® and Windows® are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
Adobe® and Acrobat® are trademarks of Adobe Systems Incorporated.
Java® and Oracle® are registered trademarks of Oracle and/or its affiliates.
UNIX® is a registered trademark of The Open Group.
Contents

Error codes: HPE 3PAR OS 3.3.1
    BIOS fatal error codes and error resolution: HPE 3PAR OS 3.3.1
    HPE 3PAR OS fatal error codes and error resolution: HPE 3PAR OS 3.3.1
Error codes: HPE 3PAR OS 3.2.2
    BIOS fatal error codes and error resolution: HPE 3PAR OS 3.2.2
    HPE 3PAR OS fatal error codes and error resolution: HPE 3PAR OS 3.2.2
Websites
Support and other resources
    Accessing Hewlett Packard Enterprise Support
    Accessing updates
    Customer self repair
    Remote support
    Warranty information
    Regulatory information
    Documentation feedback

Error codes: HPE 3PAR OS 3.3.1

BIOS fatal error codes and error resolution: HPE 3PAR OS 3.3.1

When a BIOS initialization or diagnostic test fails such that the node cannot be allowed to continue booting, a fatal error message is often displayed (sometimes with additional information). Each class of error has a major Code. A class-specific sub-code identifies the specific failure condition. For example:

    PROM checksum: FAIL
    *** Fatal error: Code 25, sub-code 0x0 (0)

The fatal error shown represents a PROM checksum failure. The Serial EEPROM on the board has a bad checksum. Either it has not been initialized or it is corrupted.

When the BIOS reaches a fatal error, it immediately stops all hardware initialization, testing, and booting, and then logs the error to the Serial EEPROM (PROM). Press ^W (the Control and W keys simultaneously) to enter Whack, the CBIOS command line, which you can use to diagnose and possibly correct the problem. Use the Whack command prom log to review previously recorded fatal system errors.

If you believe the fatal error does not impact your immediate test and would like to try to resume, press ^C (not recommended). The fatal error routine returns to the point where the error was detected, possibly drawing more fatal errors or worse, depending on the type of error. To ensure safe system operation, Hewlett Packard Enterprise recommends resolving the problem before resuming.

NOTE: A "GEvent" or "GPE" is a "GPIO (General Purpose I/O) event" or "general purpose event".

Each entry below lists the Code line first, followed by its Description.

Fatal error: Code 0, sub-code 0x0 (0)
INITIALIZATION_OK "No Error"
This code does not indicate a node hardware or software initialization or test failure. It should never occur, and suggests corruption of the PROM log if it is seen.
Resolution: Contact 3PAR technical support.
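When working through console captures or PROM log output, it can help to split an error line into its Code, sub-code, and the value in parentheses before looking it up in this guide. The following is a minimal sketch, not part of the BIOS or Whack. The names BiosError and parse_bios_error are hypothetical, and the pattern assumes lines shaped like the example above (it accepts both the "sub-code" and "subcode" spellings used in this guide).

```python
import re
from typing import NamedTuple, Optional


class BiosError(NamedTuple):
    """One parsed error line (hypothetical helper type, not a BIOS structure)."""
    severity: str  # "Fatal" or "Non-fatal"
    code: int      # major Code (decimal in the message)
    sub_code: int  # class-specific sub-code (hexadecimal in the message)
    data: str      # text inside the parentheses, kept verbatim


# Assumed line shape: "<Fatal|Non-fatal> error: Code N, sub-code 0xN (data)"
_PATTERN = re.compile(
    r"(?P<severity>Non-fatal|Fatal) error: Code (?P<code>\d+), "
    r"sub-?code (?P<sub>0x[0-9a-fA-F]+) \((?P<data>[^)]*)\)"
)


def parse_bios_error(line: str) -> Optional[BiosError]:
    """Return the parsed fields, or None if the line is not an error line."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return BiosError(
        severity=match["severity"],
        code=int(match["code"]),
        sub_code=int(match["sub"], 16),
        data=match["data"],
    )


if __name__ == "__main__":
    print(parse_bios_error("*** Fatal error: Code 25, sub-code 0x0 (0)"))
```

The data field is kept as text because some entries in this guide show a non-numeric placeholder in the parentheses, such as (mode).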

Fatal error: Code 1, sub-code 0x1 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Unknown CPUID string: `xxxx'
Bad or unknown CPU ID (non-Intel). The BIOS is unable to fully identify the processor. This sub-code indicates the CPUID string is not "GenuineIntel".
Resolution:
A) Replace the processor.
B) Try moving the processor to the other CPU socket. The problem could be limited to a single socket.
C) Try moving the processor to another system. The problem could be in the node hardware or software.
D) Replace the node motherboard.
Diagnostic:
A) Use the Whack "cpu id" command. The interesting line will follow a line similar to:
    Intel Pentium III Processor:
or
    Intel Pentium 4 Xeon Processor:

Fatal error: Code 1, sub-code 0x2 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Required features 0x008053fb are missing.
Each class of CPU has a list of technology features it supports. If this error occurs, the CPU is severely downrev, the CPU is bad, or the motherboard is bad.
Resolution:
A) Replace the processor.
B) Try moving the processor to the other CPU socket. The problem could be limited to a single socket.
C) Try moving the processor to another system. The problem could be in the node hardware or software.
Diagnostic:
A) Use the Whack "cpu id" command. The interesting line will be similar to:
    Family 6 ... Features 0x0387fbff, Pflags 4

Fatal error: Code 1, sub-code 0x3 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified this CPU.
Each run of CPU has a major revision and a minor stepping number. If you receive this message, the processor has not yet been verified by 3PAR for reliable operation. If this is a new processor, it may be acceptable to press ^C to resume after this error.
If you are testing a new stepping of the processor and need to use it, use the following Whack command to ignore an unknown CPUID:
    Whack> set perm cpu_unqual_ok
Resolution:
A) Upgrade to the latest CBIOS so that newer certified processors are accepted.
B) Replace the processor with one certified by 3PAR for use with the board.
Diagnostic:
A) Use the Whack "cpu id" command. The interesting line will be similar to:
    Family 6, Model 8, Stepping 3, Features ...

Fatal error: Code 1, sub-code 0x4 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified the bootstrap CPU as a dual processor.
If more than one processor is installed, both CPUs must be certified to operate in multiprocessor mode. This error indicates that the bootstrap processor is not certified to run in multiprocessor mode.
See Code 1, sub-code 0x3 for resolution information.

Fatal error: Code 1, sub-code 0x5 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified this CPU as a multiple processor.
See Code 1, sub-code 0x3 for resolution information.

Fatal error: Code 1, sub-code 0x6 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Microcode table size is xxxx, which is not 4 mod 2048.
This is an internal CBIOS consistency check error. If you see this error, processor execution out of flash is most likely not stable. The CPU identification is performed after the flash is fully CRC verified, so this error is likely the result of a failing CPU or a transient bus operation.
Resolution:
A) Replace the processor.
B) Re-flash the CBIOS (no need to upgrade).
Diagnostic:
A) Use Arium and a scope to confirm that processor fetches from flash trigger no unusual bus operations.
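The "not 4 mod 2048" wording in the sub-code 0x6 message describes a remainder test on the microcode table size. The sketch below restates that arithmetic only. It assumes, without confirmation from this guide, that the table is made of whole 2048-byte blocks plus 4 extra bytes. The function and constant names are hypothetical and do not come from the CBIOS source.

```python
# Values taken from the message text "which is not 4 mod 2048".
MICROCODE_BLOCK_SIZE = 2048  # assumed size of one block, in bytes
EXPECTED_REMAINDER = 4       # remainder the consistency check expects


def microcode_table_size_ok(size: int) -> bool:
    """True when size leaves a remainder of 4 after division by 2048."""
    return size % MICROCODE_BLOCK_SIZE == EXPECTED_REMAINDER


def whole_blocks(size: int) -> int:
    """Number of whole 2048-byte blocks implied by a valid size (assumption)."""
    if not microcode_table_size_ok(size):
        raise ValueError(f"size {size} is not 4 mod 2048")
    return (size - EXPECTED_REMAINDER) // MICROCODE_BLOCK_SIZE


if __name__ == "__main__":
    for size in (4, 2052, 4100, 4096):
        print(size, microcode_table_size_ok(size))
```

A size such as 4096 fails the test because its remainder is 0, which is the condition the fatal error reports.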

Fatal error: Code 1, sub-code 0x7 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Invalid microcode checksum: xxxx
This is another internal CBIOS consistency check error. Before each block of update microcode is uploaded to the Pentium, its checksum is verified. If the checksum is not valid, the block is rejected with this error.
See Code 1, sub-code 0x3 for resolution information.

Fatal error: Code 1, sub-code 0x8 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Microcode update failed: expected xxxx, got yyyy
The processor has rejected the microcode update. There are many possible causes, but the most likely is a failing processor. At this point a strong 64-bit CRC has been run successfully across the BIOS and a checksum for each update line has also passed.
See Code 1, sub-code 0x4 for resolution information.

Fatal error: Code 1, sub-code 0x9 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: No microcode update found for this CPU.
The BIOS was not able to locate a microcode update for this particular processor, yet it is listed as a CPU which requires a microcode update. This is likely due to use of an unqualified processor.
See Code 1, sub-code 0x4 for resolution information.

Fatal error: Code 1, sub-code 0xa (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: CPU failed BIST (built in self test): xxxxxxxx
The processor has failed its own built-in self test. This strongly indicates that the processor is at fault.
Resolution:
A) Replace the processor.
B) Replace both processor VRM modules.
C) Replace the node motherboard.

Fatal error: Code 1, sub-code 0xb (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: First CPU's bus ratios were wwww:xxxx but this CPU's bus ratios are yyyy:zzzz.
The two processors in the system board do not have the same bus clock multiplier.
The likely cause is that the processors have different clock speeds (or, less likely, different minor steppings). The "First CPU" named in the message is the bootstrap CPU. On a PIII board, the bootstrap CPU (CPU3) is to the right, nearest the PromJet interface.
Resolution:
A) Remove both heatsinks and verify the processors are rated for the same clock speed and bus multiplier.
B) Replace each processor individually.
C) Replace the node motherboard.
Diagnostic:
A) Use the Whack "cpu id" command. If you enter Whack before Linux is booted, you will consistently run on the bootstrap CPU. If you enter Whack from Linux (using the whack command), you may enter Whack on either CPU. The SMI output indicates on which CPU Whack is running. Using this method, or using the "cpu switch" command, you can "cpu id" all processors in the node.
Example:
    Whack> cpu id
    Intel(r) Pentium(r) III Processor:
    Family 6, Model 8, Stepping 3, ...
    ...
    CPUID[3] == 0x00000000 0x00000000 0xda28203c ...
    ...
    Bus to CPU ratio == 2/13
    ...
    Clock Frequency Ratio == 7

Fatal error: Code 1, sub-code 0xc (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
This CPU does not support clock multiplier changes
In the supported configuration, the two CPUs present in the node must run at the same clock speed. If the BIOS detects CPUs which have different clock multipliers, it automatically configures all CPUs to use the highest common clock multiplier. If a CPU's multiplier cannot be changed, this fatal error results.
See Code 1, sub-code 0x4 for resolution information.

Fatal error: Code 1, sub-code 0xd (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
Desired clock multiplier xx is too high for this CPU
This error indicates the CPU does not support a clock multiplier the BIOS is attempting to set.
See Code 1, sub-code 0xc for resolution information.

Fatal error: Code 1, sub-code 0xe (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
Desired clock multiplier xx is illegal for this CPU
See Code 1, sub-code 0xd for information on this error.

Non-fatal error: Code 2, sub-code 0x1 (0)
RTC_FAILURE "RTC Failure"
*** Error: Real-Time Clock not initialized.
The Real-Time Clock (RTC) is a function of the SuperIO which provides a battery-backed system clock and a small quantity of battery-backed Non-Volatile RAM for system configuration flags. This error indicates the RTC memory has become corrupt, possibly due to a dead battery or battery removal when no mainline power was available.
Resolution:
A) Power down, wait 30 seconds, power up. This error should self-correct (likely with a loss of current date/time and other NVRAM contents). Set the date and time using the Whack "rtc date" command.
B) Replace the RTC battery, located near the SuperIO ASIC.
C) Use the Whack command "rtc date" to set the RTC date and time.
D) Replace the node motherboard.
Diagnostic:
A) Use the Whack "time loop" command. The left column is RTC seconds and should increment at exactly one-second intervals. The right column is a time-scaled processor performance counter and should (even if the RTC runs slow or fast) still increment nearly in lock step with the RTC.
B) Verify there is not a dead short across the RTC battery. This will rapidly drain the battery and immediately invalidate the RTC contents on power down.

Non-fatal error: Code 2, sub-code 0x2 (0)
RTC_FAILURE "RTC Failure"
RTC_BATTERY_LOW
RTC / NVRAM Battery Failure - Replace battery.
The RTC / NVRAM battery was found to have a low voltage by the built-in monitoring circuit of the Real Time Clock (RTC). The RTC battery provides power to the RTC clock function of the SuperIO while the board is not drawing mainline supply power.
Over time, this battery's available power will decay (it is rated for over five years of normal operation).
Resolution:
A) Replace the RTC lithium cell battery on the node motherboard.
B) Replace the node motherboard.
Diagnostic:
A) Verify the lithium cell has a 3V charge.
B) Verify there is not a dead short across the RTC battery. This will rapidly drain the battery and immediately invalidate the RTC contents on power down.

Non-fatal error: Code 2, sub-code 0x3 (0)
RTC_FAILURE "RTC Failure"
RTC_INVALID_TIME
The current RTC date/time is invalid. Enter the correct date/time or press Tab to acquire it from the network.
If the time has not yet been set, or becomes invalid due to loss of battery power, the BIOS reports this error and waits for the user to update the time.
Resolution:
A) Enter the correct time.
B) Press TAB to acquire the time from the network.
C) Press ^C to abort the prompt and resume boot.

Non-fatal error: Code 2, sub-code 0x4 (0)
RTC_FAILURE "RTC Failure"
RTC_BATTERY_LOW
RTC / NVRAM Battery Failure - Replace battery.
The RTC / NVRAM battery was found to have a low voltage by the built-in monitoring circuit of the RTC (TOD clock).
Resolution:
A) Replace the lithium cell battery on the node.
B) Replace the node motherboard.

Non-fatal error: Code 2, sub-code 0x5 (mode)
RTC_FAILURE "RTC Failure"
RTC was found in Binary mode or 12 Hour mode.
The RTC has two data modes, BCD and Binary, and the BIOS prefers BCD mode. If the RTC is in Binary mode, the OS must have set it. The BIOS resets the RTC to BCD mode. Likewise, if the RTC is in 12 Hour mode, the BIOS reports this and corrects the RTC to 24 Hour mode. The mode byte reports which mode the RTC was in:
Bit 1 should be on for 24 Hour mode.
Bit 2 should be off for BCD mode.
Resolution:
A) Informational only.
If this occurs in a development environment, alert the development team.

Non-fatal error: Code 2, sub-code 0x6 (0)
RTC_FAILURE "RTC Failure"
RTC update was stopped.
Resolution:
A) Informational only. If this occurs in a development environment, alert the development team.

Fatal error: Code 3, sub-code 0x0 (0)
SRAM_INIT_FAILURE "CPU SRAM Init Failure"
During initialization, memory areas are tested before they are used. SRAM is used by the processor for persistent storage during early initialization and the CPU memory tests. This sub-code indicates that the SRAM walking bits test has failed and that the onboard SRAM may not be reliable.
Resolution:
A) Power down, wait 30 seconds, power up. This problem is unlikely to be a one-time occurrence and will probably recur.
B) Replace the node motherboard.
Diagnostic:
A) Use Arium to set and verify SRAM contents. If you notice a pattern, it could be a pulled, stuck, or bridged SRAM line.

Fatal error: Code 3, sub-code 0x1 (0)
SRAM_INIT_FAILURE "CPU SRAM Init Failure"
After SRAM contents have been updated with the BIOS static data, a test is performed to ensure the data arrived intact. If it did not, this error is generated. The error could indicate an SRAM failure with the same conditions as above.
See Code 3, sub-code 0x0 for resolution information.

Fatal error: Code 4, sub-code 0x1 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
Pairvvvv DIMMwwww: (Jxxxx) Bad checksum. Got yyyy, SPD said zzzz
*** Error: Bad SDRAM configuration.
The SDRAM DIMMs located on the motherboard are used for main CPU memory and are critical to the proper operation of a node.
Even before the memory is thoroughly tested for proper operation, it must be configured to appear in CPU-addressable space. Each DIMM has a small embedded serial EEPROM which holds DIMM configuration information such as the number of rows, columns, and banks, as well as memory timing. If this serial EEPROM becomes corrupt, the DIMM configuration data stored in it cannot be trusted. For this reason, the EEPROM also contains a checksum, which the BIOS verifies before configuring the DIMM. If this checksum does not match the checksum the BIOS computes across the DIMM, this error results. The minor code reported is the total count of errors for the DIMM.
Resolution:
A) Replace the defective CPU DIMM with an identical one.
B) If an identical one is not available, replace the CPU DIMM pair.
See Code 15 for more resolution information.
Diagnostic:
A) The CPU DIMMs appear on the I2C bus at 3.a0 through 3.a6. Use the Whack "d i2c" command to display the DIMM serial EEPROM contents to determine if there is a pattern.
Example (DIMM 2):
    Whack> d i2c 3.a4.0
See Code 15 for more resolution information.

Fatal error: Code 4, sub-code 0x2 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
Pairww DIMMxx (yyyy): 'zzzz' read failed
*** Error: Bad SDRAM configuration.
Where zzzz is one of: row address, column address, module rows, cas latency3, refresh, banks, cas latency2, cas latency1, ras precharge, act_to_rw, act_to_deact, ras cycle, write_to_deact, density, frequency, or DIMM type.
This error indicates that a CPU memory DIMM was detected but that the EEPROM present on the DIMM could not be reliably read. The read operation is done through I2C.
See Code 4 above for resolution information.
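The SPD checksum comparison described under Code 4, sub-code 0x1 can be reproduced offline from a dump of a DIMM's serial EEPROM. The sketch below assumes the JEDEC SPD layout used by SDR and DDR SDRAM modules, in which byte 63 holds the low 8 bits of the sum of bytes 0 through 62. This guide does not state which layout the BIOS checks, so treat that as an assumption. The helper dimm_i2c_address is also hypothetical: it infers a linear mapping from the two facts given above (addresses 3.a0 through 3.a6, and DIMM 2 at 3.a4).

```python
def spd_checksum(spd: bytes) -> int:
    """Low 8 bits of the sum of SPD bytes 0..62 (assumed SDR/DDR SPD layout)."""
    if len(spd) < 64:
        raise ValueError("need at least the first 64 SPD bytes")
    return sum(spd[:63]) & 0xFF


def spd_checksum_ok(spd: bytes) -> bool:
    """Compare the computed checksum with the stored one in byte 63."""
    return spd_checksum(spd) == spd[63]


def dimm_i2c_address(dimm: int) -> int:
    """I2C address for DIMM 0..3, assuming 0xa0 + 2 * dimm (inferred mapping)."""
    if not 0 <= dimm <= 3:
        raise ValueError("this sketch only covers DIMM 0 through DIMM 3")
    return 0xA0 + 2 * dimm


if __name__ == "__main__":
    # Synthetic SPD image for illustration only; not data from a real DIMM.
    image = bytearray(64)
    image[0], image[1] = 0x80, 0x08
    image[63] = spd_checksum(bytes(image))
    print(spd_checksum_ok(bytes(image)), hex(dimm_i2c_address(2)))
```

If the computed and stored values differ for a dump taken with the "d i2c" command, that is consistent with the "Got yyyy, SPD said zzzz" message, provided the layout assumption holds for the DIMM in question.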

Fatal error: Code 4, sub-code 0x4 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: 'ssss' in Pairtt DIMMuu (vvvv): ww != DIMMxx (yyyy): zz
*** Error: Bad SDRAM configuration.
This error indicates the BIOS detected the CPU SDRAM DIMMs in the bank pair are of a different type.
Resolution:
A) Ensure both DIMMs in the pair are identical. Note that two DIMMs may have the same capacity but have different number of rows, columns, or banks. The DIMM configuration must exactly match. If the DIMMs have the same manufacturer, markings and capacity, they are probably identical.
See Code 15 for more resolution information.
Diagnostic:
A) The EEPROM SPD information in each pair of DIMMs should be nearly identical.
See Code 4 above for more diagnostic information.

Fatal error: Code 4, sub-code 0x8 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Pairww DIMMxx (yyyy): bad refresh type zz
*** Error: Bad SDRAM configuration.
This error indicates the value the DIMM reports for refresh is not valid (greater than the maximum refresh counter).
See Code 4 above for resolution information.

Fatal error: Code 4, sub-code 0x10 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: DIMM Pair wwww: **** Type not known **** (rows xxxx, cols yyyy, banks zzzz)
*** Error: Bad SDRAM configuration.
This error indicates the values the DIMM reports for rows, columns, and banks do not correspond to any known configuration for a valid DIMM. It is possible the DIMM EEPROM data has become corrupt or that the DIMM is a higher capacity than what is currently supported.
See Code 4 above for resolution information.

Fatal error: Code 4, sub-code 0x20 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Unable to configure any DQS lines.
OR
*** Error: Unable to configure DQS lines for nibble x.
*** Error: Bad SDRAM configuration.
This error applies to P4 nodes only. It indicates that the BIOS failed to find a set of acceptable DQS values, either for any nibble or for one specific nibble of the DIMMs.
See Code 4 above for resolution information.

Fatal error: Code 4, sub-code 0x100 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: ACT to DEACT of yy.yy clocks is > 6.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting which is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used.
Resolution:
A) Replace CPU DIMMs with 3PAR-certified products.
B) Replace the node motherboard.
C) If there is no other choice, override this error with a BIOS variable, setting "mem_margin" to the percentage outside margin.
Example:
    *** Error: ACT to RW of 3.06 clocks is > 3.00 (2%)
    *** Error: Bad SDRAM configuration.
    Fatal error: Code 4, subcode 0x0 (2)
    Whack> set perm mem_margin=2
    Whack> reboot

Fatal error: Code 4, sub-code 0x200 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Act to RW of y.yy clocks is > 3.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting which is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.

Fatal error: Code 4, sub-code 0x400 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS precharge time of y.yy clocks is > 3.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting which is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.

Fatal error: Code 4, sub-code 0x800 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS cycle time of y.yy clocks is > 9.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting that is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.

Fatal error: Code 4, sub-code 0x1000 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS to RAS of y.yy clocks is > 2.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting that is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.

Fatal error: Code 4, sub-code 0x2000 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: yyyy: Write to deact > 3. We got zzzz
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting that is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used.
See Code 4, sub-code 0x100 for resolution information.

Fatal error: Code 5, sub-code 0x1 (0)
C_MAIN1_CALL_FAILURE "c_main1 Call Failure"
This exception should never happen unless an earlier exception was ignored by pressing ^C. This is because this exception will only occur if the main initialization, diagnostic test, and boot sequence fails to complete a boot and then the user chooses to ignore the error. A further explanation is necessary. There are two halves to system initialization.
The first half relies on only SRAM being available, so the stack and runtime variables are stored there. Once main CPU memory has been tested, initialization switches to the second half, which relies on the tested SDRAM for all data structures. This second half completes initialization and testing of all other node board devices and executes the boot process. For this last step to fail, the IDE disk must either be absent or contain an invalid boot. At that point a fatal error is generated. Do not ignore this condition. It is a final recourse and an abort will reboot or hang the node board. It is safer at this stage to press ^W and enter Whack. From Whack, you can reboot with the "reboot" command.
Resolution:
A) Check that the control cache (CPU) DIMMs are installed and pass initialization.
B) Verify the node boot drive is present and node software has been installed.
C) Replace the node, including CPU DIMMs and boot drive.

Fatal error: Code 6, sub-code 0x1 (0)
SRAM_BAD "CPU SRAM Bad"
*** SRAM failure: address xxxxxxxx wrote yy but read zz
This failure indicates an early SRAM verification test revealed a problem with the SRAM. This is an unrecoverable error which likely requires hardware diagnosis. This error is displayed by low-level init code. It will never be written to the PROM log because hardware which writes to the PROM relies on correctly functioning SRAM.
Resolution:
A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.
Diagnostic:
A) Use Arium to set and verify SRAM contents. If you notice a pattern, it could be a pulled, stuck, or bridged SRAM line.
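When several "wrote yy but read zz" pairs have been collected, comparing them bit by bit can show whether the same data line is always involved. The following sketch is a rough aid for spotting such a pattern, not a BIOS algorithm. The function name and the returned keys are hypothetical, and the interpretation in the comments is a heuristic only.

```python
from typing import Dict, Iterable, Tuple


def summarize_bit_errors(samples: Iterable[Tuple[int, int]],
                         width: int = 8) -> Dict[str, int]:
    """Summarize (wrote, read) pairs as bit masks.

    only_dropped: bits that were written as 1 and read back as 0, and were
                  never seen flipping the other way (suggests a line held low).
    only_raised:  bits that were written as 0 and read back as 1, and were
                  never seen flipping the other way (suggests a line held high).
    both:         bits seen flipping in both directions (suggests a floating
                  or bridged line).
    """
    mask = (1 << width) - 1
    dropped = 0
    raised = 0
    for wrote, read in samples:
        dropped |= wrote & ~read & mask
        raised |= ~wrote & read & mask
    return {
        "only_dropped": dropped & ~raised,
        "only_raised": raised & ~dropped,
        "both": dropped & raised,
    }


if __name__ == "__main__":
    # Illustrative pairs only; real values come from the failure messages.
    pairs = [(0xFF, 0xFB), (0x55, 0x51), (0x00, 0x00)]
    for name, bits in summarize_bit_errors(pairs).items():
        print(f"{name}: 0x{bits:02x}")
```

A single bit that appears in the same mask across many addresses points toward a data line problem, while errors tied to particular addresses point toward an address line.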

Fatal error: Code 7, sub-code xxxx (yyyy)
SDRAM_BUS_FAST "Control Cache Bus Fast"
*** Error: Front side bus speed xxxx > expected yyyy
This error indicates the BIOS has detected that the front side bus speed exceeds the expected speed (133 MHz on PIII, 533 MHz on P4, 1333 MHz on 5000P). The system may not perform reliably.
Resolution:
A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.
Diagnostic:
A) Check the oscillator for the front side bus with a frequency counter or an oscilloscope.

Fatal error: Code 8, sub-code xxxxxxxx (yyyyyyyy)
MACHINE_CHECK_FAILURE "Machine Check Failure"
Machine check: MCG_STATUS == xxxxxxxx yyyyyyyy
During BIOS initialization and testing, the processor must execute instructions. If this error results at any point, it is likely due to failing hardware related to the CPU's instruction execution path.
Resolution:
A) Cycle power on the node.
B) Update the node firmware to the latest version.
C) Replace CPU SDRAM in pairs.
D) Replace the node motherboard.
Diagnostic:
A) Replace CPU VRMs.
B) Replace CPUs.
C) Use Arium and set a breakpoint on a machine check to determine what errant instructions led up to the machine check.
D) This problem may also be a BIOS or booter software bug. Observe the values of the error sub-code and data. They make up the 64-bit value of the MCG_STATUS status register.

Fatal error: Code 9, sub-code 0x0 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
*** Entering memory segment test: Stack is in xxx ***
One of the first memory tests performed in diagnostic mode is a sequential address or random data test. If there is no memory in the system, or the memory DIMMs are mismatched, or there is a memory subsystem problem, this error may result.
Resolution:
A) Verify memory is installed and in matched pairs (same manufacturer, exact same memory configuration and speed).
B) Replace CPU DIMMs with a set of known good ones.
C) Replace the node motherboard.
Diagnostic:
A) Change memory with the Whack "c <addr>" command. Examine memory with the Whack "d <addr>" command.
B) Use Arium to modify and examine memory.

Fatal error: Code 9, sub-code 0x1 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Insufficient memory: BSS end == xxxx, stack limit == yyyy
During the first part of initialization, the system stack comes from SRAM. During the second part of initialization, the system stack comes from CPU memory. If there is insufficient SDRAM (such as no DIMMs installed), this error may result. Do not ignore this error with ^C: the system stack will fall past the available memory and initialization will probably hang.
See Code 9, sub-code 0x0 for resolution information.

Fatal error: Code 9, sub-code 0x2 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Expected sdram_init_test to be xxxx, but it was yyyy.
After SDRAM has been initialized and scrubbed, the BIOS copies runtime variables from Flash to CPU memory. The BIOS later verifies that this data was copied to SDRAM. This fatal error may be caused by a software error in the BIOS, a hardware error (such as flaky CPU memory), or user intervention such as modifying the memory containing the SDRAM copy of the runtime variables.
Resolution:
A) Reboot. If the problem is caused by flaky hardware, a prior memory test should catch this condition.
B) Upgrade the BIOS version. This is not a likely solution, since this code path is well tested every time the system is booted.
C) Replace CPU DIMMs with a set of known good ones.
D) Replace the node motherboard.
Diagnostic:
A) Examine the BIOS memory area using the Whack "d <addr>" memory dump command. SDRAM data appears in CPU memory in the 0x000d0000 region. The key value is 0xdeadbeef.
Example:
Whack> mem search d0000 10000000 deadbeef
Searching 00000000 .. 01000000 for deadbeef
[ ] Found at 000d0cb0
If this key cannot be found, something went wrong with the copy or memory has become corrupt.

Fatal error: Code 9, subcode 0x3 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Low 1M test: Test completed: x iterations, y probes, z errors found

The low 1 MB of memory is thoroughly tested to ensure reliable operation, because this is the memory area that the BIOS and Whack use during further initialization and testing. If this test fails, do not ignore it with ^C; reliable system memory is critical to proper operation.

Resolution:
A) Cycle power on the node. Occasionally, memory will fail during a memory test due to metallic dust.
B) Reseat CPU memory DIMMs.
C) Pull CPU DIMMs, blow dust from sockets, reseat.
D) Replace CPU memory DIMMs in pairs to ensure replacement parts are matched.
   PIII nodes: Non-paired DIMMs are proximally closest. Paired DIMMs are the leftmost-leftmost and rightmost-rightmost of each two which are proximally closest.
   P4 nodes: Paired DIMMs are proximally closest. DIMM0 and DIMM1 are a pair. DIMM2 and DIMM3 are a pair.
   E200, Ironman, and Tinman nodes: There is only a single pair of CPU memory DIMMs.
E) Replace the node motherboard.

Diagnostic:
A) Run the memory test manually from Whack. You can use the "mem test range <base> <size>" command to test a range of memory.
B) Write to known bad memory with the Whack "c <addr>" command and observe the written contents with "d <addr>". Write enough patterns that you might be able to observe a pattern such as a stuck or floating bit.
Fatal error: Code 9, subcode 0x4 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
High 64K test: Test completed: x iterations, y probes, z errors found

In addition to the low 1 MB of memory, older BIOS versions also thoroughly tested the high 64 KB of memory. The operational stack for the CBIOS and Whack used to reside at this address, which made the memory critical for proper initialization and testing. The current BIOS uses memory below 1 MB for stack space, so this failure code is deprecated.

See Code 9, sub-code 0x3 for resolution information.

Fatal error: Code 9, subcode 0x5 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
SDRAM walk: Test completed: xx iterations, yy probes, zz errors found

During initialization (prior to a thorough test of the low 1 MB of memory), a quick walk through all CPU memory is performed. If an error is found, this fatal error is displayed.

See Code 9, sub-code 0x3 for resolution information.

Fatal error: Code 9, subcode 0x6 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Full SDRAM test: Test completed: xx iterations, yy probes, zz errors found

During later testing, a full SDRAM test is performed, which verifies proper memory operation more completely than the cursory SDRAM walk. This test is very similar to the initial thorough 1 MB test done during initialization.

See Code 9, sub-code 0x3 for resolution information.

Fatal error: Code 9, subcode 0x7 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Pairwwww DIMMxxxx: Illegal SPD <name of value> <value>

This error indicates that a CPU DIMM was detected but that the EEPROM present on the DIMM reported an illegal or unsupported value for our memory controller. Example: Density (SPD byte 31) has more than 1 bit set (for example, 0x30), which indicates a non-standard part.

See Code 9, sub-code 0x3 for resolution information.
Most likely, the DIMM is not qualified for use in our Node Board. The DIMM number is logged in the Data field of the Fatal Error.

Fatal error: Code 9, subcode 0x10 (0)
SDRAM_FAILURE "Control Cache Failure"
Cannot allocate xx bytes for PCI bus yy scan
or
Cannot allocate xx bytes for PCI device on bus yy

This error indicates there was not enough memory, or a memory error occurred, while attempting to allocate heap space during the PCI device probe. SDRAM is needed because the BIOS maintains a list of PCI devices present in the system.

Resolution:
A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace CPU DIMMs.
D) Replace the node motherboard.

Diagnostic:
A) Set BIOS verbose init flags to get more information during memory init and PCI scan.
Whack> set perm mem_verbose
Whack> set perm pci_all
B) Use the "config->heap" command to show the heap_base, heap_top, and heap_limit values.

Fatal error: Code 9, subcode 0x11 (0)
SDRAM_FAILURE "Control Cache Failure"
Cannot find bus xx in scanned PCI busses

During the PCI bus scan, a list of PCI devices present is recorded in SDRAM. For each device present, a block of memory is allocated and initialized. This error indicates that a bus number could not be found in the list of devices previously scanned. This is probably due to an SDRAM or CPU failure.

Resolution:
A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace CPU DIMMs.
D) Replace bootstrap CPU.
E) Replace the node motherboard.

Fatal error: Code 9, subcode 0x12 (0)
SDRAM_FAILURE "Control Cache Failure"
No memory installed.

This error indicates that the CPU memory scan failed to locate any usable memory for the system. There must be at least one bank of SDRAM configured for the node to operate correctly.

Resolution:
A) Cycle power on the node.
B) Verify CPU DIMM scan output shows DIMMs.
C) Replace CPU DIMMs.
D) Replace the node motherboard.

Fatal error: Code 9, subcode 0x13 (xxxx)
SDRAM_FAILURE "Control Cache Failure"
Unknown DDR2 frequency (xxxx)

This error indicates that the CPU memory installed has an unrecognized, and thus unsupported, memory speed. Supported speeds are 533, 667, and 800 MHz.

Resolution: Replace CPU DIMMs with 533, 667, or 800 MHz modules.

Fatal error: Code 9, subcode 0x14 (0)
SDRAM_FAILURE "Control Cache Failure"
FB-DIMM Initialization Failure

This error indicates that CBIOS was unable to initialize the CPU memory installed.

Resolution:
A) Cycle power on the node.
B) Replace CPU DIMMs.
C) Replace the node motherboard.

Fatal error: Code 9, subcode 0x15 (data)
SDRAM_FAILURE "Control Cache Failure"

This error indicates that an uncorrectable ECC error was detected on a DIMM. The data value is a bitmask that may be decoded to determine which DIMM had the error. A value of 1 indicates DIMM 0, 2 indicates DIMM 1, 4 indicates DIMM 2, and so on. More than one bit may be set if CBIOS is unable to isolate the error to a single DIMM.

Resolution:
A) Cycle power on the node.
B) Replace FB-DIMM(s).
C) Replace the node motherboard.

Fatal error: Code 10, subcode 0x1 (0)
PCI_FAILURE "PCI Failure"
*** Error: Bus xx cannot be parent of bus yy.
*** Error: Failure occurred during PCI device allocation.

During the PCI scan, many devices which were programmed by previous PCI scan steps are examined again to verify the programming was successful. This error indicates that a bridge failed to record the PCI bus number of bridges below it.

Resolution:
A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace the node motherboard.

Diagnostic:
A) Use Whack to evaluate offset 0x45 on the failing parent bridge to determine whether the value is not sticking there or whether there is a problem with the PCI bus below it.
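The bitmask described for Code 9, sub-code 0x15 can be decoded by listing the set bit positions. The following is a minimal sketch, not part of the BIOS or Whack; the function name dimms_from_mask is hypothetical and only the bit-to-DIMM mapping (bit n set indicates DIMM n) comes from the description above.

```python
def dimms_from_mask(mask):
    """Return the DIMM numbers flagged in a Code 9, sub-code 0x15 data value.

    Assumes bit n set means DIMM n, as in the description:
    1 -> DIMM 0, 2 -> DIMM 1, 4 -> DIMM 2, and so on.
    """
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Hypothetical data values, for illustration only.
for data in (0x1, 0x4, 0x5):
    print(f"data=0x{data:x}: suspect DIMM(s) {dimms_from_mask(data)}")
```

A result with more than one entry corresponds to the case where the error could not be isolated to a single DIMM.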
Fatal error: Code 10, subcode 0x2 (0)
PCI_FAILURE "PCI Failure"
*** Error: Vendor vvvv, Device wwww, for index xxxx: Expected size yyyy, but got zzzz

Several devices on the PCI bus of a node board are known by the CBIOS to have specific sizes. As a hardware consistency check, the BIOS verifies that these devices are not only present, but also have appropriate memory and I/O space requirements. If any device is found outside of expected requirements, it will cause this error.

Resolution:
A) Cycle power on the node.
B) Reseat all PCI cards.
C) Swap out the PCI card for another qualified card (if it is a card).
D) Pull all PCI cards to see if the problem persists. If so, replace any defective cards.
E) Replace the node motherboard.

Diagnostic:
A) Use the Whack command "pci probe" with the vendor ID provided in the fatal error to acquire the address information the card provides. If this information does not match the error above, this may be a transient.
B) Use the Whack "d pci" command, providing it the "<bus>.<dev>.<func>" of the PCI device. Look for patterns in the data that might indicate a stuck bit.

Fatal error: Code 10, subcode 0x3 (0)
PCI_FAILURE "PCI Failure"
*** Error: I/O space: address limit xxxx exceeded: yyyy

This error indicates that the system has run out of available mapping area while attempting to map this device into the CPU's I/O address range (0x0000 - 0xfe00). The likely cause of this error is that a prior PCI device is consuming too much I/O space. Since most device I/O ranges are extremely small, the cause is likely a defective PCI card or a PCI bus problem.

Resolution:
A) Reseat all PCI cards.
B) Swap out individual PCI cards.
C) Replace the node motherboard.
Diagnostic:
A) Use the Whack command "pci init" or "pci scan" to re-scan the bus. It may provide the information you need to determine the bad device.
B) Review the prior PCI allocations to find one that is unusually large. You will need to enter diagnostic mode to do this. There are two ways:
   1) Press ESC at the initial memory test. Type "go" at the Whack prompt. Answer 'y' to run the PCI initialization. Answer 'a' to print on all phases.
   2) Press ^W at the initial memory test. Type "config diag" at the Whack prompt. Answer 'y' to run the PCI initialization. Answer 'a' to print on all phases.

Fatal error: Code 10, subcode 0x4 (0)
PCI_FAILURE "PCI Failure"
*** Error: 32-bit prefetchable memory: address limit xx exceeded: yy

Many PCI devices (and software drivers) require DMA addressable memory within the 32-bit address space (less than 4 GB). For this reason, all 32-bit PCI devices are required to be mapped within this space. Currently, all CPU memory is also forced to be mapped within this space, limiting the maximum 32-bit CPU memory to about 3 GB.

Resolution:
A) Swap out individual PCI cards.
B) Replace the node motherboard.

See Code 10, sub-code 0x3 for diagnostic information.

Fatal error: Code 10, subcode 0x5 (0)
PCI_FAILURE "PCI Failure"
*** Error: 32-bit non-prefetchable memory: address limit xx exceeded: yy

Non-prefetchable memory has the same 32-bit limitations as prefetchable memory.

See Code 10, sub-code 0x4 for resolution information.

Fatal error: Code 10, subcode 0x6 (0)
PCI_FAILURE "PCI Failure"
*** Error: 64-bit prefetchable memory: address limit xxxx exceeded: yyyy

64-bit PCI devices are not limited to a 32-bit address space. The CPU, however, can only access a 36-bit space (when virtual memory is enabled).
Because most drivers need direct access to the memory a device provides on the bus, the device must be addressable by the Pentium, and so the maximum 64-bit address allowed is 0xf:ffffffff, the top of a 64 GB address space.

See Code 10, sub-code 0x4 for resolution information.

Fatal error: Code 10, subcode 0x7 (0)
PCI_FAILURE "PCI Failure"
Testing CM PCI 64-bit data lines: FAIL

The Cluster Manager (Eagle / Osprey) is used to perform a walking bit test on both PCI0 and PCI1 data paths to CPU memory. If a problem is found with either path, this error will be displayed. The error will be further qualified by one of the following prior lines:

PCIxxxx all data bits stuck high
PCIxxxx found data bits stuck high: BitWW, BitXX, BitYY, BitZZ
PCIxxxx all data bits stuck low
PCIxxxx found data bits stuck low: BitWW, BitXX, BitYY, BitZZ
PCIxxxx data bits possibly floating: BitWW, BitXX, BitYY, BitZZ

Resolution:
A) Cycle power on the node.
B) Reseat all PCI cards.
C) Pull all PCI cards to see if the problem persists. If so, replace any defective cards.
D) Replace the node motherboard.

Diagnostic:
A) Depending on the specific error above, check for stuck or floating pins on CM's connection to the appropriate PCI bus.
B) Depending on the specific error above, check for stuck or floating pins on CIOB's (RCC South Bridge) connection to the appropriate PCI bus.

Fatal error: Code 10, subcode 0x8 (0)
PCI_FAILURE "PCI Failure"
*** Error: Miscompare CPU Memory to CM
Expected (0xAAAAAAAA) Actual (0xBBBBBBBB) Offset (0xCCCCCCCC)
*** Error: Miscompare CM to CPU Memory
Expected (0xAAAAAAAA) Actual (0xBBBBBBBB) Offset (0xCCCCCCCC)

CBIOS runs simple CM PCI Tests as part of POST in both normal operation and manufacturing test.
The tests use XCBs to transfer data over both CM PCI interfaces from Cluster Memory to CPU Memory and back. If any test fails due to a data miscompare, the test will generate this fatal error code with sub-code '0x8'. These tests are similar to the Cluster Memory Tests and may fail due to Cluster Memory SDRAM hardware or CPU SDRAM hardware failures. Any test failure will result in a fatal error.

Resolution:
A) Cycle power on the node.
B) Reseat CM memory riser card.
C) Reseat the failing Cluster memory DIMM.
D) Replace the failing Cluster memory DIMM.
E) Replace the node motherboard.

Diagnostic:
A) The memory controller registers are part of the CMA register set, which is mapped into CPU memory for access. Use the Whack "pci probe mem 1590" command to find the Cluster Manager on the PCI bus. The base address in CPU memory for the configuration and status registers (CSRs) is Window 0.
Example:
Whack> pci probe mem 1590
Win  Baseaddr     Basesize     Identity
[0]  00:90200000  00:00000400  3PAR (ASIC) LPC#
[1]  00:20000000  00:20000000
[2]  02:00000000  02:00000000
Add offset 0xc0 to that address (0x90200000 above). This is the base address of the Cluster Memory Control Register Block. Refer to the Scaffold System Architecture Reference for information on register programming.
Window 1 is the small cluster memory offset. If the error address is in the first 512 MB of Cluster memory, use Whack to read/write this location and confirm the error. The Central Error register must be reset prior to error reproduction.
If the error address is greater than 512 MB, then XCBs may be used to reproduce the error. Type "xcb help" to get more information on using XCBs.
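The address arithmetic in the diagnostic above can be sketched as follows. This is an illustration only: the helper names are hypothetical, the Window 0 base is the one from the example output, and treating an offset of exactly 512 MB as outside the small window is an assumption.

```python
CONTROL_BLOCK_OFFSET = 0xC0             # offset given in the diagnostic text
SMALL_WINDOW_SIZE = 512 * 1024 * 1024   # first 512 MB of Cluster memory


def control_block_base(window0_base):
    """Base of the Cluster Memory Control Register Block, from Window 0."""
    return window0_base + CONTROL_BLOCK_OFFSET


def reproduction_method(error_offset):
    """Pick the tool suggested by the text for a Cluster memory offset."""
    if error_offset < SMALL_WINDOW_SIZE:
        return "Whack read/write through Window 1"
    return "XCBs"


# Window 0 base taken from the example "pci probe mem 1590" output.
print(hex(control_block_base(0x90200000)))
print(reproduction_method(0x1000_0000))
print(reproduction_method(0x4000_0000))
```

The Central Error register reset mentioned in the text is not modeled here.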
Fatal error: Code 10, subcode 0x9 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has dead clock: xxxxxxxx

This error indicates one of the PCI bridges on the board has a bad clock value and is refusing to accept programming of a good clock.

Resolution:
A) Cycle power on the node. On a bad board, the problem may occur at random and only on a power cycle.
B) Pull all PCI cards which have integrated bridges (QLogic quad port cards are a good example of this). Power cycle several times to rule out an intermittent problem with the motherboard.
C) Replace the node motherboard.

Diagnostic:
A) The PCI output just prior to the fatal error will indicate which of the four bridges has failed. It will be text similar to "Bridge #1 (controls slots 4 & 5)." Refer to rework documentation to correct this problem.

Fatal error: Code 10, subcode 0xa (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has bad GPIO clock select inputs: x

This error indicates one of the PCI bridges on the board has a bad GPIO input, which selects bridge clock sources on a power-on condition.

Resolution:
A) Cycle power on the node. On a bad board, the problem may occur at random and only on a power cycle.
B) Replace the node motherboard.

Diagnostic:
A) The PCI output just prior to the fatal error will indicate which of the four bridges has failed. It will be text similar to "Bridge #1 (controls slots 4 & 5)." Verify that GPIO lines 0-3 are being properly pulled high by comparing against a known good board.

Fatal error: Code 10, subcode 0xb (0)
PCI_FAILURE "PCI Failure"
Warning: This node has xx PCI cards present, but yy is the required minimum.
Please verify your node is properly configured.
You may adjust the required minimum with the "set pci_min" command.

This error indicates this node has detected fewer PCI cards than the recommended 3PAR minimum. In a system configuration where there are fewer than the minimum active PCI cards, inactive load cards should be used to reach the required minimum.

Resolution:
A) Verify the minimum required number of PCI cards are inserted in the node. Install dummy load cards to reach the required minimum.
B) Verify all PCI cards in the system have been identified. Replace any missing card.
C) Replace the node motherboard.

Diagnostic:
A) Isolate the problem to one or more slots by placing load cards in all slots, and then using the "i2c vsc" command to find which slots do not report a load.
B) You can use the "i2c vsc" command to verify cards are reporting correct wattages. You can use the "pci probe" command to display all PCI devices and identify the slot in which each is inserted. Replace any defective card.

Fatal error: Code 10, subcode 0xc (0)
PCI_FAILURE "PCI Failure"
Testing CM PCI 64-bit address lines: FAIL
CM XCB TEST miscompare at offset, uuuu
Expected (vvvvvvvv) Actual (wwwwwwww)
CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)

The Cluster Manager is used to perform a walking bit test on both PCI0 and PCI1 address line paths from CPU memory into cluster memory. If a problem is found with either path, this error will be displayed. The particular memory address which caused this error will be indicated.

Resolution:
A) Cycle power on the node.
B) Reseat all PCI cards.
C) Pull all PCI cards to see if the problem persists. If so, replace any defective cards.
D) Replace the node motherboard.

Diagnostic:
A) Depending on the specific error above, check for stuck or floating pins on the Cluster Manager's connection to the appropriate PCI bus.
B) Depending on the specific error above, check for stuck or floating pins on CIOB's (RCC South Bridge) connection to the appropriate PCI bus.
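For readers unfamiliar with the technique, the following sketch shows how a generic walking bit test can classify stuck lines on a 64-bit path, as named in Code 10, sub-codes 0x7 and 0xc. It is not the CBIOS implementation. The bus is modeled by a hypothetical transfer() callable, the fault in faulty_bus is invented for illustration, and detection of floating bits is not attempted.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1


def walking_bit_test(transfer, width=WIDTH):
    """Return (stuck_high, stuck_low) bit lists for a path modeled by transfer()."""
    mask = (1 << width) - 1
    stuck_high = []
    stuck_low = []
    for bit in range(width):
        one = 1 << bit
        # Walking one: only this line is driven high; reading 0 means stuck low.
        if transfer(one) & one == 0:
            stuck_low.append(bit)
        # Walking zero: only this line is driven low; reading 1 means stuck high.
        if transfer(mask ^ one) & one:
            stuck_high.append(bit)
    return stuck_high, stuck_low


def faulty_bus(value):
    """Hypothetical path with bit 5 stuck high and bit 17 stuck low."""
    return (value | (1 << 5)) & ~(1 << 17) & MASK


high, low = walking_bit_test(faulty_bus)
print("data bits stuck high:", high)
print("data bits stuck low:", low)
```

A path that returns every pattern unchanged yields two empty lists.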
Fatal error: Code 10, subcode 0xd (zz)
PCI_FAILURE "PCI Failure"
*** Vendor xxxx device yyyy on motherboard not yet qualified.
*** Vendor xxxx device yyyy in slot zz not yet qualified.

This error indicates that the device found is not recognized by the BIOS as a 3PAR-qualified device. This may be because the board is a new generation or because there was a PCI error in communicating with the device. In the former case, it is probably safe to press ^C to ignore this error. In the latter case, it is possible that part of the board has become non-functional, and the BIOS may not be able to determine whether the rest of the board will continue to function.

If you need to override this feature, enter Whack at this point by pressing ^W. Enter the following command:
Whack> set perm pci_unqual_ok

If the data field is non-zero, it indicates the BIOS discovered the problem is a card in a particular PCI slot. The specific codes are as follows:
* 30 is PCI Slot 0
* 31 is PCI Slot 1
* 32 is PCI Slot 2
* 33 is PCI Slot 3
* 34 is PCI Slot 4
* 35 is PCI Slot 5

Resolution:
A) Swap out the PCI card for a qualified card.
B) Replace the node motherboard.

Diagnostic:
A) If the card is a QLogic, use the Whack command "pci probe 1077" to find the device and display its device ID. You may need to press ^W first if the BIOS is still at the fatal error. There are several currently qualified PCI cards, including the QLogic 2200, 2300, and 2312. More will be qualified in the future.
B) The PCI probe should have shown the bus.dev.func specifier you need to display card information directly using Whack. Use the Whack "d pci" command, giving it the "<bus>.<dev>.<func>" as a parameter. You should see a standard PCI header present.
C) Try the same or a different card in a different PCI slot to see if the slot has failed.
Fatal error: Code 10, subcode 0xe (0)
PCI_FAILURE "PCI Failure"
PCI bus scan and allocation completed in 21 passes.
*** Error: PCI scan required too many passes. Bad PCI interaction.

This error indicates the PCI scanning code was unable to lay out a valid PCI address table mapping within 21 passes. The cause of this error is possibly either defective hardware or BIOS firmware.

Resolution:
A) Remove all PCI cards. If the error goes away, attempt to find the failed card by process of elimination (put back half of the cards and try to boot again).
B) Replace the node motherboard.

Diagnostic:
A) Observe other errors that may happen at the same time as this error. Is there an indication that a board ASIC is failing? In general, some other error should trigger before this one, since device limits are verified.
B) Contact a BIOS engineer for debug assistance.

Fatal error: Code 10, subcode 0x10 (0)
PCI_FAILURE "PCI Failure"
*** Error: IMB.A isn't turned on

This error indicates a possible hardware failure on the board. The bus which connects the CMIC (P4 North Bridge) to CIOB A failed to initialize properly.

Resolution:
A) Cycle power on the node. On a bad board, the problem may occur at random.
B) Replace the node motherboard.

Diagnostic:
A) Verify CIOBX2 is receiving a valid clock.
B) Look at PCI device 0.0.2.f8 for CIOB A, or PCI device 0.0.1.f8 for CIOB B. The BIOS observes bit 0 of this register to tell if the IMB initialized (0 indicates success).

Fatal error: Code 10, subcode 0x11 (0)
PCI_FAILURE "PCI Failure"
*** Error: Expected to see device xx.yy.zz `uuuu' but it is not responding: Vendor vvvv, Device wwww.
*** Error: Failure occurred during PCI device allocation.
The BIOS checks for specific onboard PCI devices (such as bridges) which are known to be on a particular node board. If a device listed in the BIOS table is not found on the board, then this error will result.

Resolution:
A) Cycle power on the node.
B) Remove PCI cards and see if the error disappears.
C) Replace the node motherboard.

Diagnostic:
A) The error should indicate which device is missing. Check whether another unknown onboard device has appeared in its place. This could be the device, masked behind a PCI bus problem.
B) Verify the PCI ASIC is functional by checking clocks and PCI data lines to the device.

Fatal error: Code 10, subcode 0x12 (0)
PCI_FAILURE "PCI Failure"
*** Error: The following device is not listed in the hardwired PCI descriptor table: Vendor xxxx, Device yyyy
*** Error: Failure occurred during PCI device allocation.

Onboard PCI devices (such as bridges) are well known by the BIOS to appear at specific bus addresses. If a device is not known by the BIOS, but it is configured on a bus which is not externally exposed (PCI slot), then you will see this error. Since the node board is a closed solution, this error might occur if an onboard device is failing and does not report a correct device vendor/ID, or corrupts the device vendor/ID reported by another device on the bus.

See Code 10, sub-code 0x11 for resolution information.

Fatal error: Code 10, subcode 0x13 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Was yyyy but is now zzzz
*** Error: Failure occurred during PCI device allocation.

The PCI header is re-read on multiple passes of the PCI initialization. If a mismatch is found with a previous read of the PCI bus, then this error will result. This is a strong indicator of a flaky device or bus. If the BIOS is in Diagnostic mode (press ESC at the initial memory test), the following will also be displayed at this point:
Starting infinite PCI read loop...
In Diagnostic mode, once a failure is detected, this test is repeated until manual intervention.

See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x14 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Invalid 64-bit size: yyyy
*** Error: Failure occurred during PCI device allocation.

During PCI initialization, a 64-bit window was found on the PCI bus which is outside the 36-bit range imposed by the CPU.

See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x15 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Allocation size is zero
*** Error: Failure occurred during PCI device allocation.

During PCI initialization, a window was found on the PCI device with a size of zero. This fatal error may indicate that the BIOS is not able to properly communicate with the PCI device.

See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x16 (slot)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.

During PCI initialization, each memory or I/O window present on each device found on the bus is programmed with a CPU memory bus address so that it may be accessed by further BIOS initialization, tests, and the main operating system. The BIOS verifies that the address it programs for each window was correctly programmed by reading back the value just written. If they do not match, this error is generated.

The slot number is an ASCII character code shown in hexadecimal. If the slot value is 0, then the failure occurred on a node motherboard device. If PCI Slot 0 was involved, then slot is 30. PCI Slot 1 is 31; PCI Slot 2 is 32; PCI Slot 6 is 36, and so on.

See Code 10, sub-code 0x3 for resolution information.
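The slot encoding used by Code 10, sub-code 0x16 (and by the other sub-codes that report a slot in the data field) is the ASCII code of the slot digit. A minimal decoding sketch follows; the function name decode_slot is hypothetical, and the accepted range of 0x30 through 0x38 is an assumption based on the highest slot number listed elsewhere in this table.

```python
def decode_slot(code):
    """Translate a slot value from the fatal error data field.

    0x00 means a node motherboard device; 0x30..0x38 are the ASCII
    characters '0'..'8', meaning PCI Slot 0 through PCI Slot 8.
    """
    if code == 0x00:
        return "node motherboard"
    if 0x30 <= code <= 0x38:
        return f"PCI Slot {code - 0x30}"
    return f"unknown (0x{code:02x})"


for value in (0x00, 0x30, 0x36):
    print(f"0x{value:02x} -> {decode_slot(value)}")
```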
Fatal error: Code 10, subcode 0x17 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.

See Code 10, sub-code 0x16 for information on this error.

Fatal error: Code 10, subcode 0x18 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.

See Code 10, sub-code 0x16 for information on this error.

Fatal error: Code 10, subcode 0x19 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Invalid allocation size: yyyy (Must be a power of 2)
*** Error: Failure occurred during PCI device allocation.

During PCI initialization, each memory or I/O window present on each device found on the bus is programmed with a CPU memory bus address. The size of the window required is provided by the specific PCI device. The window size must be a power of 2 (1 KB, 2 KB, 4 KB, ... 32 MB, 64 MB, etc.). This is a consistency check the BIOS performs to ensure it is properly communicating with the PCI device.

See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x1a (0)
PCI_FAILURE "PCI Failure"
*** Error: Device does not fit into address space, skipping: attempted addr xxxx, size yyyy
*** Error: Failure occurred during PCI device allocation.

During PCI initialization, the entire PCI bus is walked as a tree, and device registers are initialized and mapped into processor address space using this tree. The bus structure is then ordered and summarized into a table so that software can later find specific devices for high-level initialization. This specific error indicates the PCI scan attempted to map a PCI device into the CPU's 32-bit address space, but failed because no more space was available.
Verify that NVRAM flags such as "pci_base" and "mem_max" are not set to unusual values.

See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x1b (0)
PCI_FAILURE "PCI Failure"
*** Error: IMB.B isn't turned on

This error indicates a possible hardware failure on the board. The bus which connects the CMIC (P4 North Bridge) to CIOB B failed to initialize properly.

See Code 10, sub-code 0x10 for resolution information.

Fatal error: Code 10, subcode 0x1c (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI CIOB Primary www MHz (xxx), Secondary yyy MHz (zzz)

This error indicates a possible hardware failure on the board. The CIOB (which connects the North Bridge to the I/O system) has an incorrect clock speed.

Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Diagnostic:
A) Eagle nodes should run CIOB at 66 MHz on both the primary and secondary sides. Review the fatal error output to determine whether the primary side or the secondary side is affected.
B) Verify the clock with a scope. Check the strapping resistors which select CIOB bus clock speed on reset.

Fatal error: Code 10, subcode 0x1d (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has bad secondary speed: v.w.x.y = zzzz

This error indicates one of the PCI bridges on the board has a bad speed selection set, which could indicate that an incorrect type of PCI card has been installed or that the bridge mode select strappings are bad.

Resolution:
A) Pull all PCI cards one at a time to determine the failed card.
B) Replace the node motherboard.

Diagnostic:
A) Check the Intel 31154 mode select strapping resistors to ensure PCI-X mode is selected. Refer to Ironman rework instructions to correct this.
B) PCI offset 0xf2 in the 31154 indicates, among other things, the mode selected. Bits 6-8 should have the value 010 for proper operation (100 MHz secondary PCI bus speed).
C) Some pre-production Ironman nodes have not been reworked to correct this defect. To ignore this error, set the "pci_speed_any" NVRAM flag by pressing ^W to enter Whack and entering:
Whack> set perm pci_speed_any

Non-fatal error: Code 10, sub-code 0x1e (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe x.y.z: Invalid port configuration strappings (xxx).

The indicated PLX switch chip has incorrect hardware configuration strappings.

Resolution: Replace the node motherboard.

Non-fatal error: Code 10, sub-code 0x1f (YYYYYYxx)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected link width detected (xx).

This error indicates that the device found is not running at the correct PCIe link width.

If the "xx" portion of the data field is non-zero, it indicates a problem with a particular PCI slot. The specific codes for "xx" are as follows:
30 is PCI Slot 0
31 is PCI Slot 1
32 is PCI Slot 2
33 is PCI Slot 3
34 is PCI Slot 4
35 is PCI Slot 5
36 is PCI Slot 6
37 is PCI Slot 7
38 is PCI Slot 8

To ignore this error, enter Whack by pressing ^W and entering:
Whack> set perm pci_speed_any

Resolution:
A) Replace the indicated card (if "xx" is non-zero).
B) Replace the node motherboard.

Non-fatal error: Code 10, sub-code 0x20 (YYYYYYxx)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected link speed detected (xxx).

This error indicates that the device found is not running at the correct PCIe link speed.

See Code 10, sub-code 0x1f for resolution information.

Non-fatal error: Code 10, sub-code 0x21 (xxx)
PCI_FAILURE "PCI Failure"
*** Error: Slot xxx indicates no HBA present, but PCI device found

This error indicates that a PCI device was found in a slot which was expected to be empty.
The likely cause of this failure is an HBA which is not fully seated. If this is an expected failure, you can set "pci_missing_ok" to override this check.
Resolution:
A) Reseat or replace the indicated HBA.
B) Replace the node motherboard.

Non-fatal error: Code 10, sub-code 0x22 (xxx)
PCI_FAILURE "PCI Failure"
*** Error: Slot xxx indicates HBA present, but no PCI device found
This error indicates that no PCI device was found in a slot which was expected to be populated (HBA present). The likely cause of this failure is an HBA which has failed. If this is an expected failure, you can set "pci_missing_ok" to override this check.
Resolution:
A) Reseat or replace the indicated HBA.
B) Replace the node motherboard.

Non-fatal error: Code 10, sub-code 0x23 (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI device bb.dd.ff (slot ss) hung during previous error scan
This error indicates that the CPU hung during a previous PCI scan. The most probable cause of this error is a defective HBA.
The data field provides several details about the suspect device. The low byte indicates which PCI slot, if known. Value 0x30 corresponds to PCI Slot 0, 0x31 is PCI Slot 1, ..., 0x38 is PCI Slot 8. Byte 2 and byte 1 correspond to the PCI bus.dev.func. Byte 3 indicates whether the failure occurred during a PCI error scan, and whether this is a repeat failure.
Decode table for data:
bits 0..7    PCI Slot (0x00=MB, 0x30..0x38=PCI Slot 0..8)
bits 8..10   PCI func
bits 11..15  PCI dev
bits 16..23  PCI bus
bits 24..28  Reserved (0)
bit 29       Repeat flag (1=repeat -- fatal error)
bit 30       Hang during (0=PCI scan, 1=PCI error scan)
bit 31       Reserved (1)
Example (data=c00a0a35): The 0x35 value implicates PCI Slot 5. The 0a0a value is bus.dev.func 0a.01.02. The c0 value indicates that the hang occurred during a PCI error scan.
Example (data=a0090831): The 0x31 value implicates PCI Slot 1.
The 0908 value is bus.dev.func 09.01.00. The a0 value indicates a repeated hang during the PCI scan.
Resolution:
A) Replace the HBA if a PCI slot is indicated.
B) Convert the data to PCI bus.dev.func and match it with the suspect PCI device from previous BIOS messages. If this is an onboard device, replace the node motherboard.

Fatal error: Code 10, subcode 0x24 (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI device bb.dd.ff (slot ss) hung during previous scan
Hang occurred multiple times.
This error indicates that the CPU hung repeatedly during a previous PCI scan. Other than being a fatal error, this code is identical to sub-code 0x23.
Note that if this fatal error is seen without a preceding non-fatal sub-code 0x23, then the failure is likely to be the node motherboard. If the non-fatal error is not logged, then a PCI scan hung earlier in the PCI tree than a previous hang. Unless both hangs happened on the same HBA, the cause is likely a shared device on the node motherboard.
See Code 10, sub-code 0x23 for resolution information.

Fatal error: Code 10, subcode 0x25 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Serial EEPROM is not present.
This error indicates that the PCI device does not have an EEPROM attached.
Resolution: Replace the node motherboard.

Fatal error: Code 10, subcode 0x26 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Unable to write Serial EEPROM.
This error indicates that the EEPROM failed to be programmed.
Resolution: Replace the node motherboard.

Fatal error: Code 10, subcode 0x27 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Unable to read Serial EEPROM.
*** Error: PCIe bb.dd.ff: Serial EEPROM index XX value 0xXXXXXXXX != expected 0xXXXXXXXX.
This error indicates that the BIOS was unable to verify the EEPROM contents after programming, or that the data was successfully written but did not persist.
Resolution: Replace the node motherboard.

Fatal error: Code 10, subcode 0x30 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe b.d.f Link Width incorrect size. Found xx, s/b yy
This error indicates that the device found is not running at the correct PCIe link width. xx is the actual PCIe link width and yy is the expected PCIe link width. This error may be logged with some HBA cards with x4 PCIe lanes. To ignore this error, press ^W to enter Whack, then enter:
    Whack> set perm pci_speed_any
Resolution:
A) OK to ignore if this is related to an HBA card with x4 PCIe lanes.
B) Replace the indicated card.

Fatal error: Code 10, subcode 0x31 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected link width detected (xx).
This error indicates that the Harrier2 ASIC device found is not running at the correct PCIe link width.
Resolution:
A) Power cycle the node.
B) Replace the node motherboard.

Fatal error: Code 10, subcode 0x32 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe b.d.f indicates HBA present, but no PCI device found
This error indicates that the PCI device was not found.
Resolution:
A) Reseat the card.
B) Replace the indicated card.

Fatal error: Code 11, subcode 0 (yyyy)
UNRECOVERABLE_TRAP "Unrecoverable Trap"
*** Error: CPU exception detected: Stopping execution.
The BIOS installs an interrupt handler to catch spurious (unexpected) interrupts and exceptions during initialization and testing of the node hardware. During initialization, the BIOS even tests to verify that a generated interrupt is delivered correctly.
This is a serious condition and should not be ignored by pressing ^C. The specific interrupt received is the sub-code displayed. The interrupt number will be less than 0x20.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Review previous output lines to determine whether interrupts were just enabled (this follows the CPU identification). You should see a message:
    --- This interrupt was expected
If this is not present, then most likely the interrupt or exception occurred immediately after being enabled.
B) Using Whack, you can manually enable and disable interrupts with the "cpu interrupt enable" and "cpu interrupt disable" commands. You can also use the "cpu interrupt <num>" command to generate an interrupt. If interrupts are enabled, you should see one of the following messages upon generating an interrupt:
    --- This interrupt was expected
or
    *** Error: Expected interrupt xxxx but got yyyy
or
    *** Error: CPU exception detected: Stopping execution.
The first two messages will only occur if the BIOS is still expecting an interrupt to be delivered. The latter message will only be displayed if the interrupt is numbered 0x20 or higher.

Fatal error: Code 12, subcode 0x0 (0)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
PIII or P4 node:
--- SMI: No known cause (# zz)
GPE status: yyyyyy, GPE input: zzzzzz
An SMI is a System Management Interrupt, an interrupt generated by the node hardware for the BIOS to service a particular failure. This error indicates the BIOS was unable to determine the cause of the SMI delivered by hardware.
See Code 11 for resolution information.

Fatal error: Code 12, subcode 0x0 (0)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
Ironman, Tinman, Titan, or Atlas nodes:
CPU0 SMI: Bootstrap
CPU0 SMI: Updating
CPU0 SMI: Updated
--- SMI: No known cause (# 1) on CPU6
SMSCS[0] = 0x00000000
...
ALT_GP_SMI_EN = 0xbfbf
ALT_GP_SMI_STS = 0x0000
TMP_STS= 0x00000000:88380000
TMP_INT= 0x00000000:00000001
This fatal error indicates the BIOS received an SMI, but wasn't able to determine which device caused the interrupt.
In this example, the "Bootstrap," "Updating," and "Updated" messages suggest the BIOS firmware was updated.
Resolution:
A) Reboot the node.
B) Replace the node motherboard.

Fatal error: Code 12, subcode 0x1 (yyyy)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
*** Error: Expected interrupt xxxx but got yyyy
During initialization, the BIOS installs an interrupt handler to verify interrupts are delivered reliably. It then generates an expected interrupt. If an interrupt is delivered which is not the same as the one expected, this error is displayed. The interrupt number, yyyy, represents which interrupt occurred.
See Code 11 for resolution information.

Fatal error: Code 13, subcode 0x0 (yyyy)
INTERRUPT_FAILURE "Interrupt Failure"
*** Error: Interrupt 0x20 could not be generated.
or
*** Error: Interrupt 0xff could not be generated.
During initialization, the BIOS installs an interrupt handler to verify interrupts are delivered reliably. It then generates a few expected interrupts. If the specific interrupt is not delivered, this error is displayed. The interrupt number, yyyy, represents which interrupt should have been generated.
See Code 11 for resolution information.

Fatal error: Code 14, subcode 0x0 (0)
ECC_FAILURE "Control Cache ECC Failure"
The Whack "mem test ecc" command performs an ECC test over the main memory to ensure ECC memory error correction is functioning. If this test fails, this message is displayed, together with other messages giving details.
Note: Running the "mem test ecc" command destroys some memory locations in the range of [0 .. 512 KB] and [1 MB .. just below the top of SDRAM]. Hence, executing this command after Linux has booted will cause Linux to fail if it is reentered.
If you see this failure often during BIOS initialization, then the cause is likely a hardware problem.
Specifically, the error tells you that the hardware ECC error mechanism is not working correctly. Changing CPU memory DIMMs may solve the problem, but it's more likely a board failure.
Resolution:
A) Ensure the North Bridge heatsink is firmly attached.
B) Replace CPU DIMMs.
C) Replace bootstrap CPU.
D) Replace the node motherboard.

Fatal error: Code 14, subcode 0x1 (1)
ECC_FAILURE "Control Cache ECC Failure"
*** Error: Missing ECC SMI [80] <= 1, data 0 0. Copy 0 Now 0 mode 0
00 - 0f: 00 00 00 00 00 00 00 00 | 00 00 00 00 00 40 0c 00
10 - 1f: 01 ff 00 00 00 00 00 ff | ff ff ff ff ff ff ff ff
20 - 2f: 04 09 08 09 20 09 10 09 | 18 09 00 09 00 00 59 8e
30 - 3f: aa aa 0a 02 a8 00 00 00 | 00 00 00 c0 7b df ff ff
This error indicates the BIOS ECC hardware test could not get the hardware to generate an ECC SMI in response to a corrupted memory address. It possibly indicates a failing DIMM or memory controller, or that memory timings are too fast for the DIMMs present in the node.
See Code 14, sub-code 0x0 for resolution information.

Fatal error: Code 15, subcode 0x0 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
mailbox register xxxx changed inappropriately (yyyy) != expected (zzzz)
register test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards on the Node Board. The slots are numbered 0-6 from left to right when looking at the front of the P4 Eagle and Ironman Nodes. The slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas, and the top three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for functionality. The HBA cards sometimes require a firmware download for full capability. POST does not have access to this firmware and will only test basic register access and functionality. If the Register Test fails, POST will indicate this error.
If the user continues past this error (^C), software will log the error and continue testing the other PCI cards (if present).
Resolution:
A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the CM PCI XCB test passed, replace the PCI Fibre Adapter.
C) Replace the node motherboard.
Diagnostic:
A) Whack "fibre" and "pci" commands communicate with each PCI Fibre Card. Refer to the slot that produced the error for further diagnostic information and procedure.

Fatal error: Code 15, subcode 0x1 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
controller memory xxxx value (yyyy) != expected (zzzz)
memory test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards on the Node Board. The slots are numbered 0-6 from left to right when looking at the front of the P4 Eagle and Ironman Nodes. The slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas, and the top three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for functionality. The HBA cards sometimes require a firmware download for full capability. POST does not have access to this firmware and will only test basic functionality. If the Onboard Memory Test fails, POST will indicate this error. If the user continues past this error (^C), software will log the error and continue testing the other PCI cards (if present).
Resolution:
A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the CM PCI XCB test passed, replace the PCI Fibre Adapter.
C) Replace the node motherboard.
Diagnostic:
A) Whack "fibre" and "pci" commands communicate with each PCI Fibre Card. Refer to the slot that produced the error for further diagnostic information and procedure.
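The data-field layout given for Code 10, sub-code 0x23 (and shared by sub-code 0x24) can be unpacked mechanically. The following is a minimal Python sketch, not BIOS or Whack code; the function name `decode_pci_hang_data` and the dictionary keys are hypothetical. It assumes the device number occupies bits 11..15, which is what both worked examples for sub-code 0x23 (0a.01.02 and 09.01.00) imply.

```python
def decode_pci_hang_data(data: int) -> dict:
    """Unpack the 32-bit data field of Code 10, sub-code 0x23/0x24.

    Illustrative helper only; names and return shape are assumptions.
    """
    slot_code = data & 0xFF          # bits 0..7: slot code
    func = (data >> 8) & 0x7         # bits 8..10: PCI function
    dev = (data >> 11) & 0x1F        # bits 11..15: PCI device (assumed)
    bus = (data >> 16) & 0xFF        # bits 16..23: PCI bus
    repeat = bool(data & (1 << 29))  # bit 29: repeat flag
    during_error_scan = bool(data & (1 << 30))  # bit 30: 1 = PCI error scan

    if slot_code == 0x00:
        slot = "MB"                  # onboard (motherboard) device
    elif 0x30 <= slot_code <= 0x38:
        slot = slot_code - 0x30      # 0x30..0x38 map to PCI Slot 0..8
    else:
        slot = None                  # slot code not covered by the table

    return {
        "slot": slot,
        "bdf": f"{bus:02x}.{dev:02x}.{func:02x}",
        "repeat": repeat,
        "during_error_scan": during_error_scan,
    }


if __name__ == "__main__":
    for value in (0xC00A0A35, 0xA0090831):
        print(f"{value:08x} -> {decode_pci_hang_data(value)}")
```

Decoding c00a0a35 with this sketch yields PCI Slot 5 and bus.dev.func 0a.01.02 with the error-scan flag set, which agrees with the first worked example for sub-code 0x23.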
Fatal error: Code 15, subcode 0x2 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
data bits possibly float: Bitxxxx-Bityyyy.
PCI walking bits: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards on the Node Board. The slots are numbered 0-6 from left to right when looking at the front of the P4 Eagle and Ironman Nodes. The slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas, and the top three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for functionality. The HBA cards sometimes require a firmware download for full capability. POST does not have access to this firmware and will only test basic functionality. If the PCI Fibre Card Bus Test fails, POST will indicate this error. If the user continues past this error (^C), software will log the error and continue testing the other PCI cards (if present).
Resolution:
A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the CM PCI XCB test passed, replace the PCI Fibre Adapter.
C) Replace the node motherboard.
Diagnostic:
A) Whack "fibre" and "pci" commands communicate with each PCI Fibre Card. Refer to the slot that produced the error for further diagnostic information and procedure.

Fatal error: Code 15, subcode 0x3 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
data bits possibly float: Bitxxxx-Bityyyy.
CM0 walking bits: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards on the Node Board. The slots are numbered 0-6 from left to right when looking at the front of the P4 Eagle and Ironman Nodes.
The slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas, and the top three will depend on which slot the node is in.
This test indicates a problem was observed with the fibre channel card talking with the Cluster Manager. If the "fibre test pci" test passed, then this problem is likely in the interface to the CM or CM memory.
Resolution:
A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the CM PCI XCB test passed, replace the PCI Fibre Adapter.
C) Replace the node motherboard.
Diagnostic:
A) Whack "fibre" and "pci" commands communicate with each PCI Fibre Card. Refer to the slot that produced the error for further diagnostic information and procedure.

Fatal error: Code 15, subcode 0x4 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
PCIe EYE test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards on the Node Board. The slots are numbered 0-6 from left to right when looking at the front of the P4 Eagle and Ironman Nodes. The slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas, and the top three will depend on which slot the node is in.
If the "fibre test cm" test passed, then this problem is likely in the PCIe-to-PCIe link between the card and the switch.
Resolution:
A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the CM PCI XCB test passed, replace the PCI Fibre Adapter.
C) Replace the node motherboard.
Diagnostic:
A) Whack "fibre" and "pci" commands communicate with each PCI Fibre Card. Refer to the slot that produced the error for further diagnostic information and procedure.
Fatal error: Code 15, subcode 0x10 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
BIOS cannot make the LSI card go into Operational state.
Resolution:
A) Replace card. Send failed card back for FA.

Fatal error: Code 15, subcode 0x11 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
HBA card register test failure.
Resolution:
A) Replace card. Send failed card back for FA.

Fatal error: Code 15, subcode 0x13 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
LSI card register memory copy test failure.
Resolution:
A) Replace card. Send failed card back for FA.

Fatal error: Code 15, subcode 0x14 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
LSI card register memory copy test failure.
Resolution:
A) Replace card. Send failed card back for FA.

Fatal error: Code 15, subcode 0x15 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Firmware rev xxxx not supported. Upgrade to yyyy
The LSI card does not contain 3PAR-approved firmware. If you need to run with an LSI card which has older firmware (engineering only), you can set the "lsi_downrev" flag in the BIOS. Example:
    Whack> set perm lsi_downrev
Resolution:
A) Replace card. Send failed card back for upgrade.

Fatal error: Code 15, subcode 0x16 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Unable to get firmware rev
Attempting to get the firmware version from the LSI card failed.
Resolution:
A) Cycle power on the node.
B) Replace card. Send failed card back for FA.

Fatal error: Code 15, subcode 0x17 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Manufacturing test for E200 node only. This error occurs when the onboard LSI chips are not found. They are expected to be in slots 0 and 3, with two devices on each slot.
Resolution:
A) Cycle power on the node.
B) Replace the motherboard.
Fatal error: Code 17, subcode 0x0 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed its internal self test.
Resolution:
A) Replace the IDE or SATA boot drive.
B) Replace the IDE or SATA cable.
C) Replace the node motherboard.
Diagnostic:
A) Whack "ide test" commands may be used to individually execute IDE tests.

Fatal error: Code 17, subcode 0x1 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed to perform a self test.
See Code 17, sub-code 0x0 for resolution information.

Fatal error: Code 17, subcode 0x2 (0)
IDE_FAILURE "Internal Drive Failure"
IDE register xx value (yyyy) != expected (zzzz)
The IDE register test failed during a pattern test.
See Code 17, sub-code 0x0 for resolution information.

Fatal error: Code 17, subcode 0x3 (0)
IDE_FAILURE "Internal Drive Failure"
IDE register xx value (yyyy) != expected (zzzz)
The IDE register test failed during a walking bit test.
See Code 17, sub-code 0x0 for resolution information.

Fatal error: Code 17, subcode 0x4 (0)
IDE_FAILURE "Internal Drive Failure"
There was an IDE failure in data requested by the operating system bootstrap. It is possible that data on the disk has become corrupt to the point the operating system will not successfully load.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x5 (0)
IDE_FAILURE "Internal Drive Failure"
Communication with the IDE interface timed out. This error indicates the drive is not responding to commands within an acceptable amount of time.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x6 (0)
IDE_FAILURE "Internal Drive Failure"
IDE reported a failure in read verify command.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x7 (0)
IDE_FAILURE "Internal Drive Failure"
A timeout (10 seconds) was detected while performing a DMA operation.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x8 (0)
IDE_FAILURE "Internal Drive Failure"
An error condition was detected while performing a DMA operation.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x9 (xx)
IDE_FAILURE "Internal Drive Failure"
IDE power up: Unknown error
ERROR : 80 SECCNT: 80 SECNUM: 80 CYLLOW: 80 CYHIGH: 80 DEVSEL: 80 ALT_STATUS: 80
Drive: BUSY
The IDE drive had a failure at power-on reset which prevents it from communicating with the chipset IDE controller.
Resolution:
A) Cycle power on the node.
B) Reseat the drive cable on both the node and the drive.
C) Replace the IDE or SATA boot drive.
D) Replace the node motherboard.
Diagnostic:
A) Try using "ide reset" followed by "ide init" to clear the error.
B) The I/O address of the register which could trigger this error at "ide init" is 0x1f1. Try using "io inb 1f1" and "io outb 1f1 <value>" to diagnose further.

Non-fatal error: Code 17, sub-code 0x10 (data)
IDE_FAILURE "Internal Drive Failure"
A disk SMART threshold was triggered. This would indicate an imminent boot drive failure.
Resolution: Replace the IDE or SATA boot drive.
Diagnostic: The data value may be used to determine the specific SMART field which caused the alert.
Examples:
0 - Unknown
1 - Raw Read Error Rate
2 - Throughput
3 - Spinup Time
4 - Start / Stop Count
5 - Reallocate Sector Count
6 - Read Channel Margin
7 - Seek Error Count
8 - Seek Time
9 - Poweron Hours
10 - Spin Retry Count
11 - Calibration Retry Count
12 - Power Cycle Count
192 - Poweroff Retract Count
193 - Load Cycle Count
194 - Temperature Celsius
195 - Hardware ECC Recovered
196 - Reallocate Event Count
197 - Current Pending Count
198 - Offline Scan UE Count
199 - UDMA CRC Error Count
200 - Write Error Count
201 - Off Track Error Count
202 - DAM Error Count
203 - Run Out Cancel
204 - Raw Read Error Count
205 - Thermal Asperity Count
207 - Spin High Current Count
208 - Spin Buzz Count
209 - Offline Seek Performance
The "ide smart status" command may be used to display the current SMART status fields.

Fatal error: Code 17, subcode 0x11 (0)
IDE_FAILURE "Internal Drive Failure"
IDE SMART self-test failed. The drive failed to finish a built-in self-test.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x12 (0)
IDE_FAILURE "Internal Drive Failure"
Drive failed to collect SMART data. The data is vital for the drive to determine SMART trigger.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x13 (0)
IDE_FAILURE "Internal Drive Failure"
Drive refused to accept SMART commands.
Resolution: Replace the IDE or SATA boot drive.
Diagnostic: Use "ide smart enable" to turn on SMART before issuing more SMART commands.

Fatal error: Code 17, subcode 0x14 (0)
IDE_FAILURE "Internal Drive Failure"
The SMART command issued to drive has incorrect syntax.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x15 (0)
IDE_FAILURE "Internal Drive Failure"
The SMART commands failed to write or read attributes.
Resolution: Replace the IDE or SATA boot drive.
Non-fatal error: Code 17, sub-code 0x16 (0)
IDE_FAILURE "Internal Drive Failure"
No IDE device was found.
Resolution:
A) Install or replace the IDE or SATA drive.
B) Replace the node motherboard.

Fatal error: Code 17, subcode 0x18 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed the BIOS interrupt test, possibly due to a bad drive.
See Code 17, sub-code 0x0 for resolution information.

Non-fatal error: Code 17, sub-code 0x19 (0)
IDE_FAILURE "Sequential DMA read timed out"
DMA xfer error code xxxx
The drive DMA test failed due to a timeout. Although each sequential DMA read operation is succeeding, the total test time was exceeded. The likely cause of this failure is a drive which is having to perform a large number of relocations due to failed sectors, or a drive interface failure which only shows up under stress.
Resolution: Replace the IDE or SATA boot drive.

Non-fatal error: Code 17, sub-code 0x1A (0)
IDE_FAILURE "Active Partition Set Incorrectly"
The active partition identified in the Master Boot Record does not match the default boot partition identified in the grub menu. This can cause an infinite reboot loop if the two partitions contain different BIOS versions. TPD will reboot expecting the BIOS to update itself, and the BIOS will see what it believes is the correct version and skip the update. This situation should not generally occur, but it can happen when a previous update was aborted and not rolled back fully or correctly.
Resolution: Correct the active partition setting or adjust the default boot partition in grub's menu to match. It may also be necessary to check the previous update or rollback to ensure all installed code is at the desired version.

Fatal error: Code 17, subcode 0x20 (0)
IDE_FAILURE "Internal Drive Failure"
Drive did not return status to host after a command within a reasonable amount of time.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x21 (rpm)
IDE_FAILURE "Internal Drive Failure"
*** Error: Boot drive is not a Solid State Disk (SSD).
This error occurs when the disk drive for a Harrier system is not an SSD.
Resolution:
A) Replace the SATA drive with an SSD drive.

Fatal error: Code 17, subcode 0x22 (disk size)
IDE_FAILURE "Internal Drive Failure"
*** Error: Disk Size (XXX.X GB) is less than 128 GB.
This error occurs when the node has 32 GB or less of cluster memory and the disk drive is smaller than 128 GB. A smaller disk is not large enough to hold the memory dumps if the node panics.
Resolution:
A) Replace the SSD drive with a drive of at least 128 GB.

Fatal error: Code 17, subcode 0x23 (disk size)
IDE_FAILURE "Internal Drive Failure"
*** Error: Disk Size (XXX.X GB) is less than 256 GB.
This error occurs when the node has more than 32 GB of cluster memory and the disk drive is smaller than 256 GB. A smaller disk is not large enough to hold the memory dumps if the node panics.
Resolution:
A) Replace the SSD drive with a drive of at least 256 GB.
B) Reduce cluster memory to 32 GB or less.

Fatal error: Code 17, subcode 0x30 (0)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
Resolution: Replace the IDE or SATA boot drive.

Non-fatal error: Code 17, sub-code 0x40 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port Status register, for lab debug
Resolution: TODO

Non-fatal error: Code 17, sub-code 0x41 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port Error register, for lab debug
Resolution: TODO

Non-fatal error: Code 17, sub-code 0x42 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port TFD register, for lab debug
Resolution: TODO

Non-fatal error: Code 18, sub-code zzzz (0)
BIOS_INT_UNIMPLEMENTED "BIOS Int Unimplemented"
*** Real-mode BIOS interrupt: xxxx(error: yyyy)
This error most commonly indicates a bad or missing boot area of the USB disk. Customer Service node-disks or node spares (FRUs) might not be shipped with an operating system. Attempting to boot from one of these disks without first installing the system software might produce this error message. From the Whack prompt, use the "boot net install" command to install the system software.
In order for Linux to boot, LILO must load the kernel image. It needs assistance from the BIOS in order to perform this task. Linux also acquires some information from the BIOS using 16 bit BIOS interrupts. CBIOS automatically accepts and emulates traditional 16 bit BIOS interrupts to support these methods. If LILO or Linux triggers an interrupt which is not supported by CBIOS, this possibly fatal error will result. There are many obsolete BIOS facilities which are not supported by CBIOS. In some cases, the system boot may be able to continue after this error.
The sub-code and minor code indicate the specific BIOS interrupt called and the eax register parameter value. This information may be useful to Engineering.
Resolution:
A) Reboot. Attempt to reproduce the problem.
B) Reinstall system software on the disk. This may require a "boot net install" in order to reinstall the operating system.
C) There may be a bug in the OS you are using or it has been misconfigured.
Confirm this version of the OS has been verified to work on a 3PAR node board. Or, temporarily swap system disks with a known good system disk.
D) Replace the boot drive and reinstall the system software.
E) Replace the node motherboard.
Diagnostic:
A) Look up the displayed Real-mode BIOS interrupt number in a BIOS index to determine the facility the software is requesting. This may provide you a clue as to the cause.
B) Use the Arium to single step through the return from the 16 bit handler to the originating code to determine what code is involved with the unimplemented BIOS operation.

Fatal error: Code 19, subcode 0x0 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
Booting from SATA IDE...
No IDE or USB drives present or boot sector is invalid.
or
Booting from SATA IDE (bootdev)...
No IDE drive present or boot sector is invalid.
or
Booting from PATA IDE...
No IDE drive present or boot sector is invalid.
or
Booting from USB...
No USB drive present or boot sector is invalid.
The IDE (PATA or SATA) or USB Flash disk is used for booting the operating system. This error indicates that either no drive was found during the hardware probe, or a drive was found but is not bootable.
Resolution:
A) Cycle power on the node.
B) Verify disk power and data cables are connected to both the drive and the motherboard. The red stripe on the IDE data cable must be oriented closest to the power connector on the drive.
C) Replace the disk power cable and/or data cable.
D) Replace the drive.
E) Replace the node motherboard.
Diagnostic:
A) Reset and enter Whack with ^W after the PCI bus scan but before the IDE probe. You should be able to use the "ide init" command to probe for a disk. Minimal output should include drive Capacity and Geometry (C/H/S: cylinder/head/sector).
B) If the above information is available, use the "ide read" command to read a sector into CPU memory and verify it was read. Example:
Whack> ide read 1000 0 1
Whack> d 1000 200
You should see the contents of sector 0, which (with a previously initialized node disk) will include the string "LILO" starting at byte offset 6.

Fatal error: Code 19, subcode 0x1 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE TIMEOUT waiting for DRDY
The IDE disk is used for booting the operating system. This error indicates there was a problem communicating with the IDE controller, most likely due to a missing IDE hard drive, a disconnected cable, or a failed IDE hard drive.
See Code 19, sub-code 0x0 for resolution information.

Fatal error: Code 19, subcode 0x2 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE TIMEOUT waiting for DRQ
The IDE disk is used for booting the operating system. This error indicates that a command was issued to the IDE disk (read sectors) but the drive controller did not report back with the data within a reasonable amount of time. This may be caused by a failed sector or IDE controller failure.
See Code 19, sub-code 0x0 for resolution information.

Fatal error: Code 19, subcode 0x3 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE ERROR reading sector xxxx
The IDE disk is used for booting the operating system. This error indicates that a command was issued to the IDE disk (read sectors) but the drive controller reported that there was an error in reliably retrieving the requested sectors. This error may be caused by a failed sector or IDE controller failure.
See Code 19, sub-code 0x0 for resolution information.
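The sector check described in the Code 19, sub-code 0x0 diagnostic can also be applied offline to a captured copy of sector 0. The Python sketch below is illustrative only: the function name and the way the sector bytes are obtained are hypothetical and are not part of Whack or CBIOS. It assumes the "200" in "d 1000 200" is hexadecimal (one 512-byte sector), and it checks only what the text states, namely that an initialized node disk carries the string "LILO" at byte offset 6.

```python
SECTOR_SIZE = 0x200   # assumption: "d 1000 200" dumps 0x200 bytes (one sector)
LILO_OFFSET = 6       # byte offset given in the diagnostic text
LILO_TAG = b"LILO"    # string given in the diagnostic text


def has_lilo_signature(sector: bytes) -> bool:
    """Return True if a captured sector 0 carries "LILO" at byte offset 6."""
    end = LILO_OFFSET + len(LILO_TAG)
    if len(sector) < end:
        return False
    return sector[LILO_OFFSET:end] == LILO_TAG


# Hypothetical captured sectors, for illustration only.
initialized = bytes(LILO_OFFSET) + LILO_TAG + bytes(SECTOR_SIZE - LILO_OFFSET - len(LILO_TAG))
blank = bytes(SECTOR_SIZE)

print(has_lilo_signature(initialized))  # True
print(has_lilo_signature(blank))        # False
```

A sector that fails this check is consistent with an uninitialized or corrupted boot area, but the check by itself does not distinguish between the two.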
Fatal error: Code 20, subcode 0x0 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Failed to deliver startup message to CPU xxxx
or
*** Error: Errors in APs starting up.
If a board has more than a single CPU, only one CPU comes out of power-on executing code. The other waits in a halted state for an AP message from the bootstrap processor. Every MP-capable Pentium processor has an onboard Advanced Programmable Interrupt Controller called the Local APIC (there is a complementary component called the IOAPIC located on the motherboard). Once the bootstrap processor has completed all node board initialization and testing, it starts up each application processor (which in Intel terms is defined as any processor other than the initial bootstrap processor). Each AP then does a brief identify, verify, and microcode update.
In the above case, if the Local APIC fails to deliver an AP startup message to the other processor within a reasonable amount of time, this error results. In a single-CPU system this error should not occur because an earlier probe should identify that no application processor is present. If the Local APIC cannot reliably deliver a message over the IOAPIC, then it is probably not safe to ignore this error by pressing ^C.
Resolution:
A) Reseat both processors in their sockets.
B) Replace each processor individually. Do not bother with downgrading to a single processor system since this is a multiprocessor startup issue. The problem processor will not be apparent with a single processor configuration.
C) Replace the node motherboard.
Diagnostic:
A) Use Arium as bootstrap processor and verify that the APIC message is being delivered to the bus.
B) Use Arium as application processor and verify that the APIC message is delivered from the IOAPIC on the motherboard.
The application processor should then start executing code at the default APIC address of 0x30000 (FIRST_SMM_BASE).

Fatal error: Code 20, subcode 0x1 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Startup message successfully sent to CPU xxx, no response
After an AP startup message has been delivered to the application processor through the IOAPIC, the bootstrap processor waits for an indication that the AP has started. If the indication is not received before a reasonable timeout, this error is given. It should be OK to ignore this message by pressing ^C and continue with further BIOS diagnostics.
See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 20, subcode 0x2 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: CPU xxxx failed to complete initialization.
Once the application processor (AP) has started initialization, it sets a flag that the bootstrap processor can use to determine when the AP has completed. If the AP remains in the AP_INIT_START state too long, this fatal error is displayed. It is probably not safe to resume after this error since the AP may be off executing errant code or interfering with bootstrap processor bus cycles.
See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 20, subcode 0x3 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: POST failure on CPU xxxx: yyyy
*** Error: CPU xxxx initialization failure.
The application processor (AP) previously failed to complete a Built In Self Test (BIST). This is likely due to a bad processor.
Resolution:
A) Replace the application processor.
Fatal error: Code 20, subcode 0x4 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Invalid CPU for CPU xxxx, error code: yyyy
*** Error: CPU xxxx initialization failure.
During application processor (AP) initialization, the BIOS verifies that the CPU model, stepping, and clock multiplier of the processor being initialized match those of the bootstrap processor. If they do not match, this error results.
Resolution:
A) Since the processors are possibly mismatched, remove the heatsink on both and verify that the CPU model and stepping are identical.
See Code 20, sub-code 0x0 for more resolution information.

Fatal error: Code 20, subcode 0x5 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: More than wwww CPUs in system.
*** Error: CPU xxxx initialization failure.
The currently supported node board hardware configuration is a maximum of two physical processors. The BIOS uses this knowledge to limit the possibility of repeat initialization of the application processor (AP). If this message occurs, it may be due to a variety of hardware problems, but the most likely suspect is the application processor.
See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 21, subcode 0x0 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: Not expecting to install a vector on CPU xxxx
Intel processors support an interrupt level called SMI (System Management Interrupt) which is used for hardware management (usually by the BIOS). Events such as power management and hardware errors usually trigger an SMI. When an SMI is triggered, the system enters SMM (system management mode). In a multiprocessor system, both processors are usually triggered by an SMI at the same time.
Since both processors may attempt to service an SMI at the same time, each processor must have a unique stack area in which to save processor context. SMI setup configures each processor individually with a unique stack address for SMI handling. This particular error indicates that the SMI setup handler has detected a stack setup SMI, yet one was not expected (because one had already been set up or CPU initialization had not yet reached the point of SMI setup). The bootstrap CPU delivers the setup SMI to itself and to the application processor. This error could be caused by a faulty CPU or motherboard. The CPU which reports the setup error may not be the one at fault.
Resolution:
A) Pull one processor at a time to determine if the problem is reproducible with a single CPU.
B) Swap CPUs to see if the exact problem moves with the CPU. If not, it may be the motherboard.
C) Individually replace both CPUs.
D) Replace the node motherboard.
Diagnostic:
A) Use Arium as bootstrap processor and verify that the SMI is being delivered.

Fatal error: Code 21, subcode 0x1 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: CPU xxxx not found in CPU table
During SMI setup, each processor in turn receives an SMI and then performs stack initialization. Prior to the SMI setup, all application processors wait in a halted state for an APIC message to identify and download microcode. If the processor performing an SMI setup detects that it had not previously executed and added its CPU ID to the system table, then this fatal error is displayed.
See Code 20, sub-code 0x1 for resolution information.

Fatal error: Code 21, subcode 0x2 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: CPU xxxx did not respond
During SMI setup, each processor in turn receives an SMI and then performs stack initialization.
This error indicates that the bootstrap processor issued an SMI through the APIC and it was not processed by the targeted processor. This indicates that either SMIs are not being delivered properly, or that the targeted processor may be defective.
See Code 20, sub-code 0x1 for resolution information.

Fatal error: Code 22, subcode 0x0 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `cbios_to_os_message' test, expected xx but got yy
CBIOS provides service to the 3PAR kernel through a special command queue. Responses are returned to the OS through another queue, which is tested during BIOS initialization. Sub-code 0x0 indicates that the CBIOS to OS queue did not pass the built-in test.
Resolution:
A) Pull one processor at a time to determine if the problem is reproducible with a single CPU.
B) Swap SDRAM with good SDRAM.
C) Update CBIOS to the latest version.
D) Replace the node motherboard.

Fatal error: Code 22, subcode 0x1 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test', failed to read message
This error indicates that the CBIOS to OS queue test failed to acquire a message it previously sent.
See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 22, subcode 0x2 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test': expected: uuuu vv `ww' but got: xxxx yy `zz'
This error indicates that the CBIOS to OS queue test failed because the message received did not match the message sent.
See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 22, subcode 0x3 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test', expected no more data
This error indicates that the CBIOS to OS queue test failed because there were more items in the queue than those sent.
See Code 20, sub-code 0x0 for resolution information.
Fatal error: Code 22, subcode 0x4 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: Couldn't send simulated message from OS to CBIOS, code == xx
This error indicates that the OS to CBIOS queue test failed. The minor code will indicate to an engineer what went wrong.
See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 22, subcode 0x5 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: Inconsistent queue: queue_base == ww, queue_limit == xx queue_inp = yy, queue_otp = zz
This error indicates that the CBIOS to OS queue test failed because the queue pointers became corrupt.
See Code 20, sub-code 0x0 for resolution information.

Non-fatal error: Code 23, sub-code 0x0 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
CRC mismatch for failsafe CBIOS
Upon startup, CBIOS computes a strong CRC over all executable code and data stored in the flash. This is done to guard against flash corruption, which also ensures reliable system initialization and testing. This specific sub-code indicates that a CRC error was detected in the failsafe component of CBIOS. The majority of the failsafe is only executed if corruption is detected in the main CBIOS.
Resolution:
A) Try pressing ^C to resume. Perform a flash update as soon as possible. If flash updating under Linux, make sure to specify the 'failsafe' option to update the failsafe area as well.
B) If the flash update is successful, but you still get a CRC error, verify that your flash image is intact. The Linux flash utility does this automatically using the same strong CRC algorithm as the BIOS uses.
C) Replace the node motherboard.
Diagnostic:
A) Use the Whack "net tftp" command to download an identical image to that which is in flash.
Use the Whack "mem compare" command to locate bytes which differ so that you may examine those values with "d <addr>".
B) If Whack is not available, use the Arium to look at flash address space for defects. It may be a stuck, floating, or bridged address or data line.
C) Replace the flash part.

Non-fatal error: Code 23, sub-code 0x1 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Invalid entry point for full CBIOS
Boot with clustering disabled and update flash immediately!
Prior to starting up the non-failsafe (full diagnostic) CBIOS image, the failsafe CBIOS performs some consistency checks over the image. This error indicates corruption was detected in the entry point to the main routine of the full CBIOS. If you have recently installed a new CBIOS which is larger than the previous one, it is possible to get this error because the failsafe BIOS present cannot properly verify the larger BIOS.
Resolution:
A) Try pressing ^C to resume. Perform a flash update as soon as possible. Boot with clustering disabled by typing "tpd nokmod" at the LILO prompt. Once the node has booted, log in as root and use the flash command. Example:
# flash /opt/tpd/bios/bios-1.9.4
Upon completion of the flash update, reboot and observe console messages to ensure the CRC error no longer occurs.
B) If the flash update is successful, but you still get this error, verify that your flash image is intact. The Linux flash utility does this automatically using the same strong CRC algorithm as the BIOS uses.
C) Replace the node motherboard.
Diagnostic:
A) If Whack is not available, use the Arium to look at flash address space for defects. It may be a stuck, floating, or bridged address or data line.
B) Replace the flash part.
Fatal error: Code 23, subcode 0x2 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Invalid magic for full CBIOS
Prior to starting up the non-failsafe (full diagnostic) CBIOS image, the failsafe CBIOS performs some consistency checks over the image. This error indicates the failsafe BIOS could not find a proper header record for the full CBIOS.
See Code 23, sub-code 0x1 for resolution information.

Fatal error: Code 23, subcode 0x3 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
CRC mismatch for full CBIOS
Prior to starting up the non-failsafe (full diagnostic) CBIOS image, the failsafe CBIOS performs a strong CRC over the full CBIOS image to verify the image's integrity. This error indicates the full CBIOS had a CRC failure.
See Code 23, sub-code 0x1 for resolution information.

Fatal error: Code 23, subcode 0x4 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Failsafe CBIOS is now enabling the full CBIOS ...
The full CBIOS either detected an error or user input (the 'f' key) which forced it to return to the failsafe BIOS. If the user did press the 'f' key, then press ^C to resume startup under the failsafe BIOS. If the user did not press the 'f' key, browse prior messages to learn of a failure which may have caused this error.
Resolution:
A) If the error was not the result of a keystroke, try pressing the 'n' key at BIOS startup to clear any initialization skips. NVRAM may contain a recorded setting to skip the full BIOS version and always execute the failsafe.
See Code 23, sub-code 0x1 for more resolution information.

Non-fatal error: Code 23, sub-code 0x10 (flags)
FLASH_CRC_ERROR "EOS: Repairing Main BIOS"
The EOS Main BIOS image in SPI has failed to boot and the FPGA watchdog has reset the node to boot from the failsafe BIOS. The failsafe BIOS has detected a bad CRC in the main BIOS region of flash and is attempting to automatically re-flash that region from disk.
The data field contains flags indicating what errors were seen during the verification check:
Bit00 -> Descriptor Region CRC Failed.
Bit01 -> Main BIOS Region CRC Failed.
Bit24 -> Test Injection Descriptor CRC Error (munge_desc_crc).
Bit25 -> Test Injection Main BIOS CRC Error (munge_bios_crc).
Other bits are undefined.

Fatal error: Code 23, subcode 0x11 (bbxxyyzz)
FLASH_CRC_ERROR "EOS: Main BIOS Corrupt"
The EOS Main BIOS image in SPI has failed to boot and the FPGA watchdog has reset the node to boot from the failsafe BIOS. The failsafe BIOS has detected a bad CRC in the main BIOS region of flash. The failsafe BIOS has also detected five or more attempts to automatically recover the Main BIOS within the past two hours and has stopped attempting automatic recovery.
The data field contains the build (bb) and version (xx.yy.zz) of the Main BIOS that failed to boot.

Fatal error: Code 23, subcode 0x12 (0)
FLASH_CRC_ERROR "BIOS Signature Verification Failed"
Starting in Manchester (v3.2.2), the BIOS image for Tornado, Chimera, and Orion platforms is signed and the encrypted signature is included in the "uefi_spi_....signed.bin" file. This fatal error code is logged when the encrypted signature does not match the file hash calculated immediately before a flash update is performed. The flash update is not performed in this instance. If an unsigned but otherwise valid image is specified for an update, this code is also logged. This code is not logged for a binary file that is not in a valid format.
Retry the flash update operation with a correctly signed and uncorrupted binary file. If the BIOS image installed with TPD becomes corrupted, then a TPD update may be required to fix the corrupted BIOS package or BIOS package contents.
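The data fields of Code 23, sub-codes 0x10 and 0x11 can be unpacked by hand, and the following Python sketch shows one way to do it. The function names are hypothetical. The bit names are copied from the sub-code 0x10 description. For sub-code 0x11, the sketch assumes that bb is the most significant byte of the 32-bit value and that each version byte is shown as two hexadecimal digits; the source gives only the layout "bbxxyyzz".

```python
# Bit meanings copied from the Code 23, sub-code 0x10 description.
FLAG_BITS = {
    0: "Descriptor Region CRC Failed",
    1: "Main BIOS Region CRC Failed",
    24: "Test Injection Descriptor CRC Error (munge_desc_crc)",
    25: "Test Injection Main BIOS CRC Error (munge_bios_crc)",
}


def decode_repair_flags(data: int) -> list:
    """List the conditions flagged in a sub-code 0x10 data field."""
    names = []
    for bit in range(32):
        if data & (1 << bit):
            # Bits not listed in the guide are reported as undefined.
            names.append(FLAG_BITS.get(bit, f"Bit{bit:02d} (undefined)"))
    return names


def decode_failed_bios(data: int) -> tuple:
    """Split a sub-code 0x11 data field (bbxxyyzz) into build and version.

    Assumption: bb is the top byte and the bytes are rendered as hex digits.
    """
    bb = (data >> 24) & 0xFF
    xx = (data >> 16) & 0xFF
    yy = (data >> 8) & 0xFF
    zz = data & 0xFF
    return bb, f"{xx:02x}.{yy:02x}.{zz:02x}"


print(decode_repair_flags(0x02000002))
print(decode_failed_bios(0x01030201))
```

With the example values, the first call reports the Main BIOS Region CRC failure together with the munge_bios_crc test injection, and the second call returns build 1 with version string "03.02.01".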
Fatal error: Code 24, subcode 0x0 (ptr)
TURD_EXCEEDED_LIMIT "TURD Exceeded Limit"
*** Error: MP turd exceeded 0x100000
The BIOS presents to the operating system a set of tables which describe the hardware present in the system. These tables have a rigid structure for each type of device. If the CBIOS configuration structure becomes corrupt, this error may result when the TURD structures are initialized for the operating system. A consistency check ensures the TURD area does not go beyond 1 MB (which is the base address where the operating system normally begins using main memory). The data for this error (ptr) is the pointer address reached, and is greater than 0x100000.
Resolution:
A) Remove cards from all PCI slots. If the error no longer occurs, it may be a hardware failure on one of the cards.
B) Replace the node motherboard.
Diagnostic:
A) Look at memory starting at 0x000f0000. 0x5f504d5f is the magic number of the first TURD (the MP Configuration table).
B) Turn on PRINTING_TURD and DEBUG_APIC compile flags.

Fatal error: Code 24, subcode 0x1 (0)
TURD_EXCEEDED_LIMIT "TURD Checksum Failure"
*** Error: MP table checksum failed - stopping table build
The BIOS presents to the operating system a set of tables which describe the hardware present in the system. In this case, the BIOS detected that one of the tables had a bad checksum.
Resolution:
A) Remove cards from all PCI slots. If the error no longer occurs, it may be a hardware failure on one of the cards.
B) Replace the node motherboard.
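When examining a memory dump for the Code 24 errors, it can help to recognize the magic number and to recheck a table checksum by hand. The Python sketch below is an illustration under stated assumptions, not the CBIOS implementation. It shows that 0x5f504d5f, stored little-endian, reads as the ASCII string "_MP_", and it assumes the checksum convention of the Intel MultiProcessor Specification, in which all bytes of a table sum to zero modulo 256. Whether CBIOS applies exactly this rule to every TURD is not stated in this guide. All function names are hypothetical.

```python
import struct

MP_MAGIC = 0x5F504D5F    # magic number quoted in the Code 24 diagnostic
TURD_LIMIT = 0x100000    # 1 MB limit from the Code 24, sub-code 0x0 text


def magic_as_text(magic: int) -> str:
    """Show how the 32-bit magic appears as bytes in little-endian memory."""
    return struct.pack("<I", magic).decode("ascii")


def checksum_ok(table: bytes) -> bool:
    """Assumed rule (Intel MP spec): all table bytes sum to 0 modulo 256."""
    return sum(table) % 256 == 0


def within_limit(ptr: int) -> bool:
    """The sub-code 0x0 error is raised once the pointer exceeds 0x100000."""
    return ptr <= TURD_LIMIT


# Hypothetical 16-byte table: signature, zero padding, then a fix-up byte.
body = struct.pack("<I", MP_MAGIC) + bytes(11)
table = body + bytes([(-sum(body)) % 256])

print(magic_as_text(MP_MAGIC))    # _MP_
print(checksum_ok(table))         # True
print(within_limit(0x100010))     # False
```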
Fatal error: Code 24, subcode 0x2 (0)
TURD_EXCEEDED_LIMIT "TURD Exceeded Limit"
*** Error: Too many MP table entries - stopping table build
The BIOS presents to the operating system a set of tables which describe the hardware present in the system. In this case, the BIOS detected that it had added too many entries to the table, likely because too many PCI devices are present in the system. This error is likely due to an earlier PCI failure.
Resolution:
A) Remove cards from all PCI slots. If the error no longer occurs, it may be a hardware failure on one of the cards.
B) Replace the node motherboard.

Fatal error: Code 25, subcode 0x0 (0)
PROM_FAILURE "PROM Failure"
The node board has two different Serial EEPROM devices used for storing persistent board information. One PROM device is located on the I2C bus. It stores node board manufacturing, assembly, serial number, and error message log information. The second PROM device is connected through the Intel 82559ER ethernet controller. It stores ethernet controller information such as initialization state and the hardware MAC address.
PROM checksum: FAIL
The PROM which stores node board manufacturing, assembly, serial number, and error message log information does not have a valid checksum. If the PROM has not yet been initialized or if it has become corrupt, you may see this error.
Resolution:
A) Press ^W to enter Whack and use either "prom init" or "prom edit" to correct this error.
B) If the information looks correct with "prom id" then try using "prom checksum" to rewrite the checksum.
C) Replace the node motherboard.
Diagnostic:
A) Use the Whack "d prom <addr>" command to display PROM contents. Use the Whack "c prom <addr>" command to change PROM contents.
Look for a pattern in order to determine if the error is due to the device's connection with the motherboard or a hardware failure within the Serial PROM.

Fatal error: Code 25, subcode 0x1 (0)
PROM_FAILURE "PROM Failure"
Ethernet 0 PROM checksum: FAIL
The PROM which stores ethernet controller information does not have a valid checksum. If the PROM has not yet been initialized or if it has become corrupt, you may see this error.
Resolution:
A) Press ^W to enter Whack and use "prom id" to verify the other PROM is valid. If not, first use "prom init" or "prom edit" to set the PROM information. If the PROM information appears valid, use "prom mac" to reprogram the Ethernet MAC address and checksum.
B) Try flushing out a correct checksum. Note: You must first select the device with an error using the "eth dev" command. Example:
Whack> eth dev 1
Whack> eth checksum
C) Replace the node motherboard.
Diagnostic:
A) Try programming a custom MAC address. Example: "prom mac 00:02:AC:00:00:43"
B) Use the Whack "d eth <addr>" command to display PROM contents. Use the Whack "c eth <addr>" command to change PROM contents. Look for a pattern in order to determine if the error is due to the device's connection with the motherboard or a hardware failure within the Ethernet PROM.

Non-fatal error: Code 25, sub-code 0x2 (0)
PROM_FAILURE "PROM Failure"
Ethernet MAC xx:xx:xx:xx:xx:xx mismatches PROM: yy:yy:yy:yy:yy:yy
The "prom mac" command may fix this.
This error indicates the MAC address stored in the onboard Ethernet controller's PROM does not match that which can be computed from the board revision and serial number stored in the node's PROM. This mismatch suggests that one or the other PROM may contain corrupt contents.
If the ethernet MAC address was purposely set to an address (see "prom mac" command), then this check may be overridden by setting the NVRAM "oddmac" flag. Example:
Whack> set perm oddmac
Resolution:
A) Look for a prior message indicating an invalid board type or check the banner to ensure the board type and serial number are correct for this node. If either is not correct, use the "prom edit" command to repair the corruption.
B) Use the "prom mac" command to reprogram the MAC address in the ethernet controller's PROM.
C) Replace the node motherboard.
Diagnostic:
A) Determine if the cause is due to a failing node PROM or ethernet controller PROM. Use the "db prom 0 20" command to display PROM contents and compare with expected values. Example:
Whack> dbz8 prom 0 20
prom 0000: 00 04 09 20 10 03 04 35 . ...5
prom 0008: 30 53 4f 4c 01 10 00 00 0SOL..
prom 0010: 00 76 ff ff ff ff ff ff v......
prom 0018: ff ff ff ff c1 1f a4 5e .......^
Replace node PROM if it is defective.
B) Use the "db eth 0 20" command to display ethernet PROM contents and compare with expected values. Example:
Whack> dbz8 eth 0 20
eth 0000: 00 02 ac 14 00 76 03 01 ... v..
eth 0008: ff ff 01 00 01 07 00 00 ... ..
eth 0010: 10 00 04 03 40 48 00 00 . ..@H
eth 0018: 86 80 00 00 ff ff ff ff .. ....
Replace ethernet PROM if it is defective.

Non-fatal error: Code 25, sub-code 0x3 (0)
PROM_FAILURE "PROM Failure"
During initialization, CBIOS checks the PROM for a magic number. This non-fatal error is logged when the magic number check fails.
Resolution:
A) On EOS platforms, the midplane, node type, and slot ID may need to be reconstructed with "prom edit". The Ethernet MAC and PROM magic number may also need to be reconstructed.
(Bug 82094)
B) Previous platforms should be reinitialized and reconstructed automatically.
Diagnostic:
A) Use "db i2c 2.a6.0 100" to view the contents of this region. Typically only the first 32 bytes are affected.

Non-fatal error: Code 25, sub-code 0x4 (aabbccdd)
PROM_FAILURE "PROM Failure"
Board Spin value is invalid. The "prom edit" command may fix this.
This error indicates the board spin value in the PROM record is not in the proper range. The range of the board spin byte is 0x01 to 0x16. If the board spin number is out of this range, then this error occurs.
NOTE: On Tinman, the board spin field is not used as board spin, so this field will always be 0x17. On Tinman, this is NOT flagged as an error.
If the board spin field is not valid, then the BIOS uses the board revision field. This is a two-character field that must be "01" to "09", then "A0" to "A9", then "B0" to "B9", and so on. If a character (A-Z) is in the second byte or a nonzero number (1-9) is the first character, then this is an error.
In the data field, aa is the board spin value, bb is the calculated board revision, cc is the first character in the rev field, and dd is the second character in the rev field.
Resolution:
A) Use "prom edit" to fix/verify the board spin field.
B) Use "prom edit" to fix the board revision field.
C) Replace the node motherboard.

Non-fatal error: Code 25, sub-code 0x5 (0)
PROM_FAILURE "PROM Failure"
EOS node prom value is invalid. The "prom edit" command may fix this.
This error indicates the EOS node PROM value in the PROM record is not in the proper range. The node type and midplane type values in the PROM should be programmed correctly with the "prom edit" command.
Resolution:
A) Use "prom edit" to fix/verify the midplane field.
B) Use "prom edit" to fix/verify the node type field.
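The board spin and board revision rules given for Code 25, sub-code 0x4 can be sketched as a small checker. The Python below is a hedged illustration only: the function names are hypothetical, the Tinman exception is not modeled, and treating "00" as invalid is an assumption drawn from the stated starting value of "01". The data field split assumes aa is the most significant byte of the displayed value.

```python
import string

SPIN_MIN, SPIN_MAX = 0x01, 0x16   # board spin range stated in the text


def spin_is_valid(spin: int) -> bool:
    """Board spin byte must be 0x01..0x16 (Tinman's 0x17 is not modeled)."""
    return SPIN_MIN <= spin <= SPIN_MAX


def rev_is_valid(rev: str) -> bool:
    """Check a two-character revision: "01".."09", "A0".."A9", "B0".."B9", ..."""
    if len(rev) != 2:
        return False
    first, second = rev
    if second not in string.digits:
        return False                  # a letter in the second byte is an error
    if first == "0":
        return second != "0"          # assumption: "00" precedes the "01" start
    return first in string.ascii_uppercase   # a nonzero digit first is an error


def split_data_field(data: int) -> dict:
    """Split aabbccdd, assuming aa is the most significant byte."""
    return {
        "board_spin": (data >> 24) & 0xFF,
        "calculated_rev": (data >> 16) & 0xFF,
        "rev_char_1": chr((data >> 8) & 0xFF),
        "rev_char_2": chr(data & 0xFF),
    }


fields = split_data_field(0x17004142)   # hypothetical logged value
print(spin_is_valid(fields["board_spin"]))                        # False
print(rev_is_valid(fields["rev_char_1"] + fields["rev_char_2"]))  # False ("AB")
print(rev_is_valid("A0"))                                         # True
```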
Non-fatal error: Code 25, sub-code 0x6 (0)
PROM_FAILURE "PROM Failure"
EOS Node ID in Prom and Slot ID do not match. The "prom edit" command may fix this.
This error indicates the EOS Node ID value in the PROM record does not match the Slot ID read from the FPGA. The Node ID value in the PROM should be programmed correctly with the "prom edit" command.
Resolution:
A) Use "prom edit" to fix/verify the Node ID field.

Fatal error: Code 25, subcode 0x7 (devices)
PROM_FAILURE "PROM Failure"
CBIOS was unable to read the ethernet controller's PROM, likely indicating a hardware failure, a bad PROM, or a board-level issue.
"devices" is a mask that indicates which eth devices encountered the hardware error. It is therefore possible, though unlikely, that multiple controllers were tested and failed at the same time.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard if the failed ethernet device is on the node.
Diagnostic:
A) Rerun the test or confirm with the dump command that all reads fail:
dump eth0:10

Fatal error: Code 25, subcode 0x8 (0)
PROM_FAILURE "PROM Failure"
Chimera node prom value is invalid. The "prom edit" command may fix this.
This error indicates the Chimera node PROM value in the PROM record is not in the proper range. The midplane type value in the PROM should be programmed correctly with the "prom edit" command.
Resolution:
A) Use "prom edit" to fix/verify the midplane field.

Non-fatal error: Code 26, sub-code 0x1 (ethdev)
ETH_FAILURE "Ethernet Failure"
eth0 device self test: FAIL All tests: xxxx (timeout)
During initialization, CBIOS has the ethernet controller perform an internal test to verify correct operation. If the ethernet controller does not respond within a reasonable amount of time, this error is displayed.
"ethdev" indicates the PCI slot in which the failed ethernet device is located. This is an ASCII value, so 0x30 indicates PCI slot 0. If the ethernet device is located on the node motherboard, then ethdev will have a value of 0x00.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Verify the 82559ER shows up in a PCI scan. Use the Whack "pci find 8086" command. It should display the 82559ER Ethernet controller.
B) Use the Whack "eth test" command to repeat the test. Make sure that CBIOS initialization has passed the point of PCI scan. Use Whack "loop ffff eth test" to repeat in a loop.

Non-fatal error: Code 26, sub-code 0x2 (ethdev)
ETH_FAILURE "Ethernet Failure"
eth0 device self test: FAILxxxx yyyy
If the ethernet controller fails its internal test, this error is displayed. Since this is an internal test, it is likely the ethernet controller itself which has failed.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the Whack "eth test" command to repeat the test. Make sure that CBIOS initialization has passed the point of PCI scan. Use Whack "loop ffff eth test" to repeat in a loop.

Non-fatal error: Code 26, sub-code 0x3 (0)
ETH_FAILURE "Ethernet Failure"
No ethernet devices available for loopback test
This error indicates that no ethernet devices could be found or initialized on the node. This is possibly the result of a hardware failure.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the Whack "eth test" command to make sure that the low level test passes.
B) Try using "net dhcp" in an environment that has a DHCP server to see if the node can send and receive packets. If so, then this error is likely caused by incorrect BIOS code.

Non-fatal error: Code 26, sub-code 0x4 (0)
ETH_FAILURE "Ethernet Failure"
No loopback connections were found.
An external loopback plug is required if this node has only one ethernet port. A crossover cable is required if this node has more than a single ethernet port.
Resolution:
A) Make sure the ethernet loopback plug is in the ethernet connector (you should see link status lights illuminated). In the case of a node having two ethernet ports, make sure a crossover cable is connected between the ethernet ports.
B) Cycle power on the node.
C) Replace the node motherboard.
Diagnostic:
A) This problem is most likely caused by a bad connector or bad connection to the loopback plug. Make sure TX+ makes a circuit to RX+ and TX- makes a circuit to RX- on the PHY.
B) Try plugging into a normal ethernet network to see if the node can talk to a DHCP server ("net dhcp").
C) Try using "net loopback" to test the ethernet port using the internal PHY loopback.

Non-fatal error: Code 26, sub-code 0x5 (slotid)
ETH_FAILURE "Ethernet Failure"
eth2 loopback PHY internal: FAIL
This error indicates that the internal loopback of the PHY did not correctly loop back packets. If the device being tested is onboard the node (82559ER or 82551ER), then this is a failure. Some plug-in PCI boards (such as 82557) do not fully support PHY loopback. Those devices will cause the following warning:
eth2 loopback PHY internal: Unavailable
No error stop will occur in the case of a PHY not supporting internal loopback.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the "pci probe" command to determine which ethernet device has failed.
B) Try using an external loopback to see if the results are the same. If the same, then try debugging using a scope. If the external loopback works, then it may be that the PHY loopback just does not work in this device.
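Several Code 26 entries report an "ethdev" data value that the guide describes as an ASCII PCI slot number, with 0x00 meaning the device is on the node motherboard. A minimal Python sketch of that decoding follows. The function name is hypothetical, and handling only the single ASCII digits 0x30 to 0x39 is an assumption, since the guide gives 0x30 as its only example.

```python
def describe_ethdev(ethdev: int) -> str:
    """Translate an ethdev data value into a location description."""
    if ethdev == 0x00:
        return "on the node motherboard"
    if 0x30 <= ethdev <= 0x39:
        # ASCII digit: 0x30 is '0', so subtracting 0x30 yields the slot number.
        return f"PCI slot {ethdev - 0x30}"
    return f"unrecognized ethdev value 0x{ethdev:02x}"


print(describe_ethdev(0x30))  # PCI slot 0
print(describe_ethdev(0x00))  # on the node motherboard
```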
Non-fatal error: Code 26, sub-code 0x6 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 sends to eth1 but cannot receive from it

This is an unusual error in that one ethernet device is able to reliably receive packets from the other, but the opposite is not true.

Resolution:
A) Run the test again. If the nodes are attached to a hub, the failure may be due to another ethernet node flooding the network.
B) Cycle power on the node.
C) Ensure that there is no switch between the ethernet ports. A switch may prevent the test from functioning properly if the MAC address of an interface is in use elsewhere or the switch is really an IP router.

Diagnostic:
A) Test against a plug-in PCI ethernet card to isolate which ethernet interface is not functioning.

Non-fatal error: Code 26, sub-code 0x7 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: FAIL - receive timeout (xx seconds)

This error indicates the ethernet device did not successfully receive the loopback pattern sent to test the ethernet device's transceiver. The failure to receive a loopback pattern usually means the ethernet device has failed. "ethdev" indicates the PCI slot in which the failed ethernet device is located. This is an ASCII value, so 0x30 indicates PCI slot 0. If the ethernet device is located on the node motherboard, then ethdev will have a value of 0x00.

The following are normal test results that you would expect to see. If this error occurs, then one of the following has not happened:
eth0 loopback All zeros: PASS
eth0 loopback All ones: PASS
eth0 loopback Walking ones: PASS
eth0 loopback Walking zeros: PASS
eth0 loopback Random pattern: PASS

This error indicates that within 100 packets successfully transmitted, there were no packets successfully received.

Resolution:
A) Cycle power on the node.
B) Unplug the network cable and run the test again. If the node is attached to a hub, the failure may be due to another ethernet node flooding the network. This is not very likely.
C) If the ethernet device is located in a PCI slot, replace the card.
D) Replace the node motherboard.

Diagnostic:
A) Test against a plug-in PCI ethernet card to isolate which ethernet interface is not functioning.

Non-fatal error: Code 26, sub-code 0x8 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: Packet transmit failed

This error indicates that the ethernet device was not able to successfully transmit packets. This is a serious failure, since the ethernet code does not fail to transmit under any condition unless the ethernet device failed to initialize.

Resolution:
A) Use "eth reset" to reset the ethernet device.
B) Cycle power on the node.
C) Replace the node motherboard if the failed ethernet device is on the node.

Non-fatal error: Code 26, sub-code 0x9 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: FAIL - miscompare
stuck high=xxxx stuck low=yyyy toggle=zzzz

This error is displayed if one of the ethernet tests detects a mismatch between the packet sent and the data received. It also includes a diagnostic line which shows in what way the data is different.

Resolution:
A) Use "eth reset" to reset the ethernet device.
B) Cycle power on the node.
C) Replace the node motherboard if the failed ethernet device is on the node.
Diagnostic:
A) You can get complete packet dumps if you wish to manually compare how the data was corrupted. To do this, use "net loopback vv" (double verbose).
B) If a single bit (or a small number of bits) is failing, observe whether the bits are pulled high or low. This may assist you in debugging where the hardware is failing, if it is external to the ethernet IC.

Non-fatal error: Code 26, sub-code 0xa (slotid)
ETH_FAILURE "Ethernet Failure"
ethxxx device registers:FAIL
Onboard ethernet device did not read valid config from EEPROM. A powercycle might clear this failure if this is a new node.

This error indicates the ethernet device failed to initialize properly, probably because it read invalid content from the attached EEPROM device. If this is an onboard GigE on the 5000P chipset (Tx00, Fx00, Vx00, Gx00), then it is likely this is the first time the node has ever been powered on. Once the BIOS writes a configuration to the SPI EEPROM attached to the GigE, the board must be power cycled before the GigE device is usable. If the board is not new and you see this failure, then it is likely a component on the node motherboard has failed.

Resolution:
A) Power cycle the node.
B) Replace the node motherboard if the failed ethernet device is on the node.

Non-fatal error: Code 27, sub-code 0x0 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"

Each node board has multiple temperature and voltage sensors and fan RPM sensors which monitor the environment to ensure the temperature, voltage, and fan RPM are within operating tolerances. This directly results in increased reliability of the product. If a temperature or a voltage falls outside a programmed tolerance level, CBIOS will alert the user to this condition. The sub-code displayed reflects the type of (the first) error detected.
The data value is a count of the number of temperature/voltage/fan problems detected.
A sub-code value of 0x0 indicates a fan RPM problem.
A sub-code value of 0x1 indicates a temperature problem.
A sub-code value of 0x2 indicates a voltage problem.
This particular sub-code indicates a programmed fan RPM limit has been exceeded.

Resolution:
A) Cycle power on the node. If it is a temperature related problem, verify the system is getting adequate ventilation.
B) Verify the limit settings are reasonable. Use the Whack "i2c env" command. The Whack "i2c env defaults" command resets all defaults.
C) Verify both power supply fans are spinning freely and that the supply amber failure light is not illuminated. If only a single supply is installed, make sure the second slot either has a fan or is covered.
D) Replace the power supply.
E) If it is CPU temperature, verify the heatsink is conducting heat well.
F) If it is CPU voltage, try swapping out the CPU voltage regulators.
G) Replace the node motherboard.

Diagnostic:
A) Use a voltage probe at appropriate vias to verify correct voltage levels.
B) Verify the LM87 external temperature sensor line is well connected to the CPU's thermal diode.

Non-fatal error: Code 27, sub-code 0x1 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"

This sub-code indicates a programmed temperature limit has been exceeded.
See Code 27, sub-code 0x0 for resolution information.

Non-fatal error: Code 27, sub-code 0x2 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"

This sub-code indicates a programmed voltage limit has been exceeded.
See Code 27, sub-code 0x0 for resolution information.

Non-fatal error: Code 27, sub-code 0x3 (0)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"

This sub-code indicates a sensor interrupt test failed.
See Code 27, sub-code 0x0 for resolution information.
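For Code 27, sub-codes 0x0 through 0x2, the guide states that the sub-code reflects the type of the first problem detected and the data value counts all problems detected. The sketch below shows how a logged (sub-code, data) pair relates to a list of detected problems. The function name and the string labels are hypothetical, chosen only for illustration.

```python
# Sub-code meanings as listed in the Code 27, sub-code 0x0 entry.
KIND_TO_SUB_CODE = {"fan": 0x0, "temperature": 0x1, "voltage": 0x2}


def code27_event(problems):
    """Return (sub_code, data) for a list of problem kinds in detection order.

    The sub-code comes from the first problem only; the data value is the
    total count. Returns None when nothing was detected.
    """
    if not problems:
        return None
    return KIND_TO_SUB_CODE[problems[0]], len(problems)


if __name__ == "__main__":
    # A temperature problem seen first, followed by two voltage problems.
    sub_code, data = code27_event(["temperature", "voltage", "voltage"])
    print(f"Code 27, sub-code 0x{sub_code:x} ({data})")
```

A single logged event can therefore hide problems of other types, which is why the resolution steps for sub-code 0x0 cover fans, temperature, and voltage together.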
Fatal error: Code 27, sub-code 0x4 (0)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"

This sub-code indicates that a CPU has asserted its THERMTRIP_N signal. This could mean that it has reached its case temperature, that a VRM has failed, or that there is a problem with the FPGA.

Resolution:
A) Check the environmentals.
B) Replace the node.

Non-fatal error: Code 27, sub-code 0x5 (Shutdown Code =1 or =2)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Boot Pause"

For ShutdownCode = 1: In a system wide over temperature condition, the OS will shut down the system and reboot the nodes. The BIOS will pause the boot in a low power state until the over temperature condition has been cleared for 30 minutes. When in this state, the BIOS samples critical temperature sensors periodically and displays their current state on the system console every few minutes. This delay can be cleared early by a node power cycle. This log entry indicates the start of the BIOS boot pause.

For ShutdownCode = 2: This shutdown code indicates an over temperature failure of a single node. TPD will flag this failure and shut down that node. The node will not complete the boot until the unit has been repaired and any issues cleared. To clear the boot halt, reboot the node and use the Whack command "unset tshutdown" before the POST reaches step 35.

Non-fatal error: Code 27, sub-code 0x6 (0)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Boot Resume"

This sub-code indicates that the critical temperature sensors have been below their thresholds for at least 30 minutes and the BIOS is resuming the boot process. See sub-code 5 for more temperature shutdown information.

Non-fatal error: Code 27, sub-code 0x7 (0)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Override"

This sub-code indicates that the BIOS is skipping a critical temperature boot pause due to a node power cycle. See sub-code 5 for more temperature shutdown information.

Non-fatal error: Code 27, sub-code 0x8 (count)
TEMP_VOLTAGE_FAILURE "No Response"

This sub-code indicates that the defined I2C sensor failed to respond on the I2C bus. 'Count' indicates the number of I2C device failures.

Non-fatal error: Code 27, sub-code 0x9 (index)
TEMP_VOLTAGE_FAILURE "High Limit Error"

This sub-code indicates that the BIOS detected a mathematical overflow of the 8-bit upper limit register, and measurements on the sensor indicated by 'index' may be incorrect. The voltage or temperature limit could not be converted to and stored as an 8-bit value. Contact Engineering for a HW fix.

Fatal error: Code 28, sub-code 0x0 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Cluster Memory ASIC not found for CM Init.

The Eagle/Osprey/Harrier ASICs are the Cluster Managers which are used for high speed communication between nodes of a cluster. These devices are critical for the correct operation of the node software, and hence for operation of the whole cluster. The CM exists on all PCI buses in the node. If the CM cannot be found on any of the required PCI buses, this is a serious problem. Sub-code 0x0 indicates the PCI bus scan did not locate the Cluster Manager.

Resolution:
A) Cycle power on the node.
B) Pull all PCI cards and cycle power on the node.
C) Replace the node motherboard.

Diagnostic:
A) Use "pci find 1590" at the Whack prompt to see if the CM can be located. Since the same data structure is used, it should not show up there either. Use "pci init", which will scan the PCI bus again. If the CM appears now (with "pci find 1590"), it may be a transient problem.
B) Examine the output of "pci probe" to determine if other onboard PCI devices are missing. This may help to determine where the failure occurs.
For example, if the four PCI bridges do not show, it may be the CIOB at fault.

Fatal error: Code 28, sub-code 0x0 (1)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMMs did not compare identical
DIMM and SPD comparison Failed
Not all required CMA DIMMS were found or are exact matches.
Or,
Example of slot 0 DIMM missing in Set2:
All slot 0 DIMMs are required installed
Set2 DC DIMM1.0.0 (J14007): not present or invalid SPD
Not all required CMA DIMMS were found or are exact matches
or
Example of 1 DIMM missing in 2 MC, 1 DIMM/MC configuration:
Number of DIMMs found 1 != 2
Example of 1 DIMM missing in 2 MC, 2 DIMM/MC configuration:
Number of DIMMs found 3 = 2
Example of 1 DIMM missing in 4 MC, 2 DIMM/MC configuration:
Number of DIMMs found 7 != 4 or 8
Not all required CMA DIMMS were found or are exact matches

The Harrier2 ASICs are the Cluster Managers which are used for high speed communication between nodes of a cluster. These devices are critical for the correct operation of the node software, and hence for operation of the whole cluster. The compare SPD routine takes two DIMMs on the same memory channel (if populated as such) and checks that the first 38 bytes of the SPD are identical and that the DIMMs meet some specific DIMM requirements. Sub-code 0x1 indicates that the DIMMs on one of the Memory Channels did not match in the first 38 bytes of the SPDs, that one or more slot 0 Memory Channel DIMMs were not present, or that the number of DIMMs found was incorrect.

Resolution:
A) Install the appropriate DIMMs in the CM DIMM slots of each memory channel. Not all DIMMs have to be from the same vendor, but they must be from the same vendor on each memory channel.
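The SPD comparison described above checks that the first 38 bytes of two DIMMs' SPD contents are identical. The following is a minimal sketch of such a comparison, assuming the SPD contents have already been read into byte strings. The function name and the choice to return the first mismatching offset are assumptions for illustration and do not reflect the BIOS implementation.

```python
SPD_COMPARE_LEN = 38  # number of leading SPD bytes that must match


def first_spd_mismatch(spd_a: bytes, spd_b: bytes, length: int = SPD_COMPARE_LEN):
    """Return the first offset below `length` where the SPDs differ, or None."""
    if len(spd_a) < length or len(spd_b) < length:
        raise ValueError("SPD data is shorter than the compared region")
    for offset in range(length):
        if spd_a[offset] != spd_b[offset]:
            return offset
    return None


if __name__ == "__main__":
    dimm0 = bytes(range(64))          # stand-in SPD contents
    dimm1 = bytearray(dimm0)
    dimm1[40] = 0xFF                  # differs only outside the compared region
    print(first_spd_mismatch(dimm0, bytes(dimm1)))
    dimm1[5] = 0xEE                   # differs inside the first 38 bytes
    print(first_spd_mismatch(dimm0, bytes(dimm1)))
```

Differences beyond the compared region do not count as a mismatch in this sketch. The guide notes that vendors may differ between memory channels but must match on each channel.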
Fatal error: Code 28, sub-code 0x0 (2)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Error: Harrier2 #0 not found for CM Init.
or
*** Error: Harrier2 #0 not found for CM Init.

The Harrier2 ASICs are the Cluster Managers which are used for high speed communication between nodes of a cluster. These devices are critical for the correct operation of the node software, and hence for operation of the whole cluster. The CM exists on all PCI buses in the node. If the CM cannot be found on any of the required PCI buses, this is a serious problem. Sub-code 0x2 indicates the PCI bus scan did not locate the Cluster Manager.

Resolution:
A) Cycle power on the node.
B) Pull all PCI cards and cycle power on the node.
C) Replace the node motherboard.

Diagnostic:
A) Use "pci find 1590" at the Whack prompt to see if the CM can be located. Since the same data structure is used, it should not show up there either. Use "pci init", which will scan the PCI bus again. If the CM appears now (with "pci find 1590"), it may be a transient problem.
B) Examine the output of "pci probe" to determine if other onboard PCI devices are missing. This may help to determine where the failure occurs. For example, if the four PCI bridges do not show, it may be the CIOB at fault.

Fatal error: Code 28, sub-code 0x0 (3)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Error: Harrier2 #1 not found for CM Init.
or
*** Error: Harrier2 #1 not found for CM Init.

See Code 28, sub-code 0x0 (2) for resolution information.

Non-fatal error: Code 28, sub-code 0x0 (xx04)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMM 0: Unsupported Raw Card Type in SPD byte 62 = xx, Using rdimm_control_words[0][].
Where xx is the hex value that was read from DIMM0 SPD Byte 62.

Byte 62 of the DIMM SPD indicates which JEDEC reference design raw card was used as the basis for the module assembly, if any. Bits 4 ~ 0 describe the raw card and bits 6 ~ 5 describe the revision level of that raw card. The special reference raw card indicator, 1F, is used when no JEDEC standard raw card reference design was used as the basis for the module design. Preproduction modules should be encoded as revision 0 in bits 6 ~ 5.
The reference card is looked up in rdimm_control_words to determine the index into the rdimm_control_words table. If the value in Byte 62 is not found in the table, this error is reported.

Resolution:
A) Replace the DIMM with a supported Raw Card Type.

Non-fatal error: Code 28, sub-code 0x0 (xx05)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMM 1: Unsupported Raw Card Type in SPD byte 62 = xx, Using rdimm_control_words[0][].
Where xx is the hex value that was read from DIMM1 SPD Byte 62.

Resolution:
A) Replace the DIMM with a supported Raw Card Type.

Non-fatal error: Code 28, sub-code 0x0 (xxx06)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMM Requirement failure at Offset %d (xxx): yy != zz
Where:
%d is the failing offset in decimal
xxx is the failing offset in hex
yy is the value from DIMM0.0.0
zz is the value from the DIMM being evaluated

The Harrier2 ASICs are the Cluster Managers which are used for high speed communication between nodes of a cluster. These devices are critical for the correct operation of the node software, and hence for operation of the whole cluster. Sub-code xxx06 indicates that one of the requirements of the evaluated DIMM did not match DIMM0.0.0.

The current h2_dimm_check_list requirements are:
- Offset 2, DRAM Type must be DDR3 for all DIMMs (SPD Byte 2)
- Offset 3, DIMM Module type must be RDIMM or LRDIMM for all DIMMs (SPD Byte 3)
- Offset 256, DIMM size in MB (lo) must be the same for all DIMMs
- Offset 257, DIMM size in MB (hi) must be the same for all DIMMs

The DIMM size is calculated from various SPD bytes and stored in Bytes 256/257 of the SPD structure. The combined size is in MB.
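The two SPD-derived values discussed in these entries can be decoded with simple bit operations. The sketch below splits SPD byte 62 into raw card (bits 4 ~ 0) and revision (bits 6 ~ 5), and combines the size bytes at offsets 256/257 into a size in MB. The function names are hypothetical. Treating offset 256 as the low byte and offset 257 as the high byte of a 16-bit value is an assumption based on the "(lo)" and "(hi)" labels, and the sample values are assumed to be hexadecimal.

```python
NO_JEDEC_RAW_CARD = 0x1F  # special indicator: no JEDEC reference raw card used


def decode_raw_card(byte62: int):
    """Split SPD byte 62 into (raw_card, revision, is_special_indicator)."""
    raw_card = byte62 & 0x1F          # bits 4..0
    revision = (byte62 >> 5) & 0x03   # bits 6..5
    return raw_card, revision, raw_card == NO_JEDEC_RAW_CARD


def dimm_size_mb(lo: int, hi: int) -> int:
    """Combine the size bytes (assumed low byte first) into a size in MB."""
    return (hi << 8) | lo


if __name__ == "__main__":
    print(decode_raw_card(0x21))
    # High-byte values 0x40 and 0x20 with a zero low byte, sizes in MB:
    print(dimm_size_mb(0x00, 0x40), dimm_size_mb(0x00, 0x20))
```

Under these assumptions, high-byte values of 0x40 and 0x20 with a zero low byte correspond to 16384 MB and 8192 MB, which is how a mixed 16 GB and 8 GB population would show up as a mismatch at offset 257.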
Example of two different size DIMMs (16 GB and 8 GB) installed:
DIMM Requirement failure at Offset 257 (0x101): 40 != 20

Resolution:
A) Install CM DIMMs that all adhere to the above requirements.

Non-fatal error: Code 28, sub-code 0x1 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Pairwwww DIMMxxxx: Bad checksum. Got yyyy, SPD said zzzz

The memory DIMMs located on the CM riser are called cluster memory. This memory is used to store data destined for the disks (dirty data) as well as data previously read from the disks (cache data). It is also used for communication among the nodes in the cluster. This memory is not required to boot the operating system, but is required for the node to participate in the cluster.

Even before the memory is thoroughly tested for proper operation, it must be configured to appear in CM addressable space. Each memory DIMM has a small embedded serial EEPROM which holds DIMM configuration information such as the number of rows, columns, and banks, as well as memory timing. If this serial EEPROM becomes corrupt, data stored in it regarding the DIMM configuration cannot be trusted. For this reason, the EEPROM also contains a checksum which the BIOS verifies before configuring the DIMM. If this checksum does not match the checksum the BIOS computes across the DIMM, this error will result.

You should look at prior output to determine if there were I2C errors. These errors suggest a problem with riser installation. The DIMM number is logged in the Data field of the Fatal Error.

Resolution:
A) Reseat Cluster Memory riser card(s).
B) Reseat Cluster Memory DIMMs.
C) Replace Cluster Memory DIMMs in pairs to ensure replacement parts are matched.
P4-Eagle and PIII-Eagle DIMM pairs are always located four riser positions apart. For example, if you number the slots from the top:
Pair 0 is at position 3 and position 7 (top).
Pair 1 is at position 0 (bottom) and position 4.
Pair 2 is at position 2 and position 6.
Pair 3 is at position 1 and position 5.
Ironman (Tclass) and Tinman (Fclass) sets are always in sets of three. The DIMMs are labeled "DIMM C.S", as in Channel then Set. There are two riser cards, one for channel 0 and one for channels 1 and 2.
Set 0 is DIMM 0.0, 1.0, 2.0
Set 1 is DIMM 0.1, 1.1, 2.1
Set 2 is DIMM 0.2, 1.2, 2.2
Titan and Atlas have 4 DIMM sets on the motherboard.
Set 0: DIMM 0.0 and 1.0
Set 1: DIMM 0.1 and 1.1
Set 2: DIMM 2.0 and 3.0
Set 3: DIMM 2.1 and 3.1
D) Replace the Cluster Memory riser(s).
E) Replace the node motherboard.

Diagnostic:
A) The Cluster Memory DIMMs appear on the I2C bus at 2.a0 through 2.ae. Use the Whack "d i2c" command to display the DIMM serial EEPROM contents to determine if there is a pattern.
Example (DIMM 5):
Whack> d i2c 2.aa.0

Fatal error: Code 28, sub-code 0x2 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Pairww DIMMxx (yyyy): 'zzzz' read failed
Where zzzz is one of: row address, column address, module rows, cas latency3, refresh, banks, cas latency2, cas latency1, ras precharge, act_to_rw, act_to_deact, ras cycle, write_to_deact, density, frequency, DIMM type

This error indicates that a Cluster Memory DIMM was detected but that the Serial EEPROM present on the DIMM could not be reliably read. The DIMM number is logged in the Data field of the Fatal Error.
See Code 28, sub-code 0x1 for resolution information.

Non-fatal error: Code 28, sub-code 0x3 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairqq DIMMtt (uuuu): vv != DIMMww (xxxx): yy zzzz

This error indicates the BIOS detected that the SDRAM DIMMs in the cluster memory bank pair are of different types. One DIMM number of the mismatched pair will be logged in the Data field of the Fatal Error.
Resolution:
A) Ensure both DIMMs in the pair are identical. Note that two DIMMs may have the same capacity but a different number of rows, columns, or banks. The DIMM configuration must match exactly. If the DIMMs have similar markings and capacity, they are probably identical.

Diagnostic:
A) The Serial EEPROM information in each pair of DIMMs should be identical or nearly identical.
See Code 28, sub-code 0x1 for more resolution information.

Fatal error: Code 28, sub-code 0x4 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairww DIMM xx (yyyy): di_module_rows is not 1 or 2! zzzz

This error indicates the Cluster Memory DIMMs reported an odd (and unsupported) number of rows. Usually the number of rows reported by a DIMM corresponds to the number of sides of the DIMM which are populated by memory. One DIMM number of the failing pair will be logged in the Data field of the Fatal Error.
See Code 28, sub-code 0x3 for resolution information.

Fatal error: Code 28, sub-code 0x5 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
No Cluster Memory Installed

This error indicates that no memory was found in the Cluster Memory riser. Since cluster memory is needed for proper node operation within the cluster, this condition must be resolved. You should look at prior output to determine if there were I2C errors. These errors suggest a problem with riser or DIMM installation.
See Code 28, sub-code 0x1 for resolution information.

Fatal error: Code 28, sub-code 0x6 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairww DIMM xx (yyy): RAS cycle time > 10. got zzz/10 We

This error indicates the Serial EEPROM on the DIMM reports a value which is outside tolerance for the memory controller. One DIMM number of the failing pair will be logged in the Data field of the Fatal Error.
See Code 28, sub-code 0x1 for resolution information.

Fatal error: Code 28, sub-code 0x7 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Cluster Memory not responding.
DIMM uuu (vvv): Expected = (xxxx) Actual = (yyyy) Addr (zzzz)
*** Error: Cluster Memory FAILURE - too many mismatches.

Before ECC initialization of Cluster memory (scrub), a small region must be tested and configured by the CPU to set up the ECC scrub of the remainder. If an error occurs during this test (such as a memory read that does not match the value just written), then this error will be reported. The DIMM number is logged in the Data field of the Fatal Error.

Diagnostic:
A) Compare the expected and actual values for a pattern such as a bit stuck high or stuck low.
Example (bit 31 stuck low):
Expected = (0xf1f1f1e5) Actual = (0x71f1f1e5)
Expected = (0x92929285) Actual = (0x12929285)
Expected = (0xb3b3b3a5) Actual = (0x33b3b3a5)
Expected = (0xd3d3d3c5) Actual = (0x53d3d3c5)
See Code 28, sub-code 0x1 for resolution information.

Fatal error: Code 28, sub-code 0x8 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Found errors during scrub. Eagle Error Status: xxxx
*** Error: Found errors during scrub. Osprey Error Status: xxxx

During the ECC initialization of Cluster memory, the Cluster Manager records any memory errors it encounters. If any were recorded, this error will be displayed.
See Code 28, sub-code 0x1 for resolution information.

Fatal error: Code 28, sub-code 0x9 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: CM DIMM programmed address > top of memory

For each Cluster memory DIMM, there is a register in the Eagle / Osprey memory controller which specifies where the DIMM maps into CM physical memory.
These mapping registers are configured during the Cluster memory probe and should not change under normal circumstances. Since this is an internal CM register, it is unlikely that reseating memory will correct this problem.

Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Replace the node motherboard.

Diagnostic:
A) The memory controller registers are part of the CM register set which is mapped into CPU memory for access. Use the Whack "pci find 1590" command to find the CM on the PCI bus. The base address in PCI space for the configuration and status registers (CSRs) is Window 0.
Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:90200000 00:00000400 3PAR (ASIC) LPC#
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x90200000 above). This is the base address of the CM Memory Control Register Block. Refer to the Scaffold System Architecture Reference for information on register programming.

Fatal error: Code 28, sub-code 0xa (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)
*** Error: Uncorrectable ECC

The Cluster memory controller detected an Uncorrectable ECC error. Eagle / Osprey identifies the failing bank and address with the error as well as the error syndrome. The BIOS will convert the information into the failing DIMM and Riser Slot numbers. There may be multiple Uncorrectable errors. In this case, the CM will save the address/syndrome for the most recent error. The DIMM number is logged in the Data field of the Fatal Error.

Eagle nodes (S-Series and E-Series): There are 8 DIMMs maximum on the S-Series Cluster Memory Riser Card. If the DIMM number is not between 0-7 (inclusive), then the failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series): There are 6 DIMMs on T-Series and 3 DIMMs on F-Series. The data field encodes the DIMM number in the lower 4 bits of the field and the channel number in the upper 4 bits. So a data value of 12 indicates DIMM 1.2 is at fault.

Harrier nodes (V-Series, Atlas, Minime1 & 2): There are 8 DIMMs on V-Series between two different Harrier ASICs; two memory controllers with 2 DIMMs each. The data field encodes which memory channel encountered the uncorrectable error. A data value of 10 means channel one is at fault; a value of 0 means channel zero is at fault.

Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM(s).
D) Replace the failing Cluster Memory DIMM(s).
E) Replace the node motherboard.

Diagnostic:
A) The memory controller registers are part of the CM register set which is mapped into CPU memory for access. Use the Whack "pci find 1590" command to find the CM on the PCI bus. The base address in PCI space for the configuration and status registers (CSRs) is Window 0.
Example:
Whack> pci find 1590
... Win Baseaddr Basesize Identity
... [0] 00:60200000 00:00000400 3PAR Eagle
... [1] 00:20000000 00:20000000
... [2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above). This is the base address of the CM Memory Control Register Block. Refer to the Scaffold System Architecture Reference for information on register programming.
Window 1 is the small cluster memory offset. If the error address is in the first 512 MB of Cluster memory, use Whack to read/write this location and confirm the error. The CM Central Error register must be reset prior to error reproduction. If the error address is greater than 512 MB, then XCBs may be used to reproduce the error. Type "xcb help" to get more information on using XCBs.
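The arithmetic in this entry is small but easy to get wrong at a console, so the following sketch works through it: splitting the Osprey data field into channel and DIMM nibbles, adding offset 0xc0 to the Window 0 base address, and deciding whether an error address falls inside the first 512 MB. The function names are hypothetical, and the data value "12" is assumed to be hexadecimal.

```python
MEM_CTRL_OFFSET = 0xC0                    # offset of the CM Memory Control Register Block
SMALL_WINDOW_BYTES = 512 * 1024 * 1024    # first 512 MB of Cluster memory


def decode_osprey_data(data: int):
    """Return (channel, dimm): channel in the upper 4 bits, DIMM in the lower 4."""
    return (data >> 4) & 0xF, data & 0xF


def mem_ctrl_base(window0_base: int) -> int:
    """Add offset 0xc0 to the Window 0 (CSR) base address."""
    return window0_base + MEM_CTRL_OFFSET


def reproduce_with(error_addr: int) -> str:
    """Pick the reproduction method named in the text for a given error address."""
    return "whack" if error_addr < SMALL_WINDOW_BYTES else "xcb"


if __name__ == "__main__":
    channel, dimm = decode_osprey_data(0x12)
    print(f"DIMM {channel}.{dimm}")
    print(hex(mem_ctrl_base(0x60200000)))
    print(reproduce_with(0x1000), reproduce_with(0x30000000))
```

With the sample Window 0 base address of 0x60200000, the register block base works out to 0x602000c0.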
Fatal error: Code 28, sub-code 0xb (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)
*** Error: Correctable ECC

The Cluster memory controller detected a correctable ECC error. The CM identifies the failing bank and address with the error as well as the error syndrome. The BIOS will convert the information into the failing DIMM and Riser Slot numbers. The DIMM number is logged in the Data field of the Fatal Error.

Eagle nodes (S-Series and E-Series): There are 8 DIMMs maximum on the Cluster Memory Riser Card. If the DIMM number is not between 0-7 (inclusive), then the failing DIMM cannot be identified.

Osprey nodes (T-Series and F-Series): There are 6 DIMMs on T-Series and 3 DIMMs on F-Series. The data field encodes the DIMM number in the lower 4 bits of the field and the channel number in the upper 4 bits. So a data value of 12 indicates DIMM 1.2 is at fault.

Harrier nodes (V-Series, Atlas, Minime1 & 2): This should not occur on Harrier.

Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM.
D) Replace the failing Cluster Memory DIMM.
E) Replace the node motherboard.

Diagnostic:
A) The memory controller registers are part of the CM register set which is mapped into CPU memory for access. Use the Whack "pci find 1590" command to find the CM on the PCI bus. The base address in PCI space for the configuration and status registers (CSRs) is Window 0.
Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:60200000 00:00000400 3PAR Eagle
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above).
This is the base address of the CM Memory Control Register Block. Refer to the Scaffold System Architecture Reference for information on register programming. Window 1 is the small cluster memory offset. If the error address is in the first 512 MB of Cluster memory, use Whack to read/write this location and confirm the error. The CM Central Error register must be reset prior to error reproduction. If the error address is greater than 512 MB, then XCBs may be used to reproduce the error. Type "xcb help" to get more information on using XCBs.

Fatal error: Code 28, subcode 0xc (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Addr (zzzzzzzz) Wrote (wwwwwwww) Read (yyyyyyyy)
or
*** Error: Data Miscompare in Final Block offset zzzzzzzz
*** Error: Expected (wwwwwwww) Actual (yyyyyyyy)
or
*** Error: CM DIMM5 (Jxxxx): Address (uu:uuuuuuuu) CM DECODE TEST miscompare at (1) (vvvvvvvvvvvvvvvv) Expected: (wwwwwwww) Actual: (yyyyyyyy) Offset: (zzzzzzzz)
or similar to above
The CBIOS runs Cluster Memory Tests as part of POST in both normal operation and manufacturing test. If any test fails due to a data miscompare, the test will generate this fatal error code with sub-code '0xc'. CBIOS runs the following tests:
Walking 1/0 across data
Walking 1/0 across address (512 MB Small Memory Window)
Walking 1/0 using XCB (64 bytes) across segment boundaries
Any test failure will result in a fatal error. The DIMM number is logged in the Data field of the Fatal Error.
Eagle nodes (S-Series and E-Series): There are 8 DIMMs maximum on the Cluster Memory Riser Card. If the DIMM number is not between 0-7 (inclusive), then the failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series): There are 6 DIMMs on T-Series and 3 DIMMs on F-Series. The data field encodes the DIMM number in the lower 4 bits and the channel number in the upper 4 bits. So a data value of 12 indicates DIMM 2.1 is at fault.
Harrier nodes (V-Series, Atlas, Minime1 & 2): This should not occur on Harrier.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM.
D) Replace the failing Cluster Memory DIMM.
E) Replace the node motherboard.
Diagnostic:
A) The memory controller registers are part of the CM register set, which is mapped into CPU memory for access. Use the Whack "pci find 1590" command to find the CM on the PCI bus. The base address in PCI space for the configuration and status registers (CSRs) is Window 0.
Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:60200000 00:00000400 3PAR Eagle
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above). This is the base address of the CM Memory Control Register Block. Refer to the Scaffold System Architecture Reference for information on register programming. Window 1 is the small cluster memory offset. If the error address is in the first 512 MB of Cluster memory, use Whack to read/write this location and confirm the error. The CM Central Error register must be reset prior to error reproduction. If the error address is greater than 512 MB, then XCBs may be used to reproduce the error. Type "xcb help" to get more information on using XCBs.
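The address arithmetic in the Diagnostic step above (take the Window 0 base address from the "pci find 1590" output, add 0xc0, and decide whether the error address falls inside the first 512 MB) can be sketched as follows. This is a minimal illustration and not a Whack feature: the helper names are hypothetical, and the sketch assumes that the "hh:llllllll" notation means the upper and lower 32 bits of a 64-bit address.

```python
CM_MEM_CTRL_OFFSET = 0xC0            # offset given in the Diagnostic step
SMALL_WINDOW_BYTES = 512 * 1024 * 1024


def parse_whack_addr(text):
    """Convert 'hh:llllllll' to an integer.

    Assumption: the part before ':' is the upper 32 bits and the
    part after it is the lower 32 bits.
    """
    high, low = text.split(":")
    return (int(high, 16) << 32) | int(low, 16)


def window_base(row):
    """Return the Baseaddr column of one window row of the output."""
    fields = row.split()
    return parse_whack_addr(fields[1])


def reachable_through_small_window(error_addr):
    """True if the error address lies in the first 512 MB."""
    return error_addr < SMALL_WINDOW_BYTES


row = "[0] 00:60200000 00:00000400 3PAR Eagle"
register_block = window_base(row) + CM_MEM_CTRL_OFFSET
print(hex(register_block))
print(reachable_through_small_window(parse_whack_addr("00:1fffffc0")))
```

For the example row this yields 0x602000c0 as the register block address; an error address at or beyond 512 MB would instead be reproduced with XCBs, as the entry describes.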
Fatal error: Code 28, subcode 0xd (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Pairwwww DIMMxxxx: Illegal SPD value <name of value> <value>
This error indicates that a Cluster Memory DIMM was detected but that the Serial EEPROM present on the DIMM reported a value that is illegal or unsupported for our memory controller. The DIMM number is logged in the Data field of the Fatal Error.
Example: Density (SPD byte 31) has more than 1 bit set (i.e., 0x30), which indicates a non-standard part.
See Code 28, sub-code 0x1 for resolution information. Most likely, the DIMM is not qualified for use in our Node Board.

Fatal error: Code 28, subcode 0xe (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
If there was a problem mapping the CM Small Cluster memory window into CPU 32-bit space, this error may result when attempting to initialize Cluster memory. The initialization problem could be due either to a hardware failure or to a special NVRAM variable setting that eliminates the address space normally reserved for CM memory windows. One example is setting "mem_max" to a value above 2496. Another example is setting "pci_base" above 0xa0000000.
Resolution: Contact 3PAR technical support.

Fatal error: Code 28, subcode 0xf (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Bank (xx) CM DIMMyy (Jzzzz)
*** Error: CM DIMMs with ECC errors
The Cluster memory controller detected a memory error in a specific DIMM bank. The CM memory error status register is logged in the Data field of the Fatal Error.
See Code 28, sub-code 0xb for resolution information.

Fatal error: Code 28, subcode 0x10 (mm)
CM_MEMORY_FAILURE "CMA Failure"
H1 LPC0 HW ERR ST [00000004]: dataq_parity
H1 LPC0 ERR Stat [00000006]: EP-Error-Rpt Fatal-Error
H1 LPC0 ERR ID [80000000]: HW-Err
The Cluster memory controller detected a hardware error.
This error is printed as shown above. The data value (mm) is decoded as follows: bits 31-28 represent the LPC number, and bits 27-0 are the error bits as set in the hardware error status register. The hardware error means that the Harrier ASIC is nonfunctional.
Resolution:
A) Cycle power on the node.
B) Replace the node.

Fatal error: Code 28, subcode 0x20 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM data lines with walking 1
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may directly access CM cluster memory by performing a walking 1's test on all data lines. If any data line fails, this error will result. The data value (mm) could be in the form 0x00XXYYZZ, where XX is the DIMM number (0-11), YY is the return code (RC_??), and ZZ is the number of errors found.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic:
A) Use the Whack command line to attempt to access CM memory manually to determine if data line bits are stuck.

Fatal error: Code 28, subcode 0x21 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM data lines with walking 0
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor may directly access cluster memory by performing a walking 0's test on all data lines. If any data line fails, this error will result.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x22 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
ZERO CM problem at addr xxxx
Between PCI bus tests, a small portion of cluster memory is cleared. If errors in clearing the memory are detected, this error will result.
See Code 28, sub-code 0x20 for resolution information.
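The data-value layouts given above for sub-codes 0x10 and 0x20 can be unpacked with simple shifts and masks. The following Python sketch is illustrative only: the function names and the sample values are made up for the example, and the meaning of the individual error bits and return codes is not covered here.

```python
def decode_subcode_0x10(mm):
    """Split the sub-code 0x10 data value.

    Bits 31-28: LPC number. Bits 27-0: error bits from the
    hardware error status register.
    """
    return {
        "lpc": (mm >> 28) & 0xF,
        "error_bits": mm & 0x0FFFFFFF,
    }


def decode_subcode_0x20(mm):
    """Split a sub-code 0x20 data value of the form 0x00XXYYZZ."""
    return {
        "dimm": (mm >> 16) & 0xFF,         # XX: DIMM number
        "return_code": (mm >> 8) & 0xFF,   # YY: return code
        "error_count": mm & 0xFF,          # ZZ: number of errors found
    }


# Sample values invented for illustration.
print(decode_subcode_0x10(0x10000004))
print(decode_subcode_0x20(0x00050203))
```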
Fatal error: Code 28, subcode 0x23 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM address lines with walking 1 (first 512 MB only)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the processor may directly access cluster memory by performing a walking 1's test on all address lines. If any address line fails, this error will result.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x24 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM address lines with walking 0 (first 512 MB only)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the processor may directly access cluster memory by performing a walking 0's test on all address lines. If any address line fails, this error will result.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x25 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM segment decode boundaries
This test verifies that memory decoding at all CM DIMM pairs is working correctly. It does so by writing a unique 128 bytes at each memory decode boundary location. It then verifies the values were written correctly and looks for corruption of other addresses.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x26 (eecd)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM with random XOR (all Cluster Memory)
ee = number of XOR errors.
c = Channel Number where the error took place.
d = DIMM number where the error took place.
HW error during XCB transfer CM -> CM
or
HW error during XCB transfer CM -> PCI 1 (xxxx)
or
*** Error: Data Miscompare in Final Block offset xxxx
Expected (yyyy) Actual (zzzz)
This function performs a random data test on all cluster memory attached to the CM to verify memory under stress with random patterns. This test also exercises the CM XOR engine, as several sources are used simultaneously throughout the cluster memory test.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x27 (0)
CM_MEMORY_FAILURE (<DIMM>) "DQS Training Failed"
This error occurs when the DQS training fails to find working values for the DQS enable, DQS out skew, and DQS in skew.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x30 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM ECC lines with walking 1
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may directly access CM cluster memory by performing a walking 1's test on all ECC lines. If any ECC line fails, this error will result.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic:
A) Use the Whack command line to attempt to access CM memory manually to determine if data line bits are stuck.
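For the sub-code 0x26 data value (eecd) described above, a decode can be sketched as below. The function name is hypothetical, and the sketch assumes that each letter in "eecd" stands for one hexadecimal digit, so that ee occupies bits 15-8, c bits 7-4, and d bits 3-0. The sample value is invented.

```python
def decode_xor_test_data(data):
    """Split an 'eecd' data value into its three fields.

    Assumption: one hex digit per letter of 'eecd'.
    """
    return {
        "error_count": (data >> 8) & 0xFF,  # ee: number of XOR errors
        "channel": (data >> 4) & 0xF,       # c: channel number
        "dimm": data & 0xF,                 # d: DIMM number
    }


print(decode_xor_test_data(0x0312))
```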
Fatal error: Code 28, subcode 0x31 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM ECC lines with walking 0
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor may directly access cluster memory by performing a walking 0's test on all ECC lines. If any ECC line fails, this error will result.
See Code 28, sub-code 0x30 for resolution information.

Fatal error: Code 28, subcode 0x32 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM Op Codes
The CM Op Code test verifies that the processor may execute each of the available operations for this cluster manager ASIC. This error means that a particular opcode is not supported. If any op code fails, this error will result.
Resolution:
A) Replace the node motherboard.

Fatal error: Code 28, subcode 0x33 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM Source Interrupts
The CM Source Interrupts test will test that an interrupt is generated for each CMA data path, from processor, companion CMA, or to either processor memory to local CMA. On systems with only one CMA, the companion tests are not done.
Resolution:
A) Replace the node motherboard.

Fatal error: Code 28, subcode 0x34 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM I2C communication test
The CM I2C communication test will read and write to various safe CMA registers or CMA memory and verify that the expected values are read. A failure means either a bad DIMM or a bad CMA.
See Code 28, sub-code 0x30 for resolution information.

Fatal error: Code 28, subcode 0x35 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Stopped on an Uncorrectable Error
The scan for errors found an uncorrectable error in one of the CMAs. The system stopped during a BIOS test when this error was discovered.
See Code 28, sub-code 0x30 for resolution information.
Fatal error: Code 28, subcode 0x36 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Stopped on a Correctable Error
The scan for errors found a correctable error in one of the CMAs. The system stopped during a BIOS test when this error was discovered.
See Code 28, sub-code 0x30 for resolution information.

Fatal error: Code 28, subcode 0x40 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM MMW data lines with walking 1
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may directly access CM cluster memory by performing a walking 1's test on all data lines. This test uses the Medium Memory Window (MMW). If any data line fails, this error will result.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic:
A) Use the Whack command line to attempt to access CM memory manually to determine if data line bits are stuck.

Fatal error: Code 28, subcode 0x41 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM MMW data lines with walking 0
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor may directly access cluster memory by performing a walking 0's test on all data lines. This test uses the Medium Memory Window (MMW). If any data line fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.

Fatal error: Code 28, subcode 0x42 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
ZERO CM problem at addr xxxx
Between PCI bus MMW tests, a small portion of cluster memory is cleared. If errors in clearing the memory are detected, this error will result.
See Code 28, sub-code 0x40 for resolution information.
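The walking 1 and walking 0 data-line tests named in the entries above follow a well-known pattern: write a word with a single bit set (or a single bit cleared), read it back, and report any miscompare as Wrote/Read. The Python sketch below illustrates that general technique against a simulated memory; it is not the CBIOS implementation. The class, the function, and the 32-bit data width are all assumptions made for the example.

```python
class SimulatedMemory:
    """Word store with an optional stuck-at-0 data bit to model a fault."""

    def __init__(self, stuck_low_bit=None):
        self.cells = {}
        self.stuck_low_bit = stuck_low_bit

    def write(self, addr, value):
        if self.stuck_low_bit is not None:
            value &= ~(1 << self.stuck_low_bit)  # faulty line always reads 0
        self.cells[addr] = value

    def read(self, addr):
        return self.cells.get(addr, 0)


def walking_data_test(mem, addr, width=32, walk_one=True):
    """Return a list of (wrote, read) miscompares for one address."""
    mask = (1 << width) - 1
    failures = []
    for bit in range(width):
        pattern = 1 << bit
        if not walk_one:
            pattern = ~pattern & mask  # walking 0: all ones except one bit
        mem.write(addr, pattern)
        got = mem.read(addr)
        if got != pattern:
            failures.append((pattern, got))
    return failures


faulty = SimulatedMemory(stuck_low_bit=3)
for wrote, read in walking_data_test(faulty, addr=0x0):
    print("Addr (%08x) Wrote(%08x) Read(%08x)" % (0x0, wrote, read))
```

With a data bit stuck low, the walking 1 pass fails only on the pattern that drives that bit, while the walking 0 pass fails on every pattern except the one that clears it, which is why running both passes helps isolate the line.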
Fatal error: Code 28, subcode 0x43 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 1 (MMW)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the processor may directly access cluster memory by performing a walking 1's test on all address lines using the medium memory window. If any address line fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.

Fatal error: Code 28, subcode 0x44 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 0 (MMW)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the processor may directly access cluster memory by performing a walking 0's test on all address lines using the medium memory window. If any address line fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
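The "Short to Ground - Data same as at Addr 0" message in the address-line tests above points at a classic failure: an address line that is stuck low makes a walking address alias onto address 0. The Python sketch below shows one simplified way such aliasing can be detected on a simulated memory. It is an assumption-laden illustration, not the CBIOS algorithm: the class, the function, the marker value, and the 16 address bits are all chosen for the example.

```python
class AliasingMemory:
    """Simulated memory in which one address line may be shorted to ground."""

    def __init__(self, grounded_addr_bit=None):
        self.cells = {}
        self.grounded_addr_bit = grounded_addr_bit

    def _effective(self, addr):
        if self.grounded_addr_bit is not None:
            addr &= ~(1 << self.grounded_addr_bit)  # line always driven low
        return addr

    def write(self, addr, value):
        self.cells[self._effective(addr)] = value

    def read(self, addr):
        return self.cells.get(self._effective(addr), 0)


def walking_one_address_test(mem, addr_bits, marker=0xA5A5A5A5):
    """Return the address bits whose walking address aliases address 0."""
    failures = []
    for bit in range(addr_bits):
        mem.write(0, marker)        # known value at address 0
        mem.write(1 << bit, 0)      # write through a single address line
        if mem.read(0) != marker:   # address 0 was overwritten: alias
            failures.append(bit)
    return failures


print(walking_one_address_test(AliasingMemory(grounded_addr_bit=5), addr_bits=16))
```

A fuller test would also check for shorts between two address lines, which the messages above report as "Short to Address".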
Fatal error: Code 28, subcode 0x45 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 1 (RMW)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the processor may directly access cluster memory by performing a walking 1's test on all address lines using the remote memory window. If any address line fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.

Fatal error: Code 28, subcode 0x46 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 0 (RMW)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the processor may directly access cluster memory by performing a walking 0's test on all address lines using the remote memory window. If any address line fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.

Fatal error: Code 28, subcode 0x47 (wwxxyyzz)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: MTB Granularity error!
SPD Byte 10, expected: ww, actual: yy
SPD Byte 11, expected: xx, actual: zz
All the MTB (Medium TimeBase) calculations in the software leveling code are based on an MTB granularity of 0.125 ns (SPD Byte 10 = 0x01 and Byte 11 = 0x08). These bytes define a value in nanoseconds that represents the fundamental timebase for medium grain timing calculations. This value is typically the greatest common divisor for the range of clock frequencies (clock periods) supported by a particular SDRAM. This value is used as a multiplier for formulating subsequent timing parameters. The medium timebase (MTB) is defined as the medium timebase dividend (byte 10) divided by the medium timebase divisor (byte 11).
Resolution:
A) Replace CM DIMM.

Fatal error: Code 29, subcode 0x0 (data)
CM_LINK_FAILURE "Cluster Link Failure"
Link 0 did not come up (0xac000000) error = (0x002022ff)
(data = link number)
CM Links are high speed connections between all of the node boards in a cluster via the center panel. During Manufacturing test, nodes are connected to a special Manufacturing Center panel that connects each link's transmitter to its own receiver (external loopback). When the node senses that it is in this special Center Panel, it will initialize all of the links and perform loopback tests. If any link fails to initialize, this sub-code will be reported.
Resolution:
A) Cycle power on the node.
B) Verify that the node is securely mated with the Center Panel.
C) Turn off power, re-seat the node into the center panel, and turn power back on.
D) Replace the node motherboard.
Diagnostic:
A) Use the Whack "eagle link" commands to run more diagnostic tests on the links. The CM requires that the PCI scan has completed and that Cluster Memory is present and initialized.
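Returning to the MTB granularity check under Code 28, sub-code 0x47: the timebase is byte 10 divided by byte 11, so the expected pair 0x01 and 0x08 gives 1/8 ns = 0.125 ns. The Python sketch below works through that arithmetic and unpacks a data value of the form wwxxyyzz. The function names and the sample value are invented, and the sketch assumes the four bytes are packed most significant first, in the order the letters are written.

```python
from fractions import Fraction


def mtb_ns(dividend, divisor):
    """Medium timebase in ns: SPD byte 10 divided by SPD byte 11."""
    if divisor == 0:
        return None  # an SPD divisor of 0 yields no usable timebase
    return Fraction(dividend, divisor)


def decode_mtb_data(data):
    """Unpack wwxxyyzz: expected bytes 10/11, then actual bytes 10/11."""
    ww = (data >> 24) & 0xFF  # expected SPD byte 10
    xx = (data >> 16) & 0xFF  # expected SPD byte 11
    yy = (data >> 8) & 0xFF   # actual SPD byte 10
    zz = data & 0xFF          # actual SPD byte 11
    return {"expected": mtb_ns(ww, xx), "actual": mtb_ns(yy, zz)}


result = decode_mtb_data(0x01080110)  # sample value for illustration
print(float(result["expected"]), float(result["actual"]))
```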
Fatal error: Code 29, subcode 0x1 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM Link Initialization failed
(data = LLRR) where LL is the link bit pattern: 01 is link 0, 02 is link 1, 04 is link 2, and 08 is link 3. RR is the failure reason: E4 is Hardware error, F0 is user abort.
CM Links are high speed connections between all of the node boards via the center panel. During Manufacturing test, nodes are connected to a special Manufacturing Center panel that connects each link's transmitter to its own receiver (external loopback). When the node senses that it is in this special Center Panel, it will initialize the links and run a special test to verify the operation of the transmitter/receivers of each link. If any link fails, the test will report this sub-code.
See Code 29, sub-code 0x0 for resolution information.

Fatal error: Code 29, subcode 0x2 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM# Link XOR test: Link [0]..[FAIL] (1)
(data = the link bit pattern. bit 0 is link 0, bit 1 is link 1, bit 2 is link 2, and bit 3 is link 3.)
CM Links are high speed connections between all of the node boards via the center panel. During Manufacturing test, nodes are connected to a special Manufacturing Center panel that connects each link's transmitter to its own receiver (external loopback). When the node senses that it is in this special Center Panel, it will initialize the links and run a special test to verify the operation of the transmitter/receivers of each link. If any link fails, the test will report this sub-code.
See Code 29, sub-code 0x0 for resolution information.

Fatal error: Code 29, subcode 0x3 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM# Link INT???
test: Link [0]..[FAIL] (1)
(data = the link bit pattern. bit 0 is link 0, bit 1 is link 1, bit 2 is link 2, and bit 3 is link 3.)
The CM Link INT test verifies that setting either of the two interrupt flags (DEST, SRC) in the XCB does actually generate an interrupt to the processor.
See Code 29, sub-code 0x0 for resolution information.

Fatal error: Code 29, subcode 0x4 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT Link 1 XCB ASync failed (Send)
(data = link number)
The CM Link Round Trip Test failed due to an XCB failure. CM XCB failed during link DMA. Use the "eagle status" command for more information on the type of error. This test checks the CM link status multiple times during the test. The "(Send)" part of the message indicates which stage failed. Another possible value is "(Receive)".
See Code 29, sub-code 0x0 for resolution information.

Fatal error: Code 29, subcode 0x5 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT (Receive) Link 1 Length = 0
or
*** Error RTT Offset = xxxxx Expected = yyyyy Returned = zzzzz
or
*** Error RTT (Return) Link 1 Length mismatch
or
*** Error RTT (Return) Link 1 Timestamp mismatch.
(data = link number)
The CM Link Round Trip Test failed due to data miscompare. All packets have a length check and timestamp check. Payload compare is optional. Use the "eagle status" command to check for Uncorrectable ECC errors.
See Code 29, sub-code 0x0 for resolution information.

Fatal error: Code 29, subcode 0x6 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT (Return) Timeout waiting for packet Link 1
(data = link number)
The CM Link Round Trip Test failed due to packet timeout. A packet was sent and not received in a reasonable timeout period. The Round Trip Test may not have been started on a remote node. Use the "eagle status" command to check for Uncorrectable ECC errors.
Resolution:
A) Start CM Link Round Trip Test on remote node.
B) Cycle power on the node.
C) Verify that the node is securely mated with the Center Panel.
D) Turn off power, re-seat the node into the center panel, and turn power back on.
E) Replace the node motherboard.
Diagnostic:
A) Use the Whack "eagle link" commands to run more diagnostic tests on the links. The CM requires that the PCI scan has completed and that Cluster Memory is present and initialized.

Fatal error: Code 29, subcode 0x10 (0)
CM_LINK_FAILURE "Cluster Link Failure"
REC_EN went low. Test failed for link [x](yyyyyyyy)
The "cma link init" command is used to initialize and bring up the CM links to nodes which indicate a "Power Ok" state. If this error occurs, it is possible the remote node was transmitting BIST, but then later stopped (such as from a reset or power off).
Resolution:
A) Perform the same test again.
B) Replace the node motherboard.
Diagnostic:
A) Verify that the CM link can be brought up manually using the "eagle link set" command.

Fatal error: Code 29, subcode 0x11 (0)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error CM linkxx producer / consumer mismatch
The CM has XCB engines which transfer data. Software manages the producer register and the CM hardware follows with the consumer register. If these two do not agree when the CM should be idle, then it is possible the CM has halted due to failure of some operation. This problem is likely caused by a cluster memory or link failure.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
C) Replace the link partner node.
Diagnostic:
A) Replace Eagle/Osprey/Harrier ASIC.

Fatal error: Code 30, subcode 0x0 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
*** Error: No Oxford serial chip xx found
or
*** Error: No Exar serial chip found
The Exar and Oxford serial chips are used for a secondary low speed link which directly connects all nodes in the cluster. They are primarily used in the event of a link failure to verify whether another node in the cluster has actually gone down. Since the part is integrated onto the motherboard and is on a PCI bus, a failure to locate the internal serial chips may indicate other PCI problems as well.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the Whack "pci probe" command to show all devices on the PCI bus. Look for the two Oxford device entries, or a single Exar device entry (Pentium 4 node). If they are not there, verify other board level components are present in the list in order to isolate the component failure on the board.
B) Note that a failure of a single Oxford chip may be the cause of this behavior as one bridges to the PCI bus for both.

Fatal error: Code 30, subcode 0x1 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
*** Error: Serial Port Mfg Test failed
Port (3) [FAIL]
When the Node board is inserted into a Manufacturing Test Centerpanel, the internal Serial Port Manufacturing test will automatically run. This error indicates failures on all ports tested.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the Whack "pci probe" command to show all devices on the PCI bus. Look for the two Oxford device entries or a single Exar device entry (Pentium 4 node). If they are not there, verify other board level components are present in the list in order to isolate the component failure on the board.
B) Note that a failure of a single Oxford chip may be the cause of this behavior as one bridges to the PCI bus for both.
C) Whack provides internal Serial Port commands for further analysis.

Fatal error: Code 30, subcode 0x2 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
Port (4):Processed 109 bytes[FAIL]
All cluster internal serial ports go through a quick internal loopback test immediately after initialization to do a short test of proper operation. This test will run regardless of the type of centerplane to which the node is connected. This error indicates failures on all ports tested.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the Whack "pci probe" command to show all devices on the PCI bus. Look for the two Oxford device entries or a single Exar device entry (Pentium 4 node). If they are not there, verify other board level components are present in the list in order to isolate the component failure on the board.
B) Note that a failure of a single Oxford chip may be the cause of this behavior as one bridges to the PCI bus for both.
C) Whack provides internal Serial Port commands for further analysis.

Fatal error: Code 30, subcode 0x3 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
Internal UART is not functioning properly. Most likely this is due to a hardware failure related to the SuperIO.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Non-fatal error: Code 31, sub-code 0x0 (0)
GPIO_TEST_FAILURE "GPIO Failure"
FAIL (high) Port (6) Bit (4) wrote 0(0x1) Port (7) Bit (4) read 1, expected 0(0x3)
The Vitesse VSC055 2 Wire Backplane Controller chip controls interfaces to the Centerplane, LEDs, Power Supplies, Nickel battery, and PCI slots. It is connected to the I2C bus. In normal 2, 4, or 8 node centerplanes, the chip will get its ports initialized as inputs or outputs and start monitoring peripheral systems. No tests available.
When connected to a Manufacturing Centerplane, it will have selected pins routed to other pins for loopback testing. See the Manufacturing Centerplane Specification for details. During this test, proper VSC operation will be confirmed.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Whack "i2c vsc" commands can be used to peek and poke the VSC055 chip when in a Manufacturing Centerplane. In normal Centerplanes, these pins will be connected to other components and should not be modified.

Non-fatal error: Code 31, sub-code 0x1 (0)
GPIO_TEST_FAILURE "GPIO Failure"
Failed I2C VSC055 1.ce.yy write zzzz
During initialization, the VSC055 registers are programmed for proper system operation. This is done over the I2C bus. If an I2C operation fails during VSC055 initialization, this error will result.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Whack "i2c vsc" commands can be used to peek and poke the VSC055 chip. One failure seen in the past with the VSC055 is that sometimes a specific chip could not handle the first write access to the command register which causes a soft reset. It was determined the part violated the I2C protocol in ACKing the transaction before the I2C write operation completed.

Fatal error: Code 31, subcode 0x2 (0)
GPIO_TEST_FAILURE "GPIO Failure"
FPGA Scratchpad registers failed, meaning bad FPGA hardware.
Resolution:
A) Cycle power on the node.
B) Replace the node.

Non-fatal error: Code 31, sub-code 0x3 (0)
GPIO_TEST_FAILURE "GPIO Failure"
FPGA Interrupt Test failed.
Resolution:
A) Cycle power on the node.
B) Replace the node.

Non-fatal error: Code 31, sub-code 0x4 (0)
GPIO_TEST_FAILURE "GPIO Failure"
NEMOE Loopback Test failed.
Resolution:
A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31, sub-code 0x5 (0) GPIO_TEST_FAILURE "GPIO Failure" During the "Board GPIO Test", the FPGA ID is not what it expects it to be. Resolution: A) Cycle power on the node. B) Replace the node. Table Continued 116 Error codes—HPE 3PAR OS 3.3.1 BIOS fatal error codes and error resolution—HPE 3PAR OS 3.3.1 Code Non-fatal error: Code 31, sub-code 0x6 (0) Description GPIO_TEST_FAILURE "GPIO Failure" During the "Board GPIO Test", the FPGA Revision is not what it expects it to be. Resolution: A) Cycle power on the node. B) Replace the node. Table Continued Error codes—HPE 3PAR OS 3.3.1 117 BIOS fatal error codes and error resolution—HPE 3PAR OS 3.3.1 Code Non-fatal error: Code 31, sub-code 0x7 (0) Description GPIO_TEST_FAILURE "GPIO Failure" Titan specific. During the "Manufacturing Centerpanel GPIO Test", one or more tests have failed depending upon the output. A)If failed during 'Testing Expanders (o/p) <--> FPGA (i/p) connections:' For example, FAIL (low) Port (76) Bit (1) wrote(0x00) Port (302) Bit (4) read 0xff, expected(0xef) 1) Program I2C expander by following command: Whack> cb i2c 9.76.3 0 Here "76" is reported port number. "3" is config register offset for the expander. "0" makes all expander bits as output. 2) Set the bit in I2C expander. Whack> cb i2c 9.76.1 2 Here "1" is rdwr register offset for the expander. "2" is reported bit 1 (1 << "1") in expander. 3) Read a byte from FPGA offset. Whack> db fpga 302 1 Here "302" is reported FPGA offset. Confirm if the bit "4" in read value is set. Repeat step 2) and 3) by writing 0 to I2C expander 9.76.1 and checking if the bit "4" in FPGA offset 0x302 is cleared. B)If failed during 'Testing FPGA (o/p) <--> Expanders (i/p) connections:' For example, FAIL (low) Port (305) Bit (4) wrote(0x00) Port (7e) Bit (7) read 0x86, expected(0x06) 1) Program I2C expander by following command: Whack> cb i2c 9.7e.3 ff Here "7e" is reported port number. "3" is config register offset for the expander. 
"ff" makes all expander bits inputs.
2) Write a byte to the FPGA offset.
Whack> db fpga 305 10
Here "305" is the reported FPGA offset. Writing 0x10 will set bit "4" in that offset.
3) Read a byte from the I2C expander.
Whack> db i2c 9.7e.0 1
Here "7e" is the reported port number. "0" is the read register offset for the expander. Confirm that bit "7" in the read value is set.
Repeat steps 2) and 3) by writing 0 to the FPGA offset and checking that bit "7" in I2C Expander 9.7e.0 is cleared.
C) For all other failure cases, refer to Section # 18.2 "Manufacturing Centerplane GPIO Test Diagnostics" of the CBIOS user guide at http://engweb/twiki/bin/view/Main/TitanMfgCpFpgaGpioTestDiag

Fatal error: Code 32, subcode 0x1 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
Xor Engine Status: P0_XERR Error Status : XOR_ERR PCI0 Error Status: PCI1 Error Status:
The Eagle, Osprey, and Harrier ASICs contain a DMA engine capable of XOR operations. This DMA engine is commonly referred to as the XCB engine. The XCB engine can DMA data between 14 different modules within the ASIC, each module capable of sinking or sourcing data. The XCB engine will stop all DMA if it encounters an error while transferring data. The XCB error status indicates the module that produced the error. Further details of the error can be gathered by inspecting the error registers of that module. Use the whack command "cma status all" to get further diagnostic information. If the user continues past this error, software will attempt to reset the error and continue.
Sub-code 0x1 is specific to Osprey and indicates an uncorrectable ECC error following an attempt to zero all of cluster memory. The "chunk" value indicates the chunk where the ECC error occurred.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Whack "cma status all" command displays the status registers for each CM module. Refer to the module that produced the error for further information and diagnostic procedure.

Fatal error: Code 32, subcode 0x2 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Osprey and indicates an uncorrectable ECC error following an attempt to ECC scrub all of cluster memory. The "chunk" value indicates the chunk where the scrub error occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic information.

Fatal error: Code 32, subcode 0x6 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates an uncorrectable ECC error following an attempt to zero all of cluster memory. The "chunk" value indicates the chunk where the ECC error occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic information.

Fatal error: Code 32, subcode 0x7 (err_last)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates a general Harrier DMA error following an attempt to zero all of cluster memory. The "err_last" value represents the normalized content of the Harrier mem_common->mem_err_status register.
See Code 32, sub-code 0x1 for Resolution and Diagnostic information.

Fatal error: Code 32, subcode 0x8 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates an uncorrectable ECC error following an attempt to ECC scrub all of cluster memory. The "chunk" value indicates the chunk where the scrub error occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic information.

Fatal error: Code 32, subcode 0x9 (err_last)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates a general Harrier DMA error following an attempt to ECC scrub all of cluster memory.
The "err_last" value represents the normalized content of the Harrier mem_common->mem_err_status register.
See Code 32, sub-code 0x1 for Resolution and Diagnostic information.

Fatal error: Code 33, subcode 0x0 (0)
SDRAM_I2C_BAD_READ "Memory I2C Bad Read"
*** Error: Unable to read from SDRAM at I2C ww.xx.yy.zz
This error indicates that an SDRAM DIMM for which information was requested is no longer available. This may be due to an intermittent I2C bus, or a hardware failure.
Resolution:
A) Cycle power on the node.
B) Replace the failing DIMM's pair.
C) Replace the node motherboard.

Fatal error: Code 34, subcode 0x1 (0xff)
PCI_BUS_ERROR "PCI Bus Failure"
This error indicates an uncorrectable error occurred on the PCI bus. In the future, the data field may indicate the PCI slot number for the device which failed. In order to determine the cause of this error, it may be useful to review either console messages or the IDE disk log. Typical messages preceding this error are likely difficult to read, but may indicate the exact cause.
Example:
--- SMI: smm_inb(0x3a) == 0x86
GPE 9 triggered
Error in PCI device 02.02.00 (PCI/PCI Bridge #0 (controls slot 1)):
PCI status register (0x06) [62b0]: Signaled system error (SERR#), Received master abort
Secondary PCI status register (0x1e) [0aa0]: Signaled target abort
Bridge P_SERR (0x6a) [80]: Delayed transaction master initiator timeout
Error in PCI device 03.01.00 (PCI Slot 1):
PCI status register (0x06) [1290]: Received target abort
Secondary PCI status register (0x1e) [0a80]: Signaled target abort
Error in PCI device 04.06.00 (inside PCI Slot 1):
PCI status register (0x06) [1230]: Received target abort
Error in PCI device 04.06.01 (inside PCI Slot 1):
PCI status register (0x06) [1230]: Received target abort
(PCI errors not cleared)
In the above case, a card in PCI Slot 1 was transferring data up to a device, likely the cluster manager, when it didn't get a response. The bridge above the card received a master abort, which it then relayed to its secondary side as a signaled target abort. The bridge on the card in PCI Slot 1 then received the target abort and signaled a target abort on its secondary side. Both PCI devices then indicated they received target aborts.
Resolution:
A) Cycle power on the node.
B) Reseat all PCI cards.
C) Replace the suspected PCI card.
D) Remove PCI cards one at a time.
E) Replace the node motherboard.

Fatal error: Code 35, subcode 0x0 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
One or both DIMMs in a DIMM pair have failed. Bits 4-7 of the data value indicate the DIMM pair. If data is 0, then DIMM pair 0 has failed. If data is 10, then DIMM pair 1 has failed.
Example:
--- SMI: TEMPCAUT (SMALERT): 0x01 (bits reset)
Uncorrectable ECC error 0x9279a103 recorded in reg 0x98
Pair1, either DIMM1 or DIMM3 contains the error
Error in locations [0x382cd818 .. 0x382cd81f]
Uncorrectable ECC error 0x9279a101 recorded in reg 0x94
Syndrome/bit number information might not be accurate, as more than 1 error happened
Pair1, either DIMM1 or DIMM3 contains the error
Error in locations [0x382cd808 .. 0x382cd80f]
(Clearing cache line at 0x382cd800)
(Clearing cache line at 0x382cd800)
ESR == 0x0003 (expected low bit == 0)
Fatal error: Code 35, subcode 0x0 (10)
Resolution:
A) Cycle power on the node.
B) Clear dust and debris from the node.
C) Remove and reseat the specified CPU DIMM pair.
D) Replace the failed CPU DIMM pair.
E) Replace the node motherboard.
Diagnostic:
A) Verify North Bridge heatsink attachment.
B) Check DIMM clock buffers (X6200 on P4-Eagle).
C) Check DIMM termination (R5836, etc. on P4-Eagle nodes).

Fatal error: Code 35, subcode 0x1 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
A single DIMM of a DIMM pair has failed. The data value indicates which DIMM. Bits 4-7 of the data value indicate which DIMM pair. Bits 0-3 of the data value indicate which DIMM within that pair.
If data is 0, then DIMM 0 of pair 0 has failed.
If data is 1, then DIMM 1 of pair 0 has failed.
If data is 10, then DIMM 0 of pair 1 has failed.
If data is 11, then DIMM 1 of pair 1 has failed.
Resolution:
A) Cycle power on the node.
B) Clear dust and debris from the node.
C) Remove and reseat the specified CPU DIMM.
D) Replace the failed CPU DIMM.
E) Replace the node motherboard.

Fatal error: Code 35, subcode 0x2 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
This code means an ECC error was detected, but the BIOS did not completely decode the error.
See Code 35, sub-code 0x0 for resolution information.
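
The bit layout described for the Code 35 data value can be sketched in a few lines of Python. This is only an illustration, not part of the BIOS or Whack: the function name decode_dimm_data is hypothetical, and the sketch assumes the data value is printed in hexadecimal, which is consistent with "10" selecting pair 1 through bits 4-7.

```python
def decode_dimm_data(data: int) -> tuple[int, int]:
    """Split a Code 35 data value into (DIMM pair, DIMM within the pair).

    Hypothetical helper: bits 4-7 select the pair, bits 0-3 select the DIMM.
    """
    pair = (data >> 4) & 0xF  # bits 4-7
    dimm = data & 0xF         # bits 0-3
    return pair, dimm


# The data value is assumed to be shown in hex, so parse it with base 16.
for text in ("0", "1", "10", "11"):
    pair, dimm = decode_dimm_data(int(text, 16))
    print(f"data {text}: DIMM {dimm} of pair {pair}")
```

For sub-code 0x0 only the pair field is meaningful, because the BIOS could not narrow the failure to a single DIMM.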

Fatal error: Code 36, subcode 0x0 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI: SERR# input went low
In the event of a hardware failure, it is normal to trigger a processor System Management Interrupt (SMI). If the SMI gets cleared before the BIOS has a chance to observe it (which should not happen), then this error will result.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Fatal error: Code 36, subcode 0x1 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI: Write made to ACPI PM register
In normal operation the operating system should not write to the ACPI PM register. If the BIOS detects a write took place, it will flag this as an error caused by a failing operating system or other node hardware.
Resolution:
A) Cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.

Fatal error: Code 36, subcode 0x2 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI not fully handled.
The BIOS was not able to determine the actual cause of the triggered SMI.
Resolution:
A) Cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.

Fatal error: Code 36, subcode 0x3 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
--- SMI: No known cause (# 4097)
GPE status: 0x400000, GPE input: 0x0xfff7ff
*** Error: SMI: No known cause is too frequent
This error may result if there is an unknown hardware device triggering SMIs in the system and those SMIs are happening too frequently. Most likely the device continues to trigger an SMI because its problem has not been serviced, and no real work is possible at this point because immediately after returning from the SMI, another is triggered. The BIOS attempts to recognize this condition and stop with a fatal error rather than just continuing to display errors.
Resolution:
A) Remote reset or cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.

Fatal error: Code 36, subcode 0x4 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Warning: SMI cause is too frequent; disabling SMI handling
*** Error: SMI cause could not be masked
This error may result if a known SMI cause is happening too frequently. In a normally functioning node, SMIs should occur infrequently, as there is a performance impact associated with handling each SMI. The BIOS will first attempt to disable known SMIs in order to mask this problem. If that is insufficient, the BIOS will stop with this fatal error.
Resolution:
A) Check for CPU memory DIMM correctables in the event log. Replace DIMMs if they are suspect.
B) Check for hardware oscillating events in the event log (such as PS status). On some node types, board GPIO changes are reported through SMI. You may need to replace power supplies or another FRU.
C) Replace the node motherboard.
Diagnostic:
A) Set "fatal_no_reboot" at Whack and then enter Whack at the Fatal Error. You should be able to inspect the state of the machine prior to SMI handling to see what status is asserted. Output from the following Whack commands may be helpful:
1) eagle status
2) vsc status
3) pci status
4) mem bridge

Fatal error: Code 36, subcode 0x5 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: In SMI on CPU ww [xx], CR2 was 0xyyyy, but got changed to 0xzzzz
This error will result if the BIOS inadvertently changes the contents of CR2 while processing an SMI. This should not happen in normal operation, but might happen as the result of a 'whack' command. As returning from this SMI could easily cause corruption of the OS or of a user-level program, this fatal error is flagged instead.
Resolution:
A) Cycle power on the node.

Fatal error: Code 37, subcode zz (0)
GEVENT_TRIGGERED "GEvent Triggered"
Code 37 sub-codes are a bitmask of error values. This means a single error may trigger multiple GEVENTs simultaneously. This event is probably one of the hardest to interpret, as it often indicates that multiple board devices have detected a fatal error condition. In general, it is much more convenient to look up the decoded error in the BIOS output of the idelog than to manually decode this event back to indicators.
Resolution: Look up each of the documented sub-codes below that, when OR'd together, form the observed sub-code.

Fatal error: Code 37, subcode 0x1 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x01 CMIC_FATAL (GEVENT0)
This error indicates the CMIC (North Bridge) had a fatal error.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[0]: PCI2_PERR_L
This error indicates either the PLX #2 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #2 detected a parity error. These components manage PCI slots 0 and 1 on T-Series and Slot 0 on F-Series.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[0]: PEX2_FATAL_ERROR
This error indicates that the PLX #2 PCIe-PCIe bridge detected a fatal error. These components manage PCI slots 0, 1, and 2; Harrier 1 and 2 LPC0.
Resolution:
A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.
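
Because a Code 37 sub-code is a bitmask, an observed value can be split into the single-bit sub-codes that this guide documents individually. The following Python sketch shows one way to do that. The function name split_gevent_subcode is hypothetical and is not a BIOS or Whack command.

```python
def split_gevent_subcode(subcode: int) -> list[int]:
    """Return the single-bit sub-codes that OR together to form subcode.

    Hypothetical helper for looking up each component in this guide.
    """
    return [1 << bit for bit in range(subcode.bit_length()) if subcode & (1 << bit)]


# Example: an observed sub-code of 0x42 combines sub-codes 0x2 and 0x40.
parts = split_gevent_subcode(0x42)
print([hex(p) for p in parts])  # ['0x2', '0x40']
```

Each returned value can then be looked up as its own Code 37 entry. The decoded messages in the idelog remain the more convenient source when they are available.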

Fatal error: Code 37, subcode 0x2 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x02 ALERT (GEVENT1)
Error in PCI device 00.00.00 (CMIC-LE Memory Controller/ Thin IMB):
ESR (0x4c) [0004]: IMBus error
(PCI errors not cleared)
The output above can be considered "typical" but may contain any of the possible CMIC (North Bridge) Memory Controller or other PCI bus errors. An IMBus error indicates a communication problem between the North Bridge and either the South Bridge or the CIOBX2. This would likely indicate a node motherboard failure. It has been observed in the field that a flaky or bad PCI socket may also cause this.
Resolution:
A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[1]: MCH Fatal Error
This error indicates the MCH (North Bridge) has detected a fatal condition. Most likely there are other error messages present in the idelog to help pinpoint the issue. Since the MCH is the top of the root complex, it is very common to see the MCH indicating a Fatal error on nearly all failures.
Resolution:
A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.

Fatal error: Code 37, subcode 0x4 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
--- SMI: smm_inb(0x39) == 0x04 GPE 2 triggered THERMT_L0_OSB (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt. It is a fatal condition on Pentium III nodes, and the node will be immediately taken out of the cluster with this fatal error.
Resolution:
A) Cycle power on the node. If it is a temperature related problem, verify the system is getting adequate ventilation.
B) Replace the node motherboard.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x04 GPE 2 triggered P0_PROC_HOT (GEVENT2)
The Pentium 4 CPU supports clock modulation which reduces the core frequency when the core temperature is too high. The BIOS enables this support when starting the OS, so after the node has joined the cluster, the BIOS will asynchronously notify the OS if this event occurs but not take it out of the cluster. At the same time, the Pentium 4 processor will automatically reduce its clock speed so as to generate less heat and not reach a shutdown temperature. This message is therefore not fatal on P4 CPUs.
Resolution:
A) Cycle power on the node. If it is a temperature related problem, verify the system is getting adequate ventilation.
B) Replace the node motherboard.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[2]: PCI0_PERR_L
This error indicates either the PLX #0 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #0 detected a parity error. These components manage PCI slots 4 and 5 on T-Series and Slot 2 on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[2]: PEX0_FATAL_ERROR
This error indicates that the PLX #0 PCIe-PCIe bridge detected a fatal error. These components manage PCI slots 6, 7, and 8; Harriers 1 and 2 LPC2.
See Code 37, sub-code 0x1 for resolution information.
Chimera nodes:
*** Error: GPE[2]: PEX0_FATAL_ERROR
This error indicates that the PLX 8796 #0 or #1 PCIe-PCIe bridge detected a fatal error. These components manage PCI slots 0, 1, 5 and 6; Harrier 0, LPC0 and LPC2; and Harrier 1, LPC0 and LPC2.
See Code 37, sub-code 0x1 for resolution information.
Eos, Tornado, and Orion nodes:
*** Error: GPE[2]: PEX_FATAL_ERROR
This error indicates that the PLX PCIe-PCIe bridge detected a fatal error. These components manage PCI slots 0, 1, and 2; Harrier LPC0 and LPC2.
See Code 37, sub-code 0x1 for resolution information.

Fatal error: Code 37, subcode 0x8 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
--- SMI: smm_inb(0x39) == 0x08 GPE 3 triggered THERMT_L1_OSB (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt.
See Code 37, sub-code 0x2 for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x08 GPE 3 triggered P1_PROC_HOT (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt.
See Code 37, sub-code 0x2 for resolution information.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[3]: PCI0_SERR_L
This error indicates either the PLX #0 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #0 detected a fatal error (SERR). These components manage PCI slots 4 and 5 on T-Series and Slot 2 on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[3]: PEX1_FATAL_ERROR
This error indicates that the PLX #1 PCIe-PCIe bridge detected a fatal error. These components manage PCI slots 3, 4, and 5; Harrier 1 and 2 LPC1.
See Code 37, sub-code 0x1 for resolution information.
Chimera nodes:
*** Error: GPE[3]: PEX1_FATAL_ERROR
This error indicates that the PLX 8750 PCIe-PCIe bridge detected a fatal error. This component manages PCI slots 2, 3, and 4; Harrier 0, LPC1; and Harrier 1 LPC1.
See Code 37, sub-code 0x1 for resolution information.

Fatal error: Code 37, subcode 0x10 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
GPE 4 triggered MIRQ (GEVENT4)
This error indicates the memory controller (CNB20HE) triggered an interrupt. The CNB20HE documentation lists the possible sources as a correctable ECC error on the memory data bus or on the processor data bus.
See below (P4) for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x10 GPE 4 triggered P0_IERR (GEVENT4)
This error indicates that P4 CPU 0 has asserted IERR#, which is used to indicate a processor internal error event occurred. The Intel documentation indicates one cause of this error is a machine check exception when exceptions have not yet been enabled. From our experience in the field, the problem is possibly a CPU or node motherboard failure.
Resolution:
A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.
Diagnostic:
A) Replace CPUs.
B) Replace CPU VRMs.
C) Check DIMM termination (R5836, etc. on P4-Eagle nodes).
T-Series and F-Series (5000P) nodes:
*** Error: GPE[4]: PCI1_PERR_L
This error indicates either the PLX #1 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #1 detected a parity error. These components manage PCI slots 2 and 3 on T-Series and Slot 1 on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[4]: FPGA_LPC_IRQ0_L
This error indicates an internal error. This should not occur in a V-Series system.
See Code 37, sub-code 0x1 for resolution information.

Fatal error: Code 37, subcode 0x20 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x20 GPE 5 triggered P1_IERR (GEVENT5)
This error indicates that P4 CPU 1 has asserted IERR#.
See Code 37, sub-code 0x10 (P4) for resolution information.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[5]: PCI1_SERR_L
This error indicates either the PLX #1 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #1 detected a fatal error (SERR). These components manage PCI slots 2 and 3 on T-Series and Slot 1 on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P), Eos, Tornado and Chimera nodes:
*** Error: GPE[4]: FPGA_LPC_IRQ1_L
This error indicates that NEMOE raised the FPGA SMI interrupt and it was not handled properly.
See Code 37, sub-code 0x1 for resolution information.

Fatal error: Code 37, subcode 0x40 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x40 GPE 6 triggered P_SERR (GEVENT6)
This error indicates one or more of the system's chipset devices is asserting P_SERR (primary side system error). Output is usually followed by outstanding PCI errors as indicated by chipset devices.
Resolution:
A) Identify and replace the failing PCI card based on the error output. It may be necessary to contact hardware engineering with BIOS output to determine which PCI slot is at fault.
B) Remove all PCI cards.
C) Replace the node motherboard.
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[6]: MCH Uncorrectable Error
This error indicates the MCH (North Bridge) has detected an uncorrectable error. Most likely there are other error messages present in the idelog to help pinpoint the issue.
Since the MCH is the top of the root complex, it is very common to see the MCH indicating an Uncorrectable error on nearly all failures.
Resolution:
A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.

Fatal error: Code 37, subcode 0x80 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x80 GPE 7 triggered P_PERR (GEVENT7)
This error indicates one or more of the system's chipset devices is asserting P_PERR (primary side parity error).
See Code 37, sub-code 0x40 for resolution information.
T-Series and F-Series (5000P) nodes:
*** Error: GPE[7]: PCI2_SERR_L
This error indicates either the PLX #2 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #2 detected a fatal error (SERR). These components manage PCI slots 0 and 1 on T-Series and Slot 0 on F-Series.
See Code 37, sub-code 0x1 for resolution information.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[7]: Not connected
This error indicates an internal error. This should not occur in a V-Series system.
See Code 37, sub-code 0x1 for resolution information.
Eos, Tornado, and Chimera nodes:
*** Error: GPE[7]: MCH Fatal Error
This error indicates the MCH (North Bridge) has detected a fatal condition. Most likely there are other error messages present in the idelog to help pinpoint the issue. Since the MCH is the top of the root complex, it is very common to see the MCH indicating a Fatal error on nearly all failures.
Resolution:
A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.

Fatal error: Code 37, subcode 0x100 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
--- SMI: smm_inb(0x3a) == 0x01 GPE 8 triggered CPU_TEMP_INTR (GEVENT8)
This indicates a CPU temperature event triggered a GPIO interrupt.
See Code 37, sub-code 0x2 for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x3a) == 0x01 GPE 8 triggered S_SERR (GEVENT8)
This error indicates one or more of the system's chipset devices is asserting S_SERR (secondary side system error).
See Code 37, sub-code 0x40 for resolution information.
T-Series and F-Series (5000P) nodes:
--- SMI request via EXT_SMI
This error indicates another node in the cluster has forced this node to handle an SMI. Most likely the other node is attempting to force a panic dump because the local node has stopped responding.
Resolution:
A) Inspect the core dump to determine if the cause was a software or hardware failure.
B) Replace the node motherboard if the issue recurs and cannot be identified as a software failure.
V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[7]: Not connected
This error indicates an internal error. This should not occur in a V-Series system.
See Code 37, sub-code 0x1 for resolution information.

Fatal error: Code 37, subcode 0x200 (0)
GEVENT_TRIGGERED "GEvent Triggered"
S-Series (PIII) nodes:
This error indicates one or more of the system's chipset devices is asserting SERR (system error). Output is followed by the PCI scan results, which display outstanding PCI errors of all PCI bus devices.
See below (P4) for resolution information.
S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x3a) == 0x02 GPE 9 triggered S_PERR (GEVENT8)
This error indicates one or more of the system's chipset devices is asserting S_PERR (secondary side parity error).
Resolution:
A) Identify and replace the failing PCI card based on the error output. It may be necessary to contact hardware engineering with BIOS output to determine which PCI slot is at fault.
B) Remove all PCI cards.
C) Replace the node motherboard.
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[9]: CPU0 IERR_L
This error indicates that CPU 0 has asserted IERR#, which is used to indicate a processor internal error event occurred. The Intel documentation indicates one cause of this error is a machine check exception when exceptions have not yet been enabled. From our experience in the field, the problem is possibly a CPU or node motherboard failure.
Resolution:
A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove all PCI cards.
D) Replace the node motherboard.
Diagnostic:
A) Replace CPUs.
B) Replace CPU VRMs.

Fatal error: Code 37, subcode 0x400 (0)
GEVENT_TRIGGERED "GEvent Triggered"
T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[10]: CPU1 IERR_L
This error indicates that CPU 1 has asserted IERR#, which is used to indicate a processor internal error event occurred.
See Code 37, sub-code 0x200 for resolution information.
Chimera nodes:
*** Error: GPE[10]: CPU1_THERMTRIP_L
This indicates a thermal event on CPU1 triggered a GPIO interrupt. It is a fatal condition and the node will be immediately taken out of the cluster with this fatal error.
Resolution:
A) Cycle power on the node. If it is a temperature related problem, verify the system is getting adequate ventilation.
B) Replace the node motherboard.

Fatal error: Code 37, subcode 0x800 (0)
GEVENT_TRIGGERED "GEvent Triggered"
Chimera nodes:
*** Error: GPE[11]: CPU0_THERMTRIP_L
This indicates a thermal event on CPU0 triggered a GPIO interrupt. It is a fatal condition and the node will be immediately taken out of the cluster with this fatal error.
Resolution:
A) Cycle power on the node. If it is a temperature related problem, verify the system is getting adequate ventilation.
B) Replace the node motherboard.
Eos and Tornado nodes:
*** Error: GPE[11]: THERMTRIP_L
This indicates a thermal event on the CPU triggered a GPIO interrupt.
See the above information regarding Chimera nodes for resolution.

Fatal error: Code 37, subcode 0x2000 (0)
GEVENT_TRIGGERED "GEvent Triggered"
Eos, Tornado and Chimera nodes:
*** Error: GPE[13]: CAT_ERR_L
This error indicates that a CPU has asserted IERR#, which is used to indicate a processor internal error event occurred. The Intel documentation indicates one cause of this error is a machine check exception when exceptions have not yet been enabled. From our experience in the field, the problem is possibly a CPU or node motherboard failure.
Resolution:
A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove all PCI cards.
D) Replace the node motherboard.
Diagnostic:
A) Replace CPUs.
B) Replace CPU VRMs.

Non-fatal error: Code 38, sub-code 0x0 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power Supply xx indicates invalid battery configuration: y batteries
Verify battery connection and individual battery units.
The maximum count of batteries in a string which are supported by software is 3. Any greater number will result in this non-fatal error. The data value may be decoded to determine which power supply and the battery count. The high 8 bits are a bitmask of the power supply.
The lower 16 bits are the number of batteries counted. Thus, a data value of 100000c indicates PS1 had a battery count of 12. A data value of 4 indicates PS0 had a battery count of 4.
Resolution:
A) Verify no more than 3 batteries in a string are connected to any one power supply.
B) Cycle power on the node.
C) Remove batteries one at a time to determine if there is a faulty connection or battery. Replace the faulty cable or battery.
D) Replace the power supply.
E) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x1 (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"
RTC / NVRAM Battery Failure - Replace battery.
The RTC / NVRAM battery was found to have a low voltage by the built-in monitoring circuit of the RTC (TOD clock).
Resolution:
A) Replace the lithium-ion cell battery on the node.
B) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x3 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
No batteries present on power supply xx
This error indicates no batteries were found on a node power supply. This warning may be enabled by setting "warn_nobat" in NVRAM. The data value may be decoded to determine which power supply triggered this error. The high 8 bits are a bitmask of the power supply. Thus, a data value of 0 indicates PS0 has no batteries. A data value of 1000000 indicates PS1 has no batteries.
Resolution:
A) Verify there is at least one battery connected.
B) Cycle power on the node.
C) Exchange cables and batteries.
D) Replace the power supply.
E) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x4 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply missing: node power configuration is not redundant
This error indicates one of the two power supplies for a node is not present. This warning may be enabled by setting "warn_ps" in NVRAM.
The data value may be decoded to determine which power supply triggered this error. The high 8 bits are a bitmask of the power supply. Thus, a data value of 0 indicates PS0 is not present. A data value of 1000000 indicates PS1 is not present.
Resolution:
A) Verify both power supplies are present and powered on.
B) Power off the missing supply, remove it, and re-insert it in the chassis.
C) Replace the power supply.
D) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x5 (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Battery failure on Power Supply
This error indicates that a battery on the power supply has reported a hardware error. The status light on the back of the failed battery will be amber.
Resolution:
A) Verify both power supplies are present and powered on. Verify batteries are present and powered on.
B) Power off the failed battery, remove the cable, and re-insert it in the Power Supply. Turn it back on. If that does not reset the FAILED condition, replace the battery.
C) Replace the power supply.
D) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x6 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Powering off PSxx because it is on battery power. This will shut down the node until AC is restored.
This message indicates that a power supply lost input AC power and that the BIOS powered down the node to avoid draining the battery. The data value may be decoded to determine which power supply triggered this error. The low 2 bits are a bitmask of the DC power status. Bit 0 represents power supply 0 and bit 1 represents power supply 1. If a bit is 1, then the DC output from that power supply was good when the system shut down.
Resolution:
A) Apply AC power to the node.
B) Replace the power supply.
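The bit layout described for Code 38, sub-code 0x6 can be sketched as follows. This is only an illustration, not BIOS or Whack code: the function name is hypothetical, and it assumes the logged data value has already been parsed as a hexadecimal integer.

```python
def dc_output_good(data: int) -> dict[int, bool]:
    """Map each power supply number (0 or 1) to whether its DC output was good.

    Assumption taken from the sub-code 0x6 description: bit 0 is PS0,
    bit 1 is PS1, and a 1 bit means the DC output was good at shutdown.
    """
    return {ps: bool((data >> ps) & 1) for ps in (0, 1)}


# Hypothetical data value 2: only bit 1 is set.
status = dc_output_good(0x2)
for ps, good in status.items():
    print(f"PS{ps} DC output {'good' if good else 'not good'} at shutdown")
```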

Non-fatal error: Code 38, sub-code 0x7 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: Fan Bad
or
Power supply xx failure: Fan 0 Bad
or
Power supply xx failure: Fan 1 Bad
This error indicates there is a hardware problem in one of the node power supplies. One or more of the fans may have failed. The data value may be decoded to determine which power supply (and fan) triggered this error. The low 2 bits are a bitmask of the fan status for Power Supply 0. The next 2 bits are a bitmask of the fan status for Power Supply 1. Thus:
1: PS0 had a Fan 0 failure
2: PS0 had a Fan 1 failure
3: PS0 had a double fan failure
4: PS1 had a Fan 0 failure
8: PS1 had a Fan 1 failure
c: PS1 had a double fan failure
Resolution:
A) Replace the power supply.
B) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x8 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: Charger Overload
This error indicates there is a hardware problem in one of the node power supplies, specifically that the charger cannot handle the battery charge current draw. If you need to override this error so the node continues, you can set "ignore_chargefail" in NVRAM. The data value may be decoded to determine which power supply triggered this error. The low 2 bits are a bitmask of the charger status for the two power supplies. Thus, a value of 1 indicates PS0 had a charger overload. A value of 2 indicates PS1 had a charger overload. A value of 3 indicates PS0 and PS1 both had a charger overload.
Resolution:
A) Check battery connection.
B) Exchange cables and batteries.
C) Replace the power supply.
D) Replace the node motherboard.
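The fan bitmask listed for Code 38, sub-code 0x7 can be decoded mechanically. The sketch below is illustrative only: the function name is hypothetical, and the bit positions are taken from the list of values in that entry (bits 0-1 for PS0, bits 2-3 for PS1).

```python
def decode_fan_failures(data: int) -> list[tuple[int, int]]:
    """Return (power_supply, fan) pairs whose failure bit is set.

    Layout assumed from the sub-code 0x7 entry:
      bit 0 = PS0 Fan 0, bit 1 = PS0 Fan 1,
      bit 2 = PS1 Fan 0, bit 3 = PS1 Fan 1.
    """
    failures = []
    for ps in (0, 1):
        for fan in (0, 1):
            if (data >> (2 * ps + fan)) & 1:
                failures.append((ps, fan))
    return failures


# Data value "c" from the list above: both fans on PS1.
for ps, fan in decode_fan_failures(0xC):
    print(f"PS{ps} had a Fan {fan} failure")
```

A data value with bits set for both supplies, such as 5, would report one fan on each supply, a combination the list in the entry does not enumerate.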

Fatal error: Code 38, subcode 0x9 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Both Power Supplies failed: DC Output Bad
This error indicates there is a hardware problem in one of the node power supplies. If this failure is transient, it could also be caused by turning the power supply off and then on or by a quick AC loss followed by AC being restored. If both power supplies fail simultaneously (not likely), this is a fatal error. The data value may be decoded to determine which power supply triggered this error. The low 2 bits are a bitmask of the DC Output status for the two power supplies. As a fatal error, the value will be 3, indicating PS0 and PS1 both had a DC Output Bad.
Resolution:
A) Ensure a service operation was not taking place at the time, and that AC had not also failed.
B) Replace the power supply.
C) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x9 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: DC Output Bad
This error indicates there is a hardware problem in one of the node power supplies. If this failure is transient, it could also be caused by turning the power supply off and then on or by a quick AC loss followed by AC being restored. If both power supplies fail simultaneously (not likely), this is a fatal error. The data value may be decoded to determine which power supply triggered this error. The low 2 bits are a bitmask of the DC Output status for the two power supplies. Thus, a value of 1 indicates PS0 had a DC Output Bad. A value of 2 indicates PS1 had a DC Output Bad.
Resolution:
A) Ensure a service operation was not taking place at the time, and that AC had not also failed.
B) Replace the power supply.
C) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0xa (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Power supply xx failure: AC Input Bad
This error indicates that AC input power is not being supplied to one or more power supplies. The likely cause is either a real AC failure or that the power supply has been switched to the off position. In the case of an AC failure, the power supply will be automatically shut down to preserve batteries (if "ignore_acfail" is set, the power supply will not be shut down). The lower 2 bits of the data value may be decoded to determine which power supply lost AC power. A value of 1 indicates PS0. A value of 2 indicates PS1. A value of 3 indicates both power supplies lost AC power.
Resolution:
A) Verify AC power is present and the power supply switch is turned on.
B) Check the Power Distribution Unit (PDU) breaker.
C) Replace the power supply.
D) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0xb (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"
**** Power Supplies mismatch ****
Power Supply 0: I2C accessible
Power Supply 1: I2C inaccessible
This error indicates one of the power supplies is a new style (I2C interface) and the other power supply is not responding using I2C, but has been detected as present. This is not a supported configuration. If you need to override this error, set "ignore_psdiff" in NVRAM.
Resolution:
A) Pull and re-insert the inaccessible power supply.
B) Check the Power Distribution Unit (PDU) breaker for the inaccessible power supply.
C) Replace the power supply.
D) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0xc (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
This error indicates Power Supply 0 reported a limit was exceeded while performing the power supply status test. Each power supply has integrated monitors for temperature, voltage, and current draw. The BIOS reads these sensors as part of initialization to determine if the power supply is operating within specifications. The data value may be decoded to determine the particular cause of the limit failure. Each bit represents a unique sensor. Data values may be decoded as follows:
00000001 - Temperature
00000004 - 3.3V
00000008 - 3.3V Current
00000010 - 5V
00000020 - 5V Current
00000040 - 12V
00000080 - 12V Current
00000100 - 24V
00000200 - 24V Current
00000400 - 48V
00000800 - 48V Current
00001000 - Bat0 48V
00002000 - Bat1 48V
00004000 - Bat2 48V
00008000 - Bat0 12V
00010000 - Undefined
... to ...
00400000 - Undefined
00800000 - Battery LED is Amber
01000000 - Battery Relay is Off
02000000 - PS LED is Amber
04000000 - Fan Fail
08000000 - DC Fail
10000000 - AC Fail
20000000 - Power Supply is Disabled
40000000 - Power Supply Switch is Off
80000000 - Low Limit exceeded (combined with bits above)
Resolution: Contact 3PAR technical support.

Non-fatal error: Code 38, sub-code 0xd (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
This error indicates Power Supply 1 reported a limit was exceeded while performing the power supply status test.
See Code 38, sub-code 0xc for resolution information.
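Because each bit of the sub-code 0xc and 0xd data value maps to one sensor, a combined value can be split into its individual causes. The following sketch is illustrative only: the names `SENSOR_BITS` and `decode_limit_data` are hypothetical, the mapping is copied from the list in the sub-code 0xc entry, and any bit the list does not name (including 00000002, which the list omits) is reported as undefined by assumption.

```python
# Bit-to-sensor mapping copied from the Code 38, sub-code 0xc entry.
SENSOR_BITS = {
    0x00000001: "Temperature",
    0x00000004: "3.3V",
    0x00000008: "3.3V Current",
    0x00000010: "5V",
    0x00000020: "5V Current",
    0x00000040: "12V",
    0x00000080: "12V Current",
    0x00000100: "24V",
    0x00000200: "24V Current",
    0x00000400: "48V",
    0x00000800: "48V Current",
    0x00001000: "Bat0 48V",
    0x00002000: "Bat1 48V",
    0x00004000: "Bat2 48V",
    0x00008000: "Bat0 12V",
    0x00800000: "Battery LED is Amber",
    0x01000000: "Battery Relay is Off",
    0x02000000: "PS LED is Amber",
    0x04000000: "Fan Fail",
    0x08000000: "DC Fail",
    0x10000000: "AC Fail",
    0x20000000: "Power Supply is Disabled",
    0x40000000: "Power Supply Switch is Off",
    0x80000000: "Low Limit exceeded",
}


def decode_limit_data(data: int) -> list[str]:
    """List the meaning of every set bit, lowest bit first."""
    names = []
    for bit in range(32):
        mask = 1 << bit
        if data & mask:
            # Bits absent from the table are labeled undefined (assumption).
            names.append(SENSOR_BITS.get(mask, f"Undefined (bit {bit})"))
    return names


# Hypothetical data value: 12V sensor combined with the low-limit flag.
print(decode_limit_data(0x80000040))
```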

Non-fatal error: Code 38, sub-code 0xe (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
Each newer generation (Magnetek) power supply and battery has an I2C interface which allows the node to acquire power supply internal temperature, voltages, and current loads. The BIOS will verify these readings are within acceptable limits as part of normal initialization. This failure code indicates a limit has been exceeded on a battery attached to a power supply on the node. The data value may be decoded to determine which power supply and battery. The lower 2 bits are a bitmask of the power supply. The upper 16 bits are a bitmask of the failing battery. Thus, a data value of 10002 indicates PS1 Bat0 has exceeded a limit. A data value of 40001 indicates PS0 Bat2 has exceeded a limit.
Resolution:
A) Check battery expiration date and replace as necessary.
B) Power cycle the failing battery.
C) Replace battery cable.
Diagnostic:
A) Use the Whack "bat status" command to display power supply and battery temperatures and voltages to determine the particular failure.

Non-fatal error: Code 38, sub-code 0xf (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
I2C errors prevented completion of the power test.
Each newer generation (Magnetek) power supply and battery has an I2C interface which allows the node to acquire power supply status. This failure code indicates the BIOS was unable to read one of the power supply or battery status registers. The lower 2 bits of the data value may be decoded to determine which power supply failed. A value of 1 indicates PS0. A value of 2 indicates PS1. A value of 3 indicates both power supplies failed.
Resolution:
A) Power cycle the indicated power supply.
B) Replace power supply.
C) Replace all batteries attached to the power supply.
D) Replace the node motherboard.
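The two worked values given for Code 38, sub-code 0xe (10002 and 40001) can be reproduced with a short decoder. This is a sketch under stated assumptions, not BIOS code: the function name is hypothetical, and it takes "upper 16 bits" to mean bits 16-31 of a 32-bit data value, which matches both examples in the entry.

```python
def decode_battery_limit(data: int) -> tuple[list[int], list[int]]:
    """Split a sub-code 0xe data value into (power supplies, batteries).

    Assumed layout: bits 0-1 are the power supply bitmask and
    bits 16-31 are the battery bitmask (bit 16 = Bat0).
    """
    supplies = [ps for ps in range(2) if (data >> ps) & 1]
    batteries = [bat for bat in range(16) if (data >> (16 + bat)) & 1]
    return supplies, batteries


# The two examples from the entry, written as hexadecimal integers.
for value in (0x10002, 0x40001):
    supplies, batteries = decode_battery_limit(value)
    print(f"{value:x}: PS {supplies}, Bat {batteries}")
```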

Non-fatal error: Code 38, sub-code 0x10 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
PSwwww Batxxxx Switch Off
This failure code indicates a battery has its power switch in the off position, and is thus unable to supply backup power to the node in the case of AC failure. The data value may be decoded to determine which power supply and battery. See Code 38, subcode 0xd for decoding information.
Resolution:
A) Turn battery on.
B) Power cycle the indicated battery.
C) Replace battery cable.
D) Replace power supply.

Fatal error: Code 38, subcode 0x11 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
PS x has down-rev firmware (x)
This failure code indicates the power supply firmware revision is not up-to-date and therefore not supported.
Resolution: Replace power supply.

Fatal error: Code 38, subcode 0x12 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"
PS x Battery has down-rev firmware (rev)
This failure code indicates the battery attached to the indicated power supply has firmware that is not up-to-date and therefore not supported.
Resolution: Replace battery.

Fatal error: Code 39, subcode 0x1 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for no successful OS boot (xxxx) exceeded.
Type "unset cnt_no_os_boot" to clear this error
This error indicates that the BIOS has detected that the node has not successfully booted the OS and will now prohibit boots until operator intervention clears this error.
Resolution:
A) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_no_os_boot" at a Whack prompt.
B) Verify that a valid operating system image is installed on the node's internal disk.
Reinstall the operating system if defective.
C) Replace the IDE drive.

Fatal error: Code 39, subcode 0x2 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS boot with no cluster (xxxx) exceeded.
Type "unset cnt_no_cluster" to clear this error
This error indicates that the BIOS has detected that the node has booted, but the cluster has failed to form several times. The BIOS will prohibit boots until operator intervention clears this error. This is to prevent cyclic node up/down caused by a hardware or software failure. This increases the reliability of the cluster by preventing the node from continuously attempting to join the cluster.
Resolution:
A) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_no_cluster" at a Whack prompt.
B) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
C) Replace the IDE drive.

Fatal error: Code 39, subcode 0x3 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS panic (xxxx) exceeded.
Type "unset cnt_os_panic" to clear this error
This error indicates that the BIOS has detected that the node has booted and then caused a panic several times. When the OS causes a panic, it notifies the BIOS of this event, so the BIOS can track problems. Once a limit is exceeded, the BIOS will prohibit boots until operator intervention clears this error. This is to prevent cyclic node up/down caused by a hardware or software failure. This increases the reliability of the cluster by preventing the node from continuously attempting to join the cluster.
Resolution:
A) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application.
To do this, type "unset max_os_panic" at a Whack prompt.
B) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
C) Replace the IDE drive.

Fatal error: Code 39, subcode 0x4 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS cluster without shutdown (xxxx) exceeded.
Type "unset cnt_no_shutdown" to clear this error
This error indicates that the BIOS has detected that the node has booted, but has not been shut down properly several times. The BIOS will prohibit boots until operator intervention clears this error. This is to prevent cyclic node up/down caused by a hardware or software failure. This increases the reliability of the cluster by preventing the node from continuously attempting to join the cluster.
Resolution:
A) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_no_shutdown" at a Whack prompt.
B) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
C) Replace the IDE drive.

Fatal error: Code 39, subcode 0x5 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for same fatal error (xxxx) exceeded.
Type "unset cnt_same_fatal" to clear this error
This error indicates that the BIOS has detected that the same fatal or non-fatal error has occurred repeatedly. The BIOS will prohibit boots until operator intervention clears this error. This is to prevent cyclic node up/down caused by a hardware or software failure. This increases the reliability of the cluster by preventing the node from continuously attempting to join the cluster.
Resolution:
A) Observe other errors present in the PROM log to determine the cause of this error.
B) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_same_fatal" at a Whack prompt.
C) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
D) Replace the IDE drive.

Fatal error: Code 39, subcode 0x6 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for errors logged (xxxx) exceeded.
Type "unset cnt_log_error" to clear this error
This error indicates that the BIOS has detected that it has recorded too many fatal or non-fatal errors in the board serial PROM and that it should prohibit further boots until operator intervention clears this error. This is to prevent cyclic node up/down caused by a hardware or software failure. This increases the reliability of the cluster by preventing the node from continuously attempting to join the cluster.
Resolution:
A) Observe other errors present in the PROM log to determine the cause of this error.
B) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_log_error" at a Whack prompt.
C) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
D) Replace the IDE drive.

Fatal error: Code 39, subcode 0x7 (0)
OS_STARTUP_FAILURE "OS Startup Error"
This node hit the Harrier Mismatch Error, failed ECC
This error indicates that the BIOS has detected that the Harrier ASIC's ECC logic has hit an error. It should have triggered an ECC error, but failed to do so.
Resolution: Replace the node motherboard.

Fatal error: Code 39, subcode 0x10 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Invalid boot sector. Use "boot net install" to correct this.
The IDE disk is used for booting the operating system. This error indicates the boot sector which has been loaded from the disk does not have a valid signature. The most likely cause of this error is that a fresh IDE drive has been installed in the node and it needs to be field net installed.
Disk MBR does not have a valid partition table
You may also see the above line immediately following the fatal error. This message indicates the partition table in the boot sector (Master Boot Record) was also invalid, and that an "ide log" entry could not be written.
Resolution:
A) If no hardware has been replaced, first try cycling power on the node.
B) Perform a field IDE net install on the drive, or use "boot net install".
C) Use the "ide smart status" command to acquire the drive SMART status. Replace the IDE drive if a failure is reported.
D) Replace the IDE cable.
E) Replace the IDE drive.
F) Replace the node motherboard.

Non-fatal error: Code 40, sub-code 0x1 (0)
CBIOS_OS_TIMEOUT "CBIOS to OS timeout"
*** Error: CBIOS to OS message communication timeout
During CPU SMI initialization, the queue facility to send messages between the BIOS and TPD is tested. If there is a problem triggering an SMI, or some other error which causes message corruption, this error will result. This error is recoverable because the OS can still come up and function at a degraded level even if the communication between the OS and BIOS is not functioning.
Resolution:
A) View the PROM log to see if this is repeatable. If not, ignore a single occurrence.
B) Cycle power on the node.
C) Replace the bootstrap CPU.
D) Replace the node motherboard.

Fatal error: Code 41, subcode 0x0 (0)
CPU_BUS_SPEED_BAD "CPU Bus Speed Bad"
*** Error: CPU speed is too slow.
The computed CPU speed is lower than the expected minimum supported in a 3PAR node. Most likely this is due to a hardware failure.
Since the CPU speed computation depends upon access to the RTC, it is most likely there is a communication problem with the SuperIO containing the RTC. If you need to run with a reduced CPU speed, enter the following command on the node:
Whack> set perm cpu_slow_ok
See Code 41, sub-code 0x1 for resolution information.

Fatal error: Code 41, subcode 0x1 (0)
CPU_BUS_SPEED_BAD "CPU Bus Speed Bad"
*** Error: Memory speed is too slow.
After the CPU speed is computed, the memory bus (FSB) speed is computed. It is computed based on the CPU speed and the bus speed multiplier as reported by the CPU. If you need to run with a reduced memory bus speed, enter the following command on the node:
Whack> set perm mem_slow_ok
Resolution:
A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.
Diagnostic:
A) Resume past fatal error and look for additional problems such as RTC failure.

Fatal error: Code 42, subcode 0x1 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed CP PROM ww.xx.yy.zz read
Centerpanel access using Manufacturing PROM: FAILURE
The centerpanel is used by the 3PAR cluster for the nodes to communicate. The CM links and backup serial links serve this purpose. There is also a diagnostic I2C bus present in the centerpanel which is used by nodes to diagnose error conditions and reset other nodes in the cluster. As part of the manufacturing process, this bus is tested by accessing the serial PROM which is present on a manufacturing centerpanel. If this test fails, it is likely the node will have a problem accessing the centerpanel I2C bus.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the Whack "i2c" command to access devices such as the board register directly.

Fatal error: Code 42, subcode 0x2 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed CP PROM ww.xx.yy.zz write
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.

Fatal error: Code 42, subcode 0x3 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
CP PROM node data does not match what is written: Addr xxxx
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.

Fatal error: Code 42, subcode 0x4 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
CP PROM pattern data read is incorrect
Addr xx Expected yy Read zz
...
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.

Fatal error: Code 42, subcode 0x5 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed I2C access to board register x.y.z
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.

Fatal error: Code 42, subcode 0x6 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed I2C access to board register x.y.z
Centerpanel access using Manufacturing PROM: FAILURE
Titan specific. The BIOS performs a read accessibility check on extra I2C addresses while testing CP PROM 0.a0 and fails with a fatal error message if an address is not accessible. Note that if the failure is not related to CP PROM 0.a0, it does not print the "CP PROM at 0.a0:" message, only "Failed I2C access to board register x.yy".
See Code 42, sub-code 0x1 for resolution information.

Fatal error: Code 43, subcode 0x0 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Voltage ID indicates CPUxx present but TEMP sensor disagrees.
This error indicates either a CPU failure or onboard sensors are reading incorrect values for the specified CPU.
The VID (voltage ID sense) lines are attached to each physical CPU and used to indicate to the VRMs (voltage regulator modules) the voltage level expected by the CPU. These lines are also connected to the LM87, which uses them to determine the correct voltage which should be delivered to the CPU. The TEMP (temperature) sensor is connected to an on-die CPU thermal diode. If its reading is out of acceptable range, the BIOS determines the sensor is not reliably connected to a CPU, or a CPU is not present. Bits 0-1 of data indicate CPU non-presence as determined by the VID sense lines. Bits 8-9 of data indicate CPU non-presence as determined by connection to the thermal diode.
Data Value   Failure
1            CPU0 does not respond to startup
2            CPU1 does not respond to startup
10           CPU0 thermal sensor/voltage ID indicates not present
20           CPU1 thermal sensor/voltage ID indicates not present
Resolution:
A) Cycle power on the node.
B) Remove physical CPU from specific socket and test with no CPU present.
B1) If error persists, replace node motherboard.
B2) If error clears, replace CPU.
C) Replace the node motherboard.
Diagnostic:
A) Use "i2c env" command to determine whether the temperature or voltage is at fault.
B) If CPU temperature shows out of range, and CPU is still functional, suspect thermal diode connection to LM87. Try swapping CPUs to see if problem moves with CPU.
C) If CPU voltage shows high or low, but VRM is emitting correct voltage by the voltage sensor, then suspect the VID lines to the LM87.

Fatal error: Code 43, subcode 0x1 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Voltage ID indicates CPUxx not present but TEMP sensor disagrees.
See Code 43, sub-code 0x0 for resolution information.

Fatal error: Code 43, subcode 0x2 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Physical CPUxx active, but thermal sensor disagrees
Bits 0-1 of data indicate CPU non-presence as determined by the running CPU APIC addresses. Bits 8-9 of data indicate CPU non-presence as determined by connection to the thermal diode.
See Code 43, sub-code 0x0 for resolution information.

Fatal error: Code 43, subcode 0x3 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Physical CPUxx not active, but thermal sensor disagrees
Bits 0-1 of data indicate CPU non-presence as determined by the running CPU APIC addresses. Bits 8-9 of data indicate CPU non-presence as determined by connection to the thermal diode.
See Code 43, sub-code 0x0 for resolution information.

Fatal error: Code 43, subcode 0x4 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Not all hyper-threads started on physical CPUxx
Bits 0-1 of data indicate logical CPU non-presence in physical CPU0 as determined by the running CPU APIC addresses. Bits 2-3 of data indicate logical CPU non-presence in physical CPU1 as determined by the running CPU APIC addresses.
See Code 43, sub-code 0x0 for resolution information.

Fatal error: Code 43, subcode 0x5 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Not all cores started on physical CPUxx
Bits 0-3 of data indicate logical CPU non-presence in physical CPU0 as determined by the running CPU APIC addresses. Bits 4-7 of data indicate logical CPU non-presence in physical CPU1 as determined by the running CPU APIC addresses.
See Code 43, sub-code 0x0 for resolution information.
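The per-CPU bit ranges given for Code 43, subcode 0x5 can be separated as sketched below. This is an illustration under assumptions the entry does not state: the function name is hypothetical, a set bit is taken to mean the logical CPU did not start, and bit n within each 4-bit group is taken to correspond to core n.

```python
def cores_not_started(data: int) -> dict[int, list[int]]:
    """Map physical CPU number to the core indexes flagged as not present.

    Layout from the subcode 0x5 entry: bits 0-3 belong to physical CPU0
    and bits 4-7 to physical CPU1. The bit-to-core ordering within each
    group is an assumption.
    """
    return {
        cpu: [core for core in range(4) if (data >> (4 * cpu + core)) & 1]
        for cpu in (0, 1)
    }


# Hypothetical data value 21: bit 0 (CPU0) and bit 5 (CPU1) are set.
print(cores_not_started(0x21))
```

For subcode 0x4 the same approach would apply with 2-bit groups instead of 4-bit groups.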

Fatal error: Code 43, subcode 0x10 (xx)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
CMIC heatsink disconnected: yy
The GPIOs reporting proper connection of the CMIC (North Bridge) heatsink report a loss of connection. This is a board failure which requires a lab technician to reattach the heatsink.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Visually inspect the CMIC heatsink posts to determine if it needs to be reattached.
B) Observe the reported xx value to see if it is one or both GPIOs which report the failure. These lines may be traced from VSC055 GPIOs P3.1 (J1300) and P3.2 (J1301).
C) The BIOS flag "ignore_hsfail" may be set to override checking the CMIC heatsink.

Non-fatal error: Code 44, sub-code 0x00 (xx)
NODE_FAN_FAILURE "System Fan Failure"
*** Error: One of the node fans is not present, failed, or is unintentionally running at a slower speed than expected.
The VSC055 reports tachometer inputs for both node fans, 0 and 1. This is a single node fan failure which requires the fan to be replaced.
Resolution:
A) Cycle power on the node.
B) Replace the node fan.
Diagnostic:
A) Visually inspect the node fan.
B) Observe the fan is present and connected properly.
C) If it was misconnected, correct the connection. Otherwise, the fan needs to be replaced.

Fatal error: Code 44, subcode 0x01 (xx)
NODE_FAN_FAILURE "System Fan Failure"
*** Error: Both of the node fans are not present, failed, or are unintentionally running at speeds slower than expected.
The VSC055 reports tachometer inputs for both node fans, 0 and 1. This is a dual node fan failure which requires both of the fans to be replaced. The system may overheat.
Resolution:
A) Cycle power on both nodes.
B) Replace both node fans.
Diagnostic:
A) Visually inspect the node fans.
B) Observe the fans are present and connected properly.
C) If they are misconnected, correct the connections. Otherwise, the fans need to be replaced.

Fatal error: Code 45, subcode 0x00 (data)
QLIS_ISCSI_FAILURE "QLogic iSCSI Failure"
*** Error: QLogic iSCSI Failure
This error code indicates an error while running the QLogic iSCSI POST. Failed Test (bits 8-15), Slot (bits 4-7) and Port (bits 0-3) are packed into data. Failed Test is one of the following:
<QLogic internal card diagnostics>
2 - Test Local RAM Size
3 - Test Local RAM R/W
4 - Test RISC RAM
5 - Test NVRAM
6 - Test Flash ROM
7 - Test Network Internal Loopback
8 - Test Network External Loopback
9 - Test DMA Transfer
240 (0xf0) - Test NOP
241 (0xf1) - Test Registers
242 (0xf2) - Test DMA Transfer to CPU memory
243 (0xf3) - Test DMA Transfer to Cluster memory
244 (0xf4) - Card Initialization
Resolution:
A) Cycle power on failing node.
B) Re-seat failing iSCSI card.
C) Replace failing iSCSI card.

Fatal error: Code 46, subcode 0x1 (0)
BAD_OR_UNKNOWN_CHIPSET "Bad or Unknown Chipset"
*** Error: Unrecognized chipset (0xXXXXXXXX).
This error code indicates CBIOS does not recognize the chipset installed on the node's motherboard.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Non-fatal error: Code 46, sub-code 0x2 (0)
BAD_OR_UNKNOWN_CHIPSET "Bad or Unknown Chipset"
*** ME not in operational mode. IPMI data unavailable.
This error code indicates that the PCH Management Engine is not in the desired operational mode in the PCH chipset. IPMI temperature data is not available in this mode, and the system fans may not run at the proper speed and may not cool the enclosure.
Resolution: Contact engineering with data.

Fatal error: Code 47, subcode 0x00 (0)
SDRAM_UNAVAILABLE "Control Cache Unavailable"
No CPU SDRAM is available.
This error indicates that CBIOS has no working CPU memory available to continue with POST and ultimately boot the node.
Resolution:
A) Cycle power on the node.
B) Replace CPU DIMMs.
C) Replace the node motherboard.

Fatal error: Code 48, subcode 0x0 (XXXXXXXX)
UNKNOWN_BOARD "Unknown Board"
*** Error: Unrecognized board identifier (0xXXXXXXXX).
This error code indicates CBIOS does not recognize the board type for the chipset installed on the node's motherboard.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Fatal error: Code 49, subcode 0x1 (data)
USB_FAILURE "USB Flash Media Failure"
Failed to find USB device handle or Inquiry Request Failed rc = xxxx
The USB controller failed to perform a self test. A data value of 0 indicates the BIOS failed to find a USB handle.
Resolution:
A) If a USB Flash drive is not expected to be present, set the "usb_nodevice_ok" NVRAM variable to override the BIOS requirement that a USB Flash drive be found.
B) Replace the USB Flash drive.
C) Replace the node motherboard.
Diagnostic:
A) The Whack "usb test" commands may be used to execute individual USB tests.

Fatal error: Code 49, subcode 0x4 (0)
USB_FAILURE "USB Flash Media Failure"
There was a USB failure in data requested by the operating system bootstrap. The data on the disk may be corrupted to the point that the operating system will not load successfully.
Resolution: Reinstall the operating system bootstrap with the "boot net install" command.

Fatal error: Code 49, subcode 0x6 (0)
USB_FAILURE "USB Flash Media Failure"
USB reported a failure in the read verify command.
See Code 49, sub-code 0x1 for resolution information.

Fatal error: Code 49, subcode 0x7 (0)
USB_FAILURE "USB Flash Media Failure"
USB reported a failure in the write verify command.
See Code 49, sub-code 0x1 for resolution information.
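
The Code 45 entry earlier in this table states that Failed Test (bits 8-15), Slot (bits 4-7) and Port (bits 0-3) are packed into the data value. The following is a minimal sketch of unpacking that value; the function name and the sample value 0xF231 are illustrative assumptions, not part of CBIOS or any HPE tool.

```python
def decode_code45_data(data: int) -> dict:
    """Split a Code 45 data value into the fields named in the Code 45 entry.

    Hypothetical helper for illustration only; bit positions are taken
    from the Code 45 description (Failed Test 8-15, Slot 4-7, Port 0-3).
    """
    return {
        "failed_test": (data >> 8) & 0xFF,  # bits 8-15
        "slot": (data >> 4) & 0xF,          # bits 4-7
        "port": data & 0xF,                 # bits 0-3
    }


# Sample value chosen for illustration: test 0xf2, slot 3, port 1.
fields = decode_code45_data(0xF231)
print(fields)
```

The failed_test number can then be looked up in the Failed Test list under Code 45.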
Non-fatal error: Code 49, sub-code 0x17 (0)
USB_FAILURE "USB Flash Media Failure"
No USB device was found.
Resolution: Install or replace the USB Flash drive.

Fatal error: Code 50, subcode 0x1 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Invalid control cache setup.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0x2 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Incompatible FB-DIMM installed.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0x3 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Electrically isolated FB-DIMM.
Resolution:
A) Replace DIMM.
B) Replace node.

Fatal error: Code 50, subcode 0x4 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Incompatible module installed.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0x5 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Mismatched DIMM pair.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0x6 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Odd rank disabled.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0x7 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM branch failed to train and lockstep mode has been disabled.
Resolution:
A) Replace all DIMMs.
B) Replace node.

Fatal error: Code 50, subcode 0x9 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM northbound merge has been disabled.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0xa (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM disabled due to lockstep skew.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0xb (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM rank disabled due to Built-in Self Test failure.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0xe (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Memory interleave range limit invalid.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0xf (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
High temp disabled.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0x10 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Logical rank with CECC detected.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0x12 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Sub-optimal FB-DIMM channel population detected.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0x13 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Mismatched AMB pair.
Resolution: Replace all DIMMs.

Fatal error: Code 50, subcode 0x14 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM branch disabled.
Resolution:
A) Replace all DIMMs.
B) Replace node.

Fatal error: Code 50, subcode 0x15 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM thermal throttling has been disabled.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0x16 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Last FB-DIMM AMB has been disabled.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0x17 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
The FB-DIMM memory branches do not match in size.
Resolution: Contact 3PAR technical support.

Fatal error: Code 51, subcode 0x1 (Data)
CMA_BIST_FAILURE "CM ASIC Cache BIST Failure"
The BIST (Built-in Self Test) in Harrier reported either a BAD value or a different value from what was recorded in the node PROM during MFG board assembly. (Data = Harrier BIST result)
Resolution: Replace the node. Note for OPS that Harrier BIST failed, and that the PROM should not be wiped.

Non-fatal error: Code 51, sub-code 0x2 (Data)
CMA_BIST_FAILURE "CM ASIC BIST Failure"
During Harrier initialization, the CMA BIST test failed for a reason other than the BIST result itself (for example, an I2C I/O error). This error code indicates that the BIST test itself has not failed, but that an error occurred during bookkeeping (PROM0 read/write), or that the test was not performed at all because a Harrier register could not be read. (Data = 0x2f)
Resolution: Monitor and replace the node if the issue recurs. If the node is replaced, note for OPS that they should verify I2C to the node PROM is functional.

Non-fatal error: Code 52, sub-code 0x0
CPU_PM_FAILURE "CPU Power Management Failure"
One or more bits in the CPU's two General Power Management registers were set due to an abnormal power reset. The set bits are printed, describing the cause.
Resolution: Contact engineering with data.

Non-fatal error: Code 52, sub-code 0x1
CPU_PM_FAILURE "CPU TSC is over 48-bits"
The CPU timestamp counter is too large after reset. There could be a CPU reset issue.
Resolution: Contact engineering with data.

Fatal error: Code 53, subcode 0x0 (xxxx)
FPGA_FAILURE "FPGA Failure"
The CPU was unable to communicate with the FPGA.
Resolution: Replace node motherboard.

Fatal error: Code 53, subcode 0x1
FPGA_FAILURE "FPGA Failure"
FPGA revision in EOS node is old. FPGA upgrade is required.
Resolution: Upgrade FPGA to the latest revision.

Fatal error: Code 53, subcode 0x2
FPGA_FAILURE "FPGA Failure"
FPGA revision in Chimera node is not correct. FPGA upgrade is required.
Resolution: Upgrade FPGA to the correct revision.

Fatal error: Code 54, subcode 0x0 (xxyy)
VRM_FAILURE "VRM Failure"
A CPU VRM is missing.
or
A CPU VRM is not providing power.
Resolution:
A) Replace CPU VRM yy.
B) Replace node motherboard.

Fatal error: Code 55, subcode 0xzzzzzzzz (yyy)
UEFI_PEI_FAILURE "UEFI Failure: PEI"
UEFI failed to boot; an assert occurred during PEI. Look up zzzzzzzz in doc/udk_hash_index.csv of the udk2010_up3 tree to determine the filename of the assert. yyy specifies the line number (in hex).
Resolution: Contact 3PAR technical support.

Fatal error: Code 56, subcode 0xaabb (yyy)
UEFI_MRC_FAILURE "UEFI Failure: Memory Training"
UEFI failed to boot; the failure occurred in the Intel MRC memory training code. aa specifies the major code and bb specifies the minor code.
* Code Table
* Format:
* 0xff Major code
*   0xff Minor code
0x30 Correctable error during MRC memory training
0x31 Uncorrectable error during MRC memory training
  0x13 WARN_FPT_MINOR_RD_DQ_DQS
  0x14 WARN_FPT_MINOR_RD_RCVEN
  0x15 WARN_FPT_MINOR_WR_LEVEL
  0x00 WARN_FPT_MINOR_WR_FLYBY
  0x16 WARN_FPT_MINOR_WR_DQ_DQS
  0x1b WARN_FPT_MINOR_DQS_TEST
  0x1c WARN_FPT_MINOR_MEM_TEST
  0x1d WARN_FPT_MINOR_RCOMP_TIMEOUT
0xE8 ERR_NO_MEMORY
  0x01 ERR_NO_MEMORY_MINOR_NO_MEMORY
  0x02 ERR_NO_MEMORY_MINOR_ALL_CH_DISABLED
  0x03 ERR_NO_MEMORY_MINOR_ALL_CH_DISABLED_MIXED
0xE9 ERR_LT_LOCK
0xEA ERR_DDR_INIT
  0x01 ERR_RD_DQ_DQS
  0x02 ERR_RC_EN
  0x03 ERR_WR_LEVEL
  0x04 ERR_WR_DQ_DQS
0xEB ERR_MEM_TEST
  0x01 ERR_MEM_TEST_MINOR_SOFTWARE
  0x02 ERR_MEM_TEST_MINOR_HARDWARE
  0x03 ERR_MEM_TEST_MINOR_LOCKSTEP_MODE
0xEC ERR_VENDOR_SPECIFIC
0xED ERR_DIMM_COMPAT
  0x01 ERR_MIXED_MEM_TYPE
  0x02 ERR_INVALID_POP
  0x03 ERR_INVALID_POP_MINOR_QR_AND_3RD_SLOT
  0x04 ERR_INVALID_POP_MINOR_UDIMM_AND_3RD_SLOT
  0x05 ERR_INVALID_POP_MINOR_UNSUPPORTED_VOLTAGE
  0x06 ERR_MIXED_SPD_TYPE
  0x07 ERR_NOT_SUPPORT_EXTENDED_ADDRESS
0xEE ERR_MRC_COMPATIBILITY
  0x01 ERR_MRC_DIR_NONECC
0xEF ERR_MRC_STRUCT
  0x01 ERR_INVALID_BOOT_MODE
  0x02 ERR_INVALID_SUB_BOOT_MODE
  0x01 ERR_INVALID_BOOT_MODE
  0x02 ERR_INVALID_SUB_BOOT_MODE
* Warning Codes
0x01 WARN_RDIMM_ON_UDIMM
0x02 WARN_UDIMM_ON_RDIMM
0x03 WARN_SODIMM_ON_RDIMM
0x04 WARN_4Gb_FUSE
0x05 WARN_8Gb_FUSE
0x06 WARN_IMC_DISABLED
0x07 WARN_DIMM_COMPAT
  0x01 WARN_DIMM_COMPAT_MINOR_X16_C0MBO
  0x02 WARN_DIMM_COMPAT_MINOR_MAX_RANKS
  0x03 WARN_DIMM_COMPAT_MINOR_QR
  0x04 WARN_DIMM_COMPAT_MINOR_NOT_SUPPORTED
  0x05 WARN_RANK_NUM
  0x06 WARN_TOO_SLOW
  0x07 WARN_DIMM_COMPAT_MINOR_ROW_ADDR_ORDER
  0x08 WARN_CHANNEL_CONFIG_NOT_SUPPORTED
  0x09 WARN_CHANNEL_MIX_ECC_NONECC
  0x0A WARN_DIMM_COMPAT_TRP_NOT_SUPPORTED
0x09 WARN_LOCKSTEP_DISABLE
  0x01 WARN_LOCKSTEP_DISABLE_MINOR_RAS_MODE
  0x02 WARN_LOCKSTEP_DISABLE_MINOR_MISMATCHED
  0x03 WARN_LOCKSTEP_DISABLE_MINOR_MEMTEST_FAILED
0x0a WARN_USER_DIMM_DISABLE
  0x01 WARN_USER_DIMM_DISABLE_QUAD_AND_3DPC
  0x02 WARN_USER_DIMM_DISABLE_MEMTEST
0x0b WARN_MEMTEST_DIMM_DISABLE
0x0c WARN_MIRROR_DISABLE
  0x01 WARN_MIRROR_DISABLE_MINOR_RAS_DISABLED
  0x02 WARN_MIRROR_DISABLE_MINOR_MISMATCH
  0x03 WARN_MIRROR_DISABLE_MINOR_MEMTEST
0x0d WARN_MEM_LIMIT
0x0e WARN_INTERLEAVE_FAILURE
  0x01 WARN_SAD_RULES_EXCEEDED
  0x02 WARN_TAD_RULES_EXCEEDED
  0x03 WARN_RIR_RULES_EXCEEDED
  0x04 WARN_TAD_OFFSET_NEGATIVE
  0x05 WARN_TAD_LIMIT_ERROR
  0x06 WARN_INTERLEAVE_3WAY
  0x07 WARN_A7_MODE_AND_3WAY_CH_INTRLV
0x10 WARN_SPARE_DISABLE
0x11 WARN_PTRLSCRB_DISABLE
0x12 WARN_UNUSED_MEMORY
  0x01 WARN_UNUSED_MEMORY_MINOR_MIRROR
  0x02 WARN_UNUSED_MEMORY_MINOR_LOCKSTEP
0x13 WARN_RD_DQ_DQS
0x14 WARN_RD_RCVEN
  0x01 WARN_ROUNDTRIP_EXCEEDED
0x15 WARN_WR_LEVEL
  0x00 WARN_WR_FLYBY_CORR
  0x01 WARN_WR_FLYBY_UNCORR
  0x02 WARN_WR_FLYBY_DELAY
0x16 WARN_WR_DQ_DQS
0x17 WARN_DIMM_POP_RUL
  0x01 WARN_DIMM_POP_RUL_MINOR_OUT_OF_ORDER
  0x02 WARN_DIMM_POP_RUL_MINOR_INDEPENDENT_MODE
0x18 WARN_CLTT_DISABLE
  0x01 WARN_CLTT_MINOR_NO_TEMP_SENSOR
  0x02 WARN_CLTT_MINOR_CIRCUIT_TST_FAILED
0x19 WARN_THROT_INSUFFICIENT
0x1a WARN_CLTT_DIMM_UNKNOWN
0x1b WARN_DQS_TEST
0x1c WARN_MEM_TEST
0x1d WARN_CLOSED_PAGE_OVERRIDE
0x1e WARN_DIMM_VREF_NOT_PRESENT
0x20 WARN_LV_STD_DIMM_MIX
0x21 WARN_LV_2QR_DIMM
0x22 WARN_LV_3DPC
0x23 WARN_CMD_ADDR_PARITY_ERR
0x24 WARN_CMD_MARGINS
  0x01 WARN_NO_EYE_FOUND
0x25 WARN_SMBUS_FAILURE
  0x01 WARN_SMBUS_RD_FAILURE
Resolution: Contact 3PAR technical support.

Fatal error: Code 57, subcode 0xzzzzzzzz (yyy)
UEFI_DXE_FAILURE "UEFI Failure: DXE"
UEFI failed to boot; an assert occurred during DXE. Look up zzzzzzzz in doc/udk_hash_index.csv of the udk2010_up3 tree to determine the filename of the assert. yyy specifies the line number (in hex).
Resolution: Contact 3PAR technical support.

Non-fatal error: Code 58, sub-code 0x0
HECI_FAILURE "HECI Interface Failure"
CBIOS failed to obtain the ME firmware flash unlock code through the HECI interface. This could prevent flash commands from functioning.
Resolution: Try rebooting the node.

Fatal error: Code 59, subcode 0x00 (0)
FAILSAFE_BIOS_BOOT "Failsafe Boot Halt"
The EOS Failsafe BIOS has booted without detecting a CRC error in the Main BIOS, indicating a HW initialization failure preventing the node from booting. The Failsafe BIOS has also detected five or more non-CRC failures causing boots to failsafe within the past two hours and has stopped attempting to recover automatically.

Non-fatal error: Code 59, sub-code 0x01 (bbxxyyzz)
FAILSAFE_BIOS_BOOT "Failsafe Boot Mode"
The EOS Failsafe BIOS has booted. This non-fatal entry is logged to mark the switch over from the Main BIOS to the Failsafe BIOS, which may be a different version. The data field contains the build (bb) and version (xx.yy.zz) of the Failsafe BIOS that is booting.

Non-fatal error: Code 59, sub-code 0x02 (flags)
FAILSAFE_BIOS_BOOT "Failsafe Boot Mode"
The EOS Failsafe BIOS has booted.
This non-fatal entry is logged to mark the switch over from the Main BIOS to the Failsafe BIOS, which may be a different version. The data field contains FPGA flags at the time of the boot.
Bits 0..7 - FSBC_STAT register from the FPGA. See FPGA design documentation for details.
Bits 8..15 - FPGA Revision register.
Bits 16..23 - FPGA ID register. (=4 for EOS)
Bit 24 - Flag indicating state of env var qa_force_bios_to
Bit 25 - Flag indicating state of env var qa_force_fs_to
Flag: 1=var is set, 0=var is not set.
Bits 26..31 - Reserved, =0

Non-fatal error: Code 60, sub-code 0x00 (0)
NEMOE_FAILURE "Nemoe Failure"
The OKI Nemoe MCU has failed to boot within the specified timeout and the UEFI BIOS has reset the chip. This non-fatal entry is logged to record the boot failure and attempted restart by BIOS. No recovery action is required for this subcode.

Fatal error: Code 60, subcode 0x01 (0)
NEMOE_FAILURE "Nemoe Failure"
The OKI Nemoe MCU has failed to boot within the specified timeout. The BIOS reset the part, and it still failed to complete its boot initialization before a timeout. The only corrective action is to replace the node.

Non-fatal error: Code 61, sub-code 0x00 (data)
AC_POWER_LOSS "AC Power Loss"
Turning off BBU because the node is on battery power. This will shut down the node until AC is restored.
This message indicates that all power supplies lost input AC power and that the BIOS powered down the node to avoid draining the battery. The data value provides a mask of power supplies which have AC good input but failed DC output.
Resolution:
A) Apply AC power to the node.
B) Replace the power supplies.

Fatal error: Code 62, subcode 0x00 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x00 indicates a general timeout or exhaustion of available retries during overall Write/Read/Gate leveling. The "path" value encodes the cma number, channel number, and chip-select map, according to the following bit range mapping:
|31        24|23            16|15     8|7               0|
| cma number | channel number |        | chip select map |
Resolution:
A) Cycle power on the node.
B) Reseat CM memory riser card.
C) Reseat the failing Cluster memory DIMM.
D) Replace the failing Cluster memory DIMM.
E) Replace the node motherboard.

Fatal error: Code 62, subcode 0x01 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling. Sub-code 0x01 indicates a timeout during write leveling, or an exhaustion of available retries during write leveling. The "path" value encodes the cma number, channel number, and chip-select number, according to the following bit range mapping:
|31        24|23            16|15     8|7           0|
| cma number | channel number |        | chip select |
See Code 62, sub-code 0x00 for resolution information.

Fatal error: Code 62, subcode 0x02 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling. Sub-code 0x02 indicates a timeout during read leveling, or an exhaustion of available retries during read leveling. The "path" value encodes the cma number, channel number, and chip-select number, according to the following bit range mapping:
|31        24|23            16|15     8|7           0|
| cma number | channel number |        | chip select |
See Code 62, sub-code 0x00 for resolution information.

Fatal error: Code 62, subcode 0x03 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling. Sub-code 0x03 indicates a timeout writing to a Mosys PHY register.
The "path" value encodes the channel number and the PHY CSR address, according to the following bit range mapping:
|31            16|15          0|
| channel number | CSR address |
See Code 62, sub-code 0x00 for resolution information.

Fatal error: Code 62, subcode 0x04 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling. Sub-code 0x04 indicates a timeout reading from a Mosys PHY register. The "path" value encodes the channel number and the PHY CSR address, according to the following bit range mapping:
|31            16|15          0|
| channel number | CSR address |
See Code 62, sub-code 0x00 for resolution information.

Fatal error: Code 63, subcode 0x00 (xxyy)
LRDIMM_COMM_FAILURE "LRDIMM Communication failure"
xx = SMBUS address
yy = SMBUS read failure status
This error indicates a failure during a SMBUS read of an LRDIMM iMB register.
Resolution:
A) Use the Whack command line to re-run the CM initialization test (cma init).
B) Use the Whack command line to reset the node.
C) Cycle power on the node.
D) Reseat appropriate Cluster Memory DIMM.
E) Replace appropriate Cluster Memory DIMM.
F) Replace the node motherboard.

Fatal error: Code 63, subcode 0x01 (xxyy)
LRDIMM_COMM_FAILURE "LRDIMM Communication failure"
xx = SMBUS address
yy = SMBUS write failure status
This error indicates a failure during a SMBUS write to an LRDIMM iMB register.
Resolution: See Code 63, sub-code 0x00.

Fatal error: Code 63, subcode 0x02 (xxxxyyzz)
LRDIMM_COMM_FAILURE "LRDIMM iMB data mis-compare"
xxxx = iMB register
yy = Expected contents of register
zz = Actual contents of register
This is a data miscompare while verifying LRDIMM iMB register initial values.
Resolution: See Code 63, sub-code 0x00.

Fatal error: Code 64, subcode 0x00 (FFFF)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_USER_ERROR"
Could not find the variable string in the environment variable table.
Resolution:
A) The user needs to fix the illegal name in the table or in the call to the table.

Fatal error: Code 64, subcode 0x00 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_USER_ERROR"
PortNum specifies the invalid port number. The user entered an incorrect RPC or LPC port number in the "CMA PPHY..." command.
Resolution:
A) The user needs to enter the correct port number for the "CMA PPHY..." command.

Non-fatal error: Code 64, sub-code 0x01 (ack)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_RD_ACK_OUT_TIMEOUT"
ack is the current state of the PPHY control register acknowledge bit. If ack is 0, the timeout occurred waiting for the ack bit to assert. If ack is 1, the timeout occurred waiting for the ack bit to deassert.
Resolution:
A) The "CMA PPHY..." commands are a series of commands that allow the user to modify settings on PPHYs or run various tests. The user should therefore know what has changed and whether any failures are real or a result of the changes that were made. If you believe the hardware is bad, continue with the following steps.
B) Use the Whack command line to reset the node.
C) Cycle power on the node.
D) Replace the node motherboard.

Non-fatal error: Code 64, sub-code 0x02 (data)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_RD_MISMATCH"
data is the actual data value from the PPHY register that did not match the expected value.
Resolution:
A) See Code 64, sub-code 0x01 resolution.

Fatal error: Code 64, subcode 0x03 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_LBERT_ERROR"
PortNum is the RPC port being tested. BERT is used to generate a pattern for the voltage margin test, and the test is expected to generate a BERT error. This failure indicates that either the expected error occurred but did not clear, or the expected error did not occur.
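
The Code 62 entries earlier in this table describe two bit layouts for the "path" value: cma number, channel number and chip select for sub-codes 0x00 through 0x02, and channel number plus PHY CSR address for sub-codes 0x03 and 0x04. The following is a minimal sketch of decoding both layouts; the function name and the sample values are illustrative assumptions and are not taken from CBIOS.

```python
def decode_code62_path(subcode: int, path: int) -> dict:
    """Decode a Code 62 "path" value using the bit ranges in the Code 62 entries.

    Hypothetical helper for illustration only.
    """
    if subcode in (0x00, 0x01, 0x02):
        return {
            "cma": (path >> 24) & 0xFF,      # bits 31..24
            "channel": (path >> 16) & 0xFF,  # bits 23..16
            # bits 7..0: a chip-select map for sub-code 0x00,
            # a chip-select number for sub-codes 0x01 and 0x02
            "chip_select": path & 0xFF,
        }
    if subcode in (0x03, 0x04):
        return {
            "channel": (path >> 16) & 0xFFFF,  # bits 31..16
            "csr_address": path & 0xFFFF,      # bits 15..0
        }
    raise ValueError(f"no path layout documented for sub-code {subcode:#x}")


# Sample values chosen for illustration.
print(decode_code62_path(0x00, 0x01020003))
print(decode_code62_path(0x03, 0x00011234))
```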

Non-fatal error: Code 64, sub-code 0x04 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_PHASE_ERROR Total Eye Margin value is over 1 UI"
PortNum is the RPC or LPC port being tested. UI is Unit Interval.
Resolution:
A) See Code 64, sub-code 0x01 resolution.

Non-fatal error: Code 65, sub-code 0x00 (xxyy)
BOOT_DISK_WARNING "Boot disk warnings"
Booting with the default boot disk (xx) failed and the default boot disk was changed to the next available boot disk (yy).
Resolution: Check boot disk (xx) for any disk failure.

Non-fatal error: Code 65, sub-code 0x01 (xxyy)
BOOT_DISK_WARNING "Boot disk warnings"
Booting with the default boot disk (xx) failed and the next available boot disk (yy) has also failed to boot.
Resolution: Check boot disks (xx) and (yy) for any disk failure.

Non-fatal error: Code 65, sub-code 0x02 (0)
BOOT_DISK_WARNING "Boot disk warnings"
Reading boot disk info from PROM failed and dual boot disk configuration was skipped.
Resolution: Check PROM for any access failure.

Non-fatal error: Code 66, sub-code 0x02 (0)
DIAGNOSTIC_FAILURE "Internal check of cached environment failed"
Orion Platform specific. Internal development and test use only.
On Orion nodes, the permanent environment is stored in a slow device. To speed up access, the environment is cached when the node boots. When the environment variable 'verify_cache_env' is set, this cached environment is verified against the permanent environment on every variable read. This code is logged when the two are found to be out of sync.
Resolution: A code fix is required to correct this issue. Report the issue to engineering. Until a fix is available, 'set perm no_cache_env' to turn environment caching off and avoid this error.

Non-fatal error: Code 67, sub-code 0x01 (load check)
NVDIMM_ERROR "NVDIMM error and status"
Sub-code 0x01 (NVDIMM_E_LOAD_CHECK) logs the NVDIMM load check result.
The "load check" value encodes the result of a battery self-test performed by the NVDIMM hardware after coming out of reset. This value is used to determine when the NVDIMM battery is no longer capable of supporting a backup. This information can be collected for battery characterization.
Resolution: No action is necessary. Informational only.

Non-fatal error: Code 67, sub-code 0x03 (result)
NVDIMM_ERROR "NVDIMM error and status"
Sub-code 0x03 (NVDIMM_E_SAVE_FAILED)
The "result" value encodes the specific backup error reported by the NVDIMM LAST EVENT RESULT register.
If "result" is 0x41, then the last save operation timed out.
If "result" is 0x42, then the last save operation was in progress, but failed to complete, implying NVDIMM battery power loss.
If "result" is 0x44, then the last save operation was incomplete.
Resolution:
A) Reseat appropriate Cluster Memory NVDIMM.
B) Reseat appropriate NVDIMM battery cable.
C) Replace appropriate Cluster Memory NVDIMM.
D) Replace appropriate NVDIMM battery.
E) Replace the node motherboard.

Non-fatal error: Code 67, sub-code 0x04 (0)
NVDIMM_ERROR "NVDIMM error and status"
Sub-code 0x04 (NVDIMM_E_NO_HPSSSD)
The expected NVDIMM (HPSSSD) was not detected when querying the I2C SPD device.
Resolution:
A) Reseat appropriate Cluster Memory NVDIMM.
B) Replace appropriate Cluster Memory NVDIMM.
C) Replace the node motherboard.

Non-fatal error: Code 67, sub-codes 0x10 through 0xa0 (data)
NVDIMM_ERROR "NVDIMM error and status"
Sub-codes 0x10 through 0xa0 represent errors detected in specific NVDIMM initialization routines, as follows:
0x0010 // Error in harrier2_init_hcm_restore() routine.
0x0020 // Error in nvdimm_init() routine.
0x0030 // Error in nvdimm_wait_for_save() routine.
0x0040 // Error in nvdimm_wait_for_ready_status() routine.
0x0050 // Error in nvdimm_wait_for_main_code() routine.
0x0070 // Error in nvdimm_auto_pic_fw_upgrade() routine.
0x0080 // Error in nvdimm_flash_program_pic() routine.
0x0090 // Error in nvdimm_get_current_version() routine.
0x00a0 // Error in nvdimm_flash_enter_program_state() routine.
The upper 16 bits of the "data" value recorded with this error identify the specific location in the routine source code where the error was detected. The lower 16 bits of the "data" value may also record function return codes, etc. These error sub-codes are non-fatal by default, but can be made to produce fatal errors by setting the nvdimm_enable_fatal environment variable.
Resolution:
A) Reseat appropriate Cluster Memory NVDIMM.
B) Reseat appropriate NVDIMM battery cable.
C) Replace appropriate Cluster Memory NVDIMM.
D) Replace appropriate NVDIMM battery.
E) Replace the node motherboard.

Status: Code 127 (STAT_BIOS_DIAG)
"BIOS Diag"
This code is not an error. It is a BIOS diagnostic failure which was forced by the Whack "fatal" command. It is used to test the error logging and reporting mechanisms of the BIOS and TPD software.

Status: Code 128 (STAT_BIOS_UPDATE)
"BIOS Update"
This code is not an error. It indicates the BIOS determined that it had been updated. During CBIOS initialization, it looks at a value stored in NVRAM to determine if the current version is newer than the version previously booted. If so, the BIOS logs this update. The sub-code is the new BIOS version and the minor code (the value in parentheses) is the old BIOS version.
Example:
Code 128 (BIOS update) - Subcode 0x10204 (10201)
The above indicates CBIOS was updated from version 1.2.1 to 1.2.4.

HPE 3PAR OS fatal error codes and error resolution—HPE 3PAR OS 3.3.1
Error codes above 255 are in the domain of the OS.
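
The Code 128 example in the BIOS table (Subcode 0x10204 (10201), an update from version 1.2.1 to 1.2.4) suggests that each version component occupies one byte of the logged value. The following is a minimal sketch under that assumption; the one-byte-per-component layout is inferred only from that single example, and the function name is hypothetical.

```python
def bios_version(value: int) -> str:
    """Format a Code 128 version value as major.minor.patch.

    Assumes one byte per component, inferred from the example
    0x10204 -> 1.2.4; not confirmed for other versions.
    """
    major = (value >> 16) & 0xFF
    minor = (value >> 8) & 0xFF
    patch = value & 0xFF
    return f"{major}.{minor}.{patch}"


new_version = bios_version(0x10204)  # sub-code from the example
old_version = bios_version(0x10201)  # value in parentheses from the example
print(f"CBIOS updated from {old_version} to {new_version}")
```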
HPE 3PAR OS fatal error codes and error resolution—HPE 3PAR OS 3.3.1

Fatal error: Code 257, sub-code yyyyyyyy (xx)
PROM_EA_MEM_UERR "Uncorrectable Cluster memory"

S-Series (PIII and P4) and E-Series (P4) nodes:
Log: Uncorrectable Error in Cluster Memory
Text: UERR wwwwwwww at cluster DIMM xx, addr yyyyyyyy, syn zzzzzzzz
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Eagle) has detected an uncorrectable memory error in one or more cluster memory DIMMs (xx) at address (yyyyyyyy). The node is taken out of the cluster in response to this error.

T-Series, F-Series, and V-Series (5000P) nodes:
Log: Uncorrectable Error in Cluster Memory
Text: CM UECC Error Status [wwwwwwww]: osp: UECC: address=yy:yyyyyyyy chnl 0xww seg 0xqq synd 0xrr bank=0xss col=0xtttt row=0xuuuu DIMMww.vv Multibit
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Osprey) has detected an uncorrectable memory error in one or more cluster memory DIMMs (xx) at address (yy:yyyyyyyy). The node is taken out of the cluster in response to this error. If only the xx value is available, the DIMM number may be computed as DIMM(xx % 3).(xx / 3). For example: if xx is 2, this would refer to DIMM2.0.

Series based on the Harrier ASIC:
Log: Uncorrectable Error in Cluster Memory
Text: HAR0|1 MemCore0|1 MUERR|UERR IntStatus=wwwwwwww data xxxxxxxx:xxxxxxxx denali channel addr y:yyyyyyyyy syndrome z
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: harrier_err_interrupt() of harrierint.c
This error indicates the Cluster Manager ASIC (Harrier) has detected an uncorrectable memory error in one or more cluster memory DIMMs (xx) at address (yyyyyyyy).
The node is taken out of the cluster in response to this error.
DIMM callout (xx):
0 = DIMM0.0.0
1 = DIMM0.1.0
2 = DIMM0.0.1
3 = DIMM0.1.1
4 = DIMM1.0.0
5 = DIMM1.1.0
6 = DIMM1.0.1
7 = DIMM1.1.1

Series based on the Harrier2 ASIC:
Log: Uncorrectable Error in Cluster Memory
Text: HAR2 0|1 MemCore0|1 MUERR|UERR IntStatus=wwwwwwww data xxxxxxxx:xxxxxxxx DDR3 addr yy:yyyyyyyy syndrome z
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: harrier2_err_interrupt() of harrier2int.c
This error indicates the Cluster Manager ASIC (Harrier2) has detected an uncorrectable memory error in one or more cluster memory DIMMs (xx) at address (yyyyyyyy). The node is taken out of the cluster in response to this error. See DIMM callout for Harrier ASIC.

This event is usually followed by a core dump on disk. The kernel log in the core dump usually contains easily interpreted text that identifies which DIMM has failed.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM(s).
D) Replace the failing Cluster Memory DIMM(s).
E) Replace the node motherboard.
Diagnostic:
A) Ensure BIOS tests are enabled using the "table skip none" command at a Whack prompt.
B) Use the "mem test cm" command to test cluster memory.
C) wwwwwwww is the CM Error interrupt status register and the syndrome is zzzzzzzz. These may be decoded using scaffold documentation.

Fatal error: Code 258, sub-code xx (yy)
PROM_EA_MEM_CERR "Correctable Cluster memory"
This error is not currently generated by a node. It is a placeholder should it be necessary to record correctable cluster memory ECC errors in the node PROM.
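
The Code 257 entry gives two ways to turn the xx value into a DIMM name: the formula (xx % 3).(xx / 3) for Osprey-based nodes, and a fixed callout table for Harrier-based nodes (which the Harrier2 text also refers to). The following is a minimal sketch of both lookups; the function and table names are illustrative assumptions, and integer division is assumed for (xx / 3), consistent with the DIMM2.0 example.

```python
# Harrier DIMM callout table, copied from the Code 257 entry.
HARRIER_DIMM_CALLOUT = {
    0: "DIMM0.0.0",
    1: "DIMM0.1.0",
    2: "DIMM0.0.1",
    3: "DIMM0.1.1",
    4: "DIMM1.0.0",
    5: "DIMM1.1.0",
    6: "DIMM1.0.1",
    7: "DIMM1.1.1",
}


def osprey_dimm(xx: int) -> str:
    """Osprey nodes: DIMM(xx % 3).(xx / 3), assuming integer division."""
    return f"DIMM{xx % 3}.{xx // 3}"


def harrier_dimm(xx: int) -> str:
    """Harrier nodes: look up xx in the documented callout table."""
    return HARRIER_DIMM_CALLOUT[xx]


print(osprey_dimm(2))   # the worked example in the text: DIMM2.0
print(harrier_dimm(5))
```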

Fatal error: Code 259, sub-code xx (yy)
PROM_EA_XCB_ERR "Error in the XCB engine"
This error is not currently generated by a node. It is a placeholder for CM XCB engine hardware errors.

Fatal error: Code 260, sub-code xx (yy)
PROM_EA_MEM_MUERR "Multiple uncorrectable memory"
Log: Multiple Uncorrectable Error in Cluster Memory
Text: MUERR wwwwwwww at cluster DIMM xx, addr yyyyyyyy, syn zzzzzzzz
Event: Multiple Uncorrectable Memory Error
Panic: Panic due to Multiple Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Eagle or Osprey) has detected multiple uncorrectable memory errors in cluster memory DIMM (xx) at address (yyyyyyyy). The node is taken out of the cluster in response to this error.
See Code 257 for error resolution information.

Fatal error: Code 261, sub-code xx (yy)
PROM_EA_HW_ERR "Cluster Manager HW error"
This error is not currently generated by a node. It is a placeholder for Cluster Manager (Eagle or Osprey) internal hardware errors.

Fatal error: Code 262, sub-code xxxxxxxx (yy)
PROM_EA_PCI_ERR "Cluster Manager PCI error"
Log: Cluster Manager PCI Error
Text: ea_pci_err: bus yy, status xxxxxxxx Call CBIOS to analyze error. ...
Event: PCI bus yy error xxxxxxxx
Panic: Panic due to Eagle PCI error: bus yy, status xxxxxxxx
Where: ea_pci_err() of eaint_hdler.c
This error indicates the Cluster Manager ASIC (Eagle or Osprey) has detected a PCI bus error while communicating with either a CPU or one of the PCI slot devices. This error is most likely caused by a card which has failed in one of the PCI slots. To determine the true cause, you may need to examine the BIOS output recorded in the crash dump.
Resolution:
A) Cycle power on the node.
B) Read BIOS output to determine if a specific PCI card is implicated by the slot bridges. If so, replace the card.
C) Replace the node motherboard.

Diagnostic:
A) If BIOS messages indicate no other device is at fault, then manual BIOS tests may be performed to determine if the cause is CIOB. You may use the Whack "mem test" command with a CM memory range to generate accesses. Use "eagle status" and "eagle clear" to get and clear errors.
B) The "fibre test cluster" command is useful for testing access from a Fibre Channel card to the CM.

Error codes—HPE 3PAR OS 3.2.2

BIOS fatal error codes and error resolution—HPE 3PAR OS 3.2.2

This table explains the codes, sub-codes, error code descriptions, and problem resolutions for the CBIOS error codes. When a BIOS initialization or diagnostic test fails such that the node cannot be allowed to continue booting, a fatal error message is often displayed (sometimes with additional information). For each class of error, a major Code is provided. A class-specific sub-code is also provided which gives the specific failure condition.

NOTE: A "GEvent" or "GPE" is a "GPIO (General Purpose I/O) event" or "general purpose event".

Fatal error: Code 0, subcode 0x0 (0)
INITIALIZATION_OK "No Error"
This is not actually a node hardware or software initialization or test failure. This code should never occur, and suggests corruption of the PROM log if it is seen.

Resolution:
Contact 3PAR technical support.

Fatal error: Code 1, subcode 0x1 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Unknown CPUID string: `xxxx'
Bad or unknown CPU ID (non-Intel). The BIOS is unable to fully identify the processor. This sub-code indicates the CPUID string is not "GenuineIntel".

Resolution:
A) Replace the processor.
B) Try moving the processor to the other CPU socket. It could be a single socket problem.
C) Try moving the processor to another system. The cause could be the node hardware or software.
D) Replace the node motherboard.

Diagnostic:
A) Use Whack "cpu id" command. The interesting line will follow a line similar to:
    Intel Pentium III Processor:
or
    Intel Pentium 4 Xeon Processor:

Fatal error: Code 1, subcode 0x2 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Required features 0x008053fb are missing.
Each class of CPU has a list of technology features it supports. If this error occurs, the CPU is severely downrev, the CPU is bad, or the motherboard is bad.

Resolution:
A) Replace the processor.
B) Try moving the processor to the other CPU socket. It could be a single socket problem.
C) Try moving the processor to another system. The cause could be the node hardware or software.

Diagnostic:
A) Use Whack "cpu id" command. The interesting line will be similar to:
    Family 6 ... Features 0x0387fbff, Pflags 4

Fatal error: Code 1, subcode 0x3 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified this CPU.
Each run of CPU has a major revision and a minor stepping number. If you receive this message, the processor has not yet been verified by 3PAR for reliable operation. If this is a new processor, it may be acceptable to press ^C to resume after this error. If you are testing a new stepping of the processor and need to use it, use the following Whack command to ignore an unknown CPUID:
    Whack> set perm cpu_unqual_ok

Resolution:
A) Upgrade to the latest CBIOS to ensure newer certified processors are acceptable.
B) Replace the processor with one certified by 3PAR for use with the board.

Diagnostic:
A) Use Whack "cpu id" command. The interesting line will be similar to:
    Family 6, Model 8, Stepping 3, Features ...
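The feature check behind Code 1, sub-code 0x2 can be pictured as a bitmask comparison. The sketch below is illustrative only: it assumes the value printed in the error message (0x008053fb) is the required-feature mask and that the "Features" value shown by "cpu id" is the CPU's feature word. The function name missing_features is hypothetical, and the actual CBIOS check may differ.

```python
def missing_features(cpu_features: int, required: int) -> int:
    """Return the required feature bits that the CPU does not report."""
    return required & ~cpu_features


if __name__ == "__main__":
    required = 0x008053FB  # mask quoted in the sub-code 0x2 error message
    features = 0x0387FBFF  # "Features" value from the example "cpu id" line
    missing = missing_features(features, required)
    if missing:
        print(f"missing feature bits: 0x{missing:08x}")
    else:
        print("all required feature bits are present")
```

A nonzero result identifies exactly which bits are absent, which may help when comparing "cpu id" output from a suspect processor against a known good one.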
Fatal error: Code 1, subcode 0x4 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified the bootstrap CPU as a dual processor.
If more than one processor is installed, both CPUs must be certified to operate in multiprocessor mode. This error indicates that the bootstrap processor is not certified to run in multiprocessor mode. See Code 1, sub-code 0x3 for resolution information.

Fatal error: Code 1, subcode 0x5 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: 3PAR has not certified this CPU as a multiple processor.
See Code 1, sub-code 0x3 for resolution information.

Fatal error: Code 1, subcode 0x6 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Microcode table size is xxxx, which is not 4 mod 2048.
This is an internal CBIOS consistency check error. If you see this error, processor execution out of flash is most likely not stable. The CPU identification is performed after the flash is fully CRC verified, so this error is likely the result of a failing CPU or transient bus operation.

Resolution:
A) Replace the processor.
B) Re-flash the CBIOS (no need to upgrade).

Diagnostic:
A) Use Arium and a scope to verify that processor fetches from flash trigger no unusual bus operations.

Fatal error: Code 1, subcode 0x7 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Invalid microcode checksum: xxxx
This is another internal CBIOS consistency check error. Before each block of update microcode is uploaded to the Pentium, its checksum is verified. If the checksum is not valid, the block is rejected with this error. See Code 1, sub-code 0x3 for resolution information.

Fatal error: Code 1, subcode 0x8 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: Microcode update failed: expected xxxx, got yyyy
The processor has rejected the microcode update.
This could have a number of causes, but is most likely due to a failing processor. At this point a strong 64-bit CRC has been run successfully across the BIOS and a checksum for each update line has also passed. See Code 1, sub-code 0x4 for resolution information.

Fatal error: Code 1, subcode 0x9 (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: No microcode update found for this CPU.
The BIOS was not able to locate a microcode update for this particular processor, yet it is listed as a CPU which requires a microcode update. This is likely due to use of an unqualified processor. See Code 1, sub-code 0x4 for resolution information.

Fatal error: Code 1, subcode 0xa (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: CPU failed BIST (built in self test): xxxxxxxx
The processor has failed its own built-in self test. This strongly indicates that the processor is at fault.

Resolution:
A) Replace the processor.
B) Replace both processor VRM modules.
C) Replace the node motherboard.

Fatal error: Code 1, subcode 0xb (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
*** Error: First CPU's bus ratios were wwww:xxxx but this CPU's bus ratios are yyyy:zzzz.
The two processors in the system board do not have the same bus clock multiplier. The likely cause is that the processors are of different clock speeds (or, less likely, different minor steppings). The "First CPU" as written above is the bootstrap CPU. On a PIII board, the bootstrap CPU (CPU3) is to the right, nearest the PromJet interface.

Resolution:
A) Remove both heatsinks and verify the processors are rated for the same clock speed and bus multiplier.
B) Replace each processor individually.
C) Replace the node motherboard.

Diagnostic:
A) Use Whack "cpu id" command.
If you enter Whack before Linux is booted, you will consistently run on the bootstrap CPU. If you enter Whack from Linux (using the whack command), the CPU on which you enter Whack is not predictable. The SMI output indicates on which CPU Whack is running. Using this method, or using the "cpu switch" command, you can "cpu id" all processors in the node.
Example:
    Whack> cpu id
    Intel(r) Pentium(r) III Processor:
    Family 6, Model 8, Stepping 3, ...
    ...
    CPUID[3] == 0x00000000 0x00000000 0xda28203c ...
    ...
    Bus to CPU ratio == 2/13
    ...
    Clock Frequency Ratio == 7

Fatal error: Code 1, subcode 0xc (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
This CPU does not support clock multiplier changes
In the supported configuration, the two CPUs present in the node must run at the same clock speed. If the BIOS detects CPUs which have different clock multipliers, it will automatically configure all CPUs to use the highest common clock multiplier. If a CPU's multiplier cannot be changed, then this fatal error will result. See Code 1, sub-code 0x4 for resolution information.

Fatal error: Code 1, subcode 0xd (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
Desired clock multiplier xx is too high for this CPU
This error indicates the CPU does not support a clock multiplier the BIOS is attempting to set. See Code 1, sub-code 0xc for resolution information.

Fatal error: Code 1, subcode 0xe (0)
BAD_OR_UNKNOWN_CPUID "Bad or Unknown CPU ID"
Desired clock multiplier xx is illegal for this CPU
See Code 1, sub-code 0xd for information on this error.

Non-fatal error: Code 2, sub-code 0x1 (0)
RTC_FAILURE "RTC Failure"
*** Error: Real-Time Clock not initialized.
The Real-Time Clock (RTC) is a function of the SuperIO which provides a battery backed system clock and a small quantity of battery backed Non-Volatile RAM for system configuration flags.
This error indicates the RTC memory has become corrupt, possibly due to a dead battery or battery removal when no mainline power was available.

Resolution:
A) Power down, wait 30 seconds, power up. This error should self-correct (likely with a loss of current date/time and other NVRAM contents). Set the date and time using the Whack "rtc date" command.
B) Replace the RTC battery, located near the SuperIO ASIC.
C) Use the Whack command "rtc date" to set the RTC date and time.
D) Replace the node motherboard.

Diagnostic:
A) Use Whack "time loop" command. The left column is RTC seconds and should increment at exactly one-second intervals. The right column is a time scaled processor performance counter and should (even in the case of a deviant slow or fast RTC) still increment nearly in lock step with the RTC.
B) Verify there is not a dead short across the RTC battery. A short will drain the battery and immediately invalidate the RTC contents on power down.

Non-fatal error: Code 2, sub-code 0x2 (0)
RTC_FAILURE "RTC Failure"
RTC_BATTERY_LOW RTC / NVRAM Battery Failure - Replace battery.
The RTC / NVRAM battery was found to have a low voltage by the built-in monitoring circuit of the Real Time Clock (RTC). The RTC battery provides power to the RTC clock function of the SuperIO while the board is not drawing mainline supply power. Over time, this battery's available power will decay (it is rated for over five years of normal operation).

Resolution:
A) Replace the RTC lithium cell battery on the node motherboard.
B) Replace the node motherboard.

Diagnostic:
A) Verify the lithium cell has a 3V charge.
B) Verify there is not a dead short across the RTC battery. A short will rapidly drain the battery and immediately invalidate the RTC contents on power down.

Non-fatal error: Code 2, sub-code 0x3 (0)
RTC_FAILURE "RTC Failure"
RTC_INVALID_TIME The current RTC date/time is invalid.
Enter the correct date/time or press Tab to acquire it from the network.
If the time has not yet been set, or becomes invalid due to loss of battery power, the BIOS will report this error and wait for the user to update the time.

Resolution:
A) Enter the correct time.
B) Press TAB to acquire the time from the network.
C) Press ^C to abort the prompt and resume boot.

Non-fatal error: Code 2, sub-code 0x4 (0)
RTC_FAILURE "RTC Failure"
RTC_BATTERY_LOW RTC / NVRAM Battery Failure - Replace battery.
The RTC / NVRAM battery was found to have a low voltage by the built-in monitoring circuit of the RTC (TOD clock).

Resolution:
A) Replace the lithium-ion cell battery on the node.
B) Replace the node motherboard.

Non-fatal error: Code 2, sub-code 0x5 (mode)
RTC_FAILURE "RTC Failure"
RTC was found in Binary mode or 12 Hour mode.
The RTC has two modes of operation. The BIOS prefers the RTC to be in BCD mode rather than Binary mode. If the RTC is in Binary mode, then this must have been set by the OS. The BIOS will reset the RTC to BCD mode. Also, if the RTC is in 12 Hour mode, then the BIOS will report this and correct the RTC to 24 Hour mode. The mode byte tells us which mode it was in:
Bit 1 should be on for 24 Hour mode.
Bit 2 should be off for BCD mode.

Resolution:
A) Informational only. If this occurs in Development, alert the development team.

Non-fatal error: Code 2, sub-code 0x6 (0)
RTC_FAILURE "RTC Failure"
RTC update was stopped.

Resolution:
A) Informational only. If this occurs in Development, alert the development team.

Fatal error: Code 3, subcode 0x0 (0)
SRAM_INIT_FAILURE "CPU SRAM Init Failure"
During initialization, memory areas are tested before they are used. SRAM is used by the processor for persistent storage during early initialization and the CPU memory tests.
This sub-code indicates that the SRAM walking bits test has failed and that the onboard SRAM may not be reliable.

Resolution:
A) Power down, wait 30 seconds, power up. This problem is unlikely to be a one-time occurrence, so expect it to recur.
B) Replace the node motherboard.

Diagnostic:
A) Use Arium to set and verify SRAM contents. If you notice a pattern, it could be a pulled, stuck, or bridged SRAM line.

Fatal error: Code 3, subcode 0x1 (0)
SRAM_INIT_FAILURE "CPU SRAM Init Failure"
After SRAM contents have been updated with the BIOS static data, a test is performed to ensure the data arrived intact. If it did not, this error is generated. The error could indicate an SRAM failure with the same conditions as above. See Code 3, sub-code 0x0 for resolution information.

Fatal error: Code 4, subcode 0x1 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
Pairvvvv DIMMwwww: (Jxxxx) Bad checksum. Got yyyy, SPD said zzzz
*** Error: Bad SDRAM configuration.
The SDRAM DIMMs located on the motherboard are used for main CPU memory and are critical to the proper operation of a node. Even before the memory is thoroughly tested for proper operation, it must be configured to appear in CPU-addressable space. Each DIMM has a small embedded serial EEPROM which holds DIMM configuration information such as the number of rows, columns, and banks, as well as memory timing. If this serial EEPROM becomes corrupt, data stored in it regarding the DIMM configuration cannot be trusted. For this reason, the EEPROM also contains a checksum which the BIOS verifies before configuring the DIMM. If this checksum does not match the checksum the BIOS computes across the DIMM's EEPROM data, this error will result. The minor code reported is the total count of errors for the DIMM.

Resolution:
A) Replace the defective CPU DIMM with an identical one.
B) If an identical one is not available, replace the CPU DIMM pair. See Code 15 for more resolution information.

Diagnostic:
A) The CPU DIMMs appear on the I2C bus at 3.a0 through 3.a6. Use the Whack "d i2c" command to display the DIMM serial EEPROM contents to determine if there is a pattern.
Example (DIMM 2):
    Whack> d i2c 3.a4.0
See Code 15 for more resolution information.

Fatal error: Code 4, subcode 0x2 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
Pairww DIMMxx (yyyy): 'zzzz' read failed
*** Error: Bad SDRAM configuration.
Where zzzz is one of: row address, column address, module rows, cas latency3, refresh, banks, cas latency2, cas latency1, ras precharge, act_to_rw, act_to_deact, ras cycle, write_to_deact, density, frequency, or DIMM type.
This error indicates that a CPU memory DIMM was detected but that the EEPROM present on the DIMM could not be reliably read. The read operation is done through I2C. See Code 4 above for resolution information.

Fatal error: Code 4, subcode 0x4 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: 'ssss' in Pairtt DIMMuu (vvvv): ww != DIMMxx (yyyy): zz
*** Error: Bad SDRAM configuration.
This error indicates the BIOS detected the CPU SDRAM DIMMs in the bank pair are of a different type.

Resolution:
A) Ensure both DIMMs in the pair are identical. Note that two DIMMs may have the same capacity but have different number of rows, columns, or banks. The DIMM configuration must exactly match. If the DIMMs have the same manufacturer, markings and capacity, they are probably identical. See Code 15 for more resolution information.

Diagnostic:
A) The EEPROM SPD information in each pair of DIMMs should be nearly identical. See Code 4 above for more diagnostic information.
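The SPD checksum test described for Code 4, sub-code 0x1 and the I2C addressing used in the "d i2c" example can be sketched as follows. This is a minimal illustration under stated assumptions, not the CBIOS implementation: it assumes the classic JEDEC SPD layout used by SDR and DDR modules, where byte 63 holds the low 8 bits of the sum of bytes 0 through 62, and it assumes the DIMM addresses step by 2 from 3.a0 (consistent with DIMM 2 at 3.a4 in the example). Both function names are hypothetical.

```python
def spd_i2c_address(dimm: int) -> str:
    """Whack-style bus.address for CPU DIMM 0-3, assuming a step of 2 per DIMM."""
    if not 0 <= dimm <= 3:
        raise ValueError("only DIMM 0-3 fit in the documented 3.a0-3.a6 range")
    return f"3.{0xA0 + 2 * dimm:02x}"


def spd_checksum_ok(spd: bytes) -> bool:
    """Compare SPD byte 63 with the low 8 bits of the sum of bytes 0-62."""
    if len(spd) < 64:
        raise ValueError("need at least 64 SPD bytes")
    return (sum(spd[:63]) & 0xFF) == spd[63]


if __name__ == "__main__":
    print("DIMM 2 is at", spd_i2c_address(2))
    # Synthetic SPD image for demonstration only; not data from a real DIMM.
    body = bytes(range(63))
    image = body + bytes([sum(body) & 0xFF])
    print("checksum valid:", spd_checksum_ok(image))
```

To use this against a real module, the 64 bytes would have to be transcribed from the "d i2c" dump; the trailing ".0" in the Whack example appears to be a starting offset and is not produced by the helper.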
Fatal error: Code 4, subcode 0x8 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Pairww DIMMxx (yyyy): bad refresh type zz
*** Error: Bad SDRAM configuration.
This error indicates the value the DIMM reports for refresh is not valid (greater than the maximum refresh counter). See Code 4 above for resolution information.

Fatal error: Code 4, subcode 0x10 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: DIMM Pair wwww: **** Type not known **** (rows xxxx, cols yyyy, banks zzzz)
*** Error: Bad SDRAM configuration.
This error indicates the values the DIMM reports for rows, columns, and banks do not correspond to any known configuration for a valid DIMM. It is possible the DIMM EEPROM data has become corrupt or that the DIMM is a higher capacity than what is currently supported. See Code 4 above for resolution information.

Fatal error: Code 4, subcode 0x20 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Unable to configure any DQS lines.
OR
*** Error: Unable to configure DQS lines for nibble x.
*** Error: Bad SDRAM configuration.
This error applies to P4 nodes only. It indicates that the BIOS failed to find a set of acceptable DQS values for one or all nibbles of the DIMMs. See Code 4 above for resolution information.

Fatal error: Code 4, subcode 0x100 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: ACT to DEACT of yy.yy clocks is > 6.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting which is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used.

Resolution:
A) Replace CPU DIMMs with 3PAR-certified products.
B) Replace the node motherboard.
C) If there is no other choice, override this error with a BIOS variable, setting "mem_margin" to the percentage outside margin.
Example:
    *** Error: ACT to RW of 3.06 clocks is > 3.00 (2%)
    *** Error: Bad SDRAM configuration.
    Fatal error: Code 4, subcode 0x0 (2)
    Whack> set perm mem_margin=2
    Whack> reboot

Fatal error: Code 4, subcode 0x200 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: Act to RW of y.yy clocks is > 3.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting which is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used. See Code 4, sub-code 0x100 for resolution information.

Fatal error: Code 4, subcode 0x400 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS precharge time of y.yy clocks is > 3.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting which is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used. See Code 4, sub-code 0x100 for resolution information.

Fatal error: Code 4, subcode 0x800 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS cycle time of y.yy clocks is > 9.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting which is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used. See Code 4, sub-code 0x100 for resolution information.

Fatal error: Code 4, subcode 0x1000 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: RAS to RAS of y.yy clocks is > 2.00 (zz%)
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting which is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used. See Code 4, sub-code 0x100 for resolution information.

Fatal error: Code 4, subcode 0x2000 (0)
SDRAM_CONFIG_ERR "Control Cache Config Failure"
*** Error: yyyy: Write to deact > 3. We got zzzz
*** Error: Bad SDRAM configuration.
This error indicates the DIMM pair requires a memory controller setting which is outside tolerance for the chipset's memory controller. This DIMM pair would likely not function correctly if it were allowed to be used. See Code 4, sub-code 0x100 for resolution information.

Fatal error: Code 5, subcode 0x1 (0)
C_MAIN1_CALL_FAILURE "c_main1 Call Failure"
This exception should never happen unless an earlier exception was ignored by pressing ^C. It occurs only if the main initialization, diagnostic test, and boot sequence fails to complete a boot and the user then chooses to ignore the error.

There are two halves to system initialization. The first half relies on only SRAM being available, so the stack and runtime variables are stored there. Once main CPU memory has been tested, initialization switches to the second half, which relies on the tested SDRAM for all data structures. This second half completes initialization and testing of all other node board devices and executes the boot process. For this last step to fail, the IDE disk must either not be present or contain an invalid boot. At that point a fatal error is generated.

Do not ignore this condition. It is a final recourse, and an abort will reboot or hang the node board. It is safer at this stage to press ^W and enter Whack. From Whack, you can reboot with the "reboot" command.
Resolution:
A) Check that control cache (CPU) DIMMs are installed and pass initialization.
B) Verify the node boot drive is present and node software has been installed.
C) Replace the node, including CPU DIMMs and boot drive.

Fatal error: Code 6, subcode 0x1 (0)
SRAM_BAD "CPU SRAM Bad"
*** SRAM failure: address xxxxxxxx wrote yy but read zz
This failure indicates an early SRAM verification test revealed a problem with the SRAM. This is an unrecoverable error which likely requires hardware diagnosis. This error is displayed by low level init code. It will never be written to the PROM log because hardware which writes to the PROM relies on correctly functioning SRAM.

Resolution:
A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.

Diagnostic:
A) Use Arium to set and verify SRAM contents. If you notice a pattern, it could be a pulled, stuck, or bridged SRAM line.

Fatal error: Code 7, subcode xxxx (yyyy)
SDRAM_BUS_FAST "Control Cache Bus Fast"
*** Error: Front side bus speed xxxx > expected yyyy
This error indicates the BIOS has detected that the front side bus speed exceeds the expected speed (133 MHz on PIII, 533 MHz on P4, 1333 MHz on 5000P). The system may not perform reliably.

Resolution:
A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.

Diagnostic:
A) Check the oscillator for the front side bus with a frequency counter or an oscilloscope.

Fatal error: Code 8, subcode xxxxxxxx (yyyyyyyy)
MACHINE_CHECK_FAILURE "Machine Check Failure"
Machine check: MCG_STATUS == xxxxxxxx yyyyyyyy
During BIOS initialization and testing, the processor must execute instructions. If this error results at any point, it is likely due to failing hardware related to the CPU's instruction execution path.

Resolution:
A) Cycle power on the node.
B) Update the node firmware to the latest version.
C) Replace CPU SDRAM in pairs.
D) Replace the node motherboard.

Diagnostic:
A) Replace CPU VRMs.
B) Replace CPUs.
C) Use Arium and set a breakpoint on a machine check to determine what errant instructions led up to the machine check.
D) This problem may also be a BIOS or booter software bug. Observe the values of the error sub-code and data. They make up the 64-bit value of the MCG_STATUS register.

Fatal error: Code 9, subcode 0x0 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
*** Entering memory segment test: Stack is in xxx ***
One of the first memory tests performed in diagnostic mode is a sequential address or random data test. If there is no memory in the system, or the memory DIMMs are mismatched, or there is a memory subsystem problem, this error may result.

Resolution:
A) Verify memory is installed and in matched pairs (same manufacturer, exact same memory configuration and speed).
B) Replace CPU DIMMs with a set of known good ones.
C) Replace the node motherboard.

Diagnostic:
A) Change memory with Whack "c <addr>" command. Examine memory with Whack "d <addr>" command.
B) Use Arium to modify and examine memory.

Fatal error: Code 9, subcode 0x1 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Insufficient memory: BSS end == xxxx, stack limit == yyyy
During the first part of initialization, the system stack comes from SRAM. During the second part of initialization, the system stack comes from CPU memory. If there is insufficient SDRAM (such as no DIMMs installed), this error may result. It is a bad idea to ignore this error with ^C, as the system stack will fall past the available memory and probably hard-hang the initialization. See Code 9, sub-code 0x0 for resolution information.
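For Code 8, the sub-code and data words are described as together forming the 64-bit MCG_STATUS value. The sketch below shows one way to combine and decode them. The flag positions (RIPV bit 0, EIPV bit 1, MCIP bit 2) follow Intel's definition of IA32_MCG_STATUS, but the word order is an assumption: this guide does not state whether the sub-code is the upper or lower half, so the helper treats the sub-code as the upper 32 bits, matching the order in which the message prints them. The function name is hypothetical.

```python
from typing import Dict

# Bit positions of the architectural IA32_MCG_STATUS flags.
MCG_FLAGS = {"RIPV": 0, "EIPV": 1, "MCIP": 2}


def decode_mcg_status(sub_code: int, data: int) -> Dict[str, object]:
    """Combine two 32-bit words into MCG_STATUS and extract its flags.

    Assumption: sub_code is the upper word and data the lower word.
    Swap the arguments if the opposite order applies.
    """
    value = ((sub_code & 0xFFFFFFFF) << 32) | (data & 0xFFFFFFFF)
    result: Dict[str, object] = {"value": value}
    for name, bit in MCG_FLAGS.items():
        result[name] = bool((value >> bit) & 1)
    return result


if __name__ == "__main__":
    # Illustrative input only; not taken from a real fatal error record.
    decoded = decode_mcg_status(0x00000000, 0x00000005)
    print(f"MCG_STATUS = 0x{decoded['value']:016x}")
    for flag in MCG_FLAGS:
        print(flag, "=", decoded[flag])
```

MCIP set indicates a machine check was in progress, and RIPV indicates whether execution can restart reliably at the saved instruction pointer; both are useful context when deciding between the hardware and software causes listed above.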
Fatal error: Code 9, subcode 0x2 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Expected sdram_init_test to be xxxx, but it was yyyy.
After SDRAM has been initialized and scrubbed, the BIOS copies runtime variables from Flash to CPU memory. The BIOS later verifies that this data was copied to SDRAM. This fatal error may be caused by either a software error in the BIOS, a hardware error (such as flaky CPU memory), or user intervention such as modifying the memory containing the SDRAM copy of the runtime variables.

Resolution:
A) Reboot. If the problem is caused by flaky hardware, a prior memory test should catch this condition.
B) Upgrade BIOS version. Not a likely solution since this code path is well tested every time the system is booted.
C) Replace CPU DIMMs with a set of known good ones.
D) Replace the node motherboard.

Diagnostic:
A) Examine the BIOS memory area using the Whack "d <addr>" memory dump command. SDRAM data appears in CPU memory in the 0x000d0000 region. The key value is 0xdeadbeef.
Example:
    Whack> mem search d0000 10000000 deadbeef
    Searching 00000000 .. 01000000 for deadbeef
    [ ]
    Found at 000d0cb0
If this key cannot be found, something went wrong with the copy or memory has become corrupt.

Fatal error: Code 9, subcode 0x3 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Low 1M test: Test completed: x iterations, y probes, z errors found
The low 1 MB of memory is thoroughly tested to ensure reliable operation, as this is the memory area that the BIOS and Whack use during further initialization and testing. If this test fails, it should not be ignored with ^C, as having reliable system memory is critical to proper operation.

Resolution:
A) Cycle power on the node.
Occasionally, memory will fail during a memory test due to metallic dust.
B) Reseat CPU memory DIMMs.
C) Pull CPU DIMMs, blow dust from sockets, reseat.
D) Replace CPU memory DIMMs in pairs to ensure replacement parts are matched.
   PIII nodes: Non-paired DIMMs are proximally closest. Paired DIMMs are the leftmost-leftmost and rightmost-rightmost of each two which are proximally closest.
   P4 nodes: Paired DIMMs are proximally closest. DIMM0 and DIMM1 are a pair. DIMM2 and DIMM3 are a pair.
   E200, Ironman, and Tinman nodes: There is only a single pair of CPU memory DIMMs.
E) Replace the node motherboard.

Diagnostic:
A) Run the memory test manually from Whack. You can use the "mem test range <base> <size>" command to test a range of memory.
B) Write to known bad memory with the Whack "c <addr>" command and observe written contents with "d <addr>". Write enough patterns that you might be able to observe a pattern such as a stuck or floating bit.

Fatal error: Code 9, subcode 0x4 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
High 64K test: Test completed: x iterations, y probes, z errors found
In addition to the low 1 MB of memory, older BIOS versions also thoroughly tested the high 64 KB of memory. This is because the operational stack for the CBIOS and Whack used to reside at this address, which made the memory critical for proper initialization and testing. The current BIOS now uses memory below 1 MB for stack space, so this failure code is deprecated. See Code 9, sub-code 0x3 for resolution information.

Fatal error: Code 9, subcode 0x5 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
SDRAM walk: Test completed: xx iterations, yy probes, zz errors found
During initialization (prior to a thorough test of the low 1 MB of memory), a quick walk through all CPU memory is performed. If an error is found, this fatal error is displayed.
See Code 9, sub-code 0x3 for resolution information.

Fatal error: Code 9, subcode 0x6 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Full SDRAM test: Test completed: xx iterations, yy probes, zz errors found
During later testing, a full SDRAM test is performed which more completely verifies proper memory operation than the cursory SDRAM walk. This test is very similar to the initial thorough 1 MB test done during initialization. See Code 9, sub-code 0x3 for resolution information.

Fatal error: Code 9, subcode 0x7 (0)
SDRAM_FAILURE (<DIMM>) "Control Cache Failure"
Pairwwww DIMMxxxx: Illegal SPD <name of value> <value>
This error indicates that a CPU DIMM was detected but that the EEPROM present on the DIMM reported an illegal or unsupported value for our memory controller. Example: Density (SPD byte 31) has more than 1 bit set (for example, 0x30), which indicates a non-standard part. See Code 9, sub-code 0x3 for resolution information. Most likely, the DIMM is not qualified for use in our Node Board. The DIMM number is logged in the Data field of the Fatal Error.

Fatal error: Code 9, subcode 0x10 (0)
SDRAM_FAILURE "Control Cache Failure"
Cannot allocate xx bytes for PCI bus yy scan
or
Cannot allocate xx bytes for PCI device on bus yy
This error indicates there was not enough memory or a memory error occurred while attempting to allocate heap space during the PCI device probe. SDRAM is needed because the BIOS maintains a list of PCI devices present in the system.

Resolution:
A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace CPU DIMMs.
D) Replace the node motherboard.

Diagnostic:
A) Set BIOS verbose init flags to get more info during memory init and PCI scan.
    Whack> set perm mem_verbose
    Whack> set perm pci_all
B) Use the "config->heap" command to show the heap_base, heap_top, and heap_limit values.
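The density example given for Code 9, sub-code 0x7 (SPD byte 31 with more than one bit set) corresponds to a simple single-bit test. The following is a minimal sketch of that idea; the function name is hypothetical, and the BIOS's actual validation logic is not shown in this guide.

```python
def is_single_bit(value: int) -> bool:
    """True when exactly one bit is set in value."""
    return value != 0 and (value & (value - 1)) == 0


if __name__ == "__main__":
    # 0x30 is the illegal example from the text; 0x10 is a one-bit value
    # chosen here purely for contrast.
    for density in (0x10, 0x30):
        verdict = "ok" if is_single_bit(density) else "illegal"
        print(f"SPD byte 31 = 0x{density:02x}: {verdict}")
```

Clearing the lowest set bit with value & (value - 1) leaves zero only when no other bit was set, which is why 0x30 (two bits set) fails the check.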
Fatal error: Code 9, subcode 0x11 (0)
SDRAM_FAILURE "Control Cache Failure"
Cannot find bus xx in scanned PCI busses

During the PCI bus scan, a list of PCI devices present is recorded in SDRAM. For each device present, a block of memory is allocated and initialized. This error indicates that a data value indicating bus number could not be found in the list of devices previously scanned. This is probably due to an SDRAM or CPU failure.

Resolution:
A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace CPU DIMMs.
D) Replace bootstrap CPU.
E) Replace the node motherboard.

Fatal error: Code 9, subcode 0x12 (0)
SDRAM_FAILURE "Control Cache Failure"
No memory installed.

This error indicates that the CPU memory scan failed to locate any usable memory for the system. There must be at least one bank of SDRAM configured for the node to operate correctly.

Resolution:
A) Cycle power on the node.
B) Verify CPU DIMM scan output shows DIMMs.
C) Replace CPU DIMMs.
D) Replace the node motherboard.

Fatal error: Code 9, subcode 0x13 (xxxx)
SDRAM_FAILURE "Control Cache Failure"
Unknown DDR2 frequency (xxxx)

This error indicates that the CPU memory installed is of an unrecognized and thus unsupported memory speed. Supported speeds include 533, 667 and 800 MHz.

Resolution: Replace CPU DIMMs with 533, 667 or 800 MHz modules.

Fatal error: Code 9, subcode 0x14 (0)
SDRAM_FAILURE "Control Cache Failure"
FB-DIMM Initialization Failure

This error indicates that CBIOS was unable to initialize the CPU memory installed.

Resolution:
A) Cycle power on the node.
B) Replace CPU DIMMs.
C) Replace the node motherboard.

Fatal error: Code 9, subcode 0x15 (data)
SDRAM_FAILURE "Control Cache Failure"

This error indicates that an uncorrectable ECC error was detected on a DIMM. The data value is a bitmask that may be decoded to determine which DIMM had the error.
A value of 1 indicates DIMM 0, 2 indicates DIMM 1, 4 indicates DIMM 2, and so on. More than one bit may be set if CBIOS is unable to isolate the error down to a single DIMM.

Resolution:
A) Cycle power on the node.
B) Replace FB-DIMM(s).
C) Replace the node motherboard.

Fatal error: Code 10, subcode 0x1 (0)
PCI_FAILURE "PCI Failure"
*** Error: Bus xx cannot be parent of bus yy.
*** Error: Failure occurred during PCI device allocation.

During the PCI scan, many devices which were programmed by previous PCI scan steps are examined again to verify the programming was successful. This error indicates that a bridge failed to record the PCI bus number of bridges below it.

Resolution:
A) Cycle power on the node.
B) Remove all PCI cards.
C) Replace the node motherboard.

Diagnostic:
A) Use Whack to evaluate offset 0x45 on the failing parent bridge to determine whether the value is not sticking there or whether there is a problem with the PCI bus below it.

Fatal error: Code 10, subcode 0x2 (0)
PCI_FAILURE "PCI Failure"
*** Error: Vendor vvvv, Device wwww, for index xxxx: Expected size yyyy, but got zzzz

Several devices on the PCI bus of a node board are known by the CBIOS to have specific sizes. As a hardware consistency check, the BIOS verifies that these devices are not only present, but also have appropriate memory and I/O space requirements. If any device is found outside of expected requirements, it will cause this error.

Resolution:
A) Cycle power on the node.
B) Reseat all PCI cards.
C) Swap out the PCI card for another qualified card (if it's a card).
D) Pull all PCI cards to see if the problem persists. If so, replace any defective cards.
E) Replace the node motherboard.

Diagnostic:
A) Use Whack command "pci probe" using the vendor ID provided in the fatal error to acquire the address information the card provides.
If this information does not match the error above, this may be a transient.
B) Use the Whack "d pci" command, providing it the "<bus>.<dev>.<func>" of the PCI device. Look for patterns in the data that might indicate a stuck bit.

Fatal error: Code 10, subcode 0x3 (0)
PCI_FAILURE "PCI Failure"
*** Error: I/O space: address limit xxxx exceeded: yyyy

This error indicates that the system has run out of available mapping area while attempting to map this device into the CPU's I/O address range (0x0000 - 0xfe00). The likely cause of this error is that a prior PCI device is consuming too much I/O space. Since most device I/O ranges are extremely small, it is likely a defective PCI card or PCI bus problem which is the cause.

Resolution:
A) Reseat all PCI cards.
B) Swap out individual PCI cards.
C) Replace the node motherboard.

Diagnostic:
A) Use Whack command "pci init" or "pci scan" to re-scan the bus. It may provide the information you need to determine the bad device.
B) Review the prior PCI allocations to determine one which is unusually large. You will need to enter diagnostic mode to do this. There are two ways:
   1) Press ESC at the initial memory test. Type "go" at the Whack prompt. Answer 'y' to run the PCI initialization. Answer 'a' to print on all phases.
   2) Press ^W at the initial memory test. Type "config diag" at the Whack prompt. Answer 'y' to run the PCI initialization. Answer 'a' to print on all phases.

Fatal error: Code 10, subcode 0x4 (0)
PCI_FAILURE "PCI Failure"
*** Error: 32-bit prefetchable memory: address limit xx exceeded: yy

Many PCI devices (and software drivers) require DMA addressable memory within the 32 bit address space (less than 4 GB).
For this reason, all 32 bit PCI devices are required to be mapped within this space. Currently, all CPU memory is also forced to be mapped within this space, limiting the maximum 32-bit CPU memory to about 3 GB.

Resolution:
A) Swap out individual PCI cards.
B) Replace the node motherboard.

See Code 10, sub-code 0x3 for diagnostic information.

Fatal error: Code 10, subcode 0x5 (0)
PCI_FAILURE "PCI Failure"
*** Error: 32-bit non-prefetchable memory: address limit xx exceeded: yy

The non-prefetchable memory has the same 32 bit limitations as prefetchable memory does.
See Code 10, sub-code 0x4 for resolution information.

Fatal error: Code 10, subcode 0x6 (0)
PCI_FAILURE "PCI Failure"
*** Error: 64-bit prefetchable memory: address limit xxxx exceeded: yyyy

64 bit PCI devices are not limited to a 32 bit address space. The CPU, however, can only access a 36 bit space (when virtual memory is enabled). Because most drivers need direct access to the memory a device provides on the bus, the device must be addressable by the Pentium and so the maximum 64 bit address allowed is 0xf:ffffffff. This is 64 GB.
See Code 10, sub-code 0x4 for resolution information.

Fatal error: Code 10, subcode 0x7 (0)
PCI_FAILURE "PCI Failure"
Testing CM PCI 64-bit data lines: FAIL

The Cluster Manager (Eagle / Osprey) is used to perform a walking bit test on both PCI0 and PCI1 data paths to CPU memory. If a problem is found with either path, this error will be displayed.
The error will be further qualified by one of the following prior lines:
   PCIxxxx all data bits stuck high
   PCIxxxx found data bits stuck high: BitWW, BitXX, BitYY, BitZZ
   PCIxxxx all data bits stuck low
   PCIxxxx found data bits stuck low: BitWW, BitXX, BitYY, BitZZ
   PCIxxxx data bits possibly floating: BitWW, BitXX, BitYY, BitZZ

Resolution:
A) Cycle power on the node.
B) Reseat all PCI cards.
C) Pull all PCI cards to see if the problem persists. If so, replace any defective cards.
D) Replace the node motherboard.

Diagnostic:
A) Depending on the specific error above, check for stuck or floating pins on CM's connection to the appropriate PCI bus.
B) Depending on the specific error above, check for stuck or floating pins on CIOB's (RCC South Bridge) connection to the appropriate PCI bus.

Fatal error: Code 10, subcode 0x8 (0)
PCI_FAILURE "PCI Failure"
*** Error: Miscompare CPU Memory to CM Expected (0xAAAAAAAA) Actual (0xBBBBBBBB) Offset (0xCCCCCCCC)
*** Error: Miscompare CM to CPU Memory Expected (0xAAAAAAAA) Actual (0xBBBBBBBB) Offset (0xCCCCCCCC)

CBIOS runs simple CM PCI Tests as part of POST in both normal operation and manufacturing test. The tests use XCBs to transfer data over both CM PCI interfaces from Cluster Memory to CPU Memory and back. If any test fails due to a data miscompare, the test will generate this fatal error code with sub-code '0x4'. These tests are similar to the Cluster Memory Tests and may fail due to Cluster Memory SDRAM hardware or CPU SDRAM hardware failures. Any test failure will result in a fatal error.

Resolution:
A) Cycle power on the node.
B) Reseat CM memory riser card.
C) Reseat the failing Cluster memory DIMM.
D) Replace the failing Cluster memory DIMM.
E) Replace the node motherboard.
Diagnostic:
A) The memory controller registers are part of the CMA register set which is mapped into CPU memory for access. Use the Whack "pci probe mem 1590" command to find the Cluster Manager on the PCI bus. The base address in CPU memory for the configuration and status registers (CSRs) is Window 0. Example:
   Whack> pci probe mem 1590
   Win Baseaddr    Basesize    Identity
   [0] 00:90200000 00:00000400 3PAR (ASIC) LPC#
   [1] 00:20000000 00:20000000
   [2] 02:00000000 02:00000000
   Add offset 0xc0 to that address (0x90200000 above). This is the base address of the Cluster Memory Control Register Block. Refer to the Scaffold System Architecture Reference for information on register programming. Window 1 is the small cluster memory offset. If the error address is in the first 512 MB of Cluster memory, use Whack to read/write this location and confirm the error. The Central Error register must be reset prior to error reproduction. If the error address is greater than 512 MB, then XCBs may be used to reproduce the error. Type "xcb help" to get more information on using XCBs.

Fatal error: Code 10, subcode 0x9 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has dead clock: xxxxxxxx

This error indicates one of the PCI bridges on the board has a bad clock value and is refusing to accept programming of a good clock.

Resolution:
A) Cycle power on the node. The problem may occur on power cycle (only) with random chance on a bad board.
B) Pull all PCI cards which have integrated bridges (QLogic quad port cards are a good example of this). You should power cycle several times to determine it is not an intermittent problem with the motherboard.
C) Replace the node motherboard.
Diagnostic:
A) The PCI output just prior to the fatal error will indicate which of the four bridges has failed. It will be text similar to "Bridge #1 (controls slots 4 & 5)." Refer to rework documentation to correct this problem.

Fatal error: Code 10, subcode 0xa (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has bad GPIO clock select inputs: x

This error indicates one of the PCI bridges on the board has a bad GPIO input which selects bridge clock sources on a power on condition.

Resolution:
A) Cycle power on the node. The problem may occur on power cycle (only) with random chance on a bad board.
B) Replace the node motherboard.

Diagnostic:
A) The PCI output just prior to the fatal error will indicate which of the four bridges has failed. It will be text similar to "Bridge #1 (controls slots 4 & 5)." Verify that GPIO lines 0-3 are being properly pulled high by comparing against a known good board.

Fatal error: Code 10, subcode 0xb (0)
PCI_FAILURE "PCI Failure"
Warning: This node has xx PCI cards present, but yy is the required minimum. Please verify your node is properly configured. You may adjust the required minimum with the "set pci_min" command.

This error indicates this node has detected fewer PCI cards than the recommended 3PAR minimum. In a system configuration where there are fewer than the minimum number of active PCI cards, inactive load cards should be used to reach the required minimum.

Resolution:
A) Verify the minimum required number of PCI cards are inserted in the node. Install dummy load cards to reach the required minimum.
B) Verify all PCI cards in the system have been identified. Replace any missing card.
C) Replace the node motherboard.

Diagnostic:
A) Isolate the problem to one or more slots by placing load cards in all slots, and then using the "i2c vsc" command to find which slots do not report a load.
B) You can use the "i2c vsc" command to verify cards are reporting correct wattages. You can use the "pci probe" command to display all PCI devices and locate the slot in which each is inserted. Replace any defective card.

Fatal error: Code 10, subcode 0xc (0)
PCI_FAILURE "PCI Failure"
Testing CM PCI 64-bit address lines: FAIL
CM XCB TEST miscompare at offset, uuuu
Expected (vvvvvvvv) Actual (wwwwwwww)
CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)

The Cluster Manager is used to perform a walking bit test on both PCI0 and PCI1 address line paths from CPU memory into cluster memory. If a problem is found (with either path), this error will be displayed. The particular memory address which caused this error will be indicated.

Resolution:
A) Cycle power on the node.
B) Reseat all PCI cards.
C) Pull all PCI cards to see if the problem persists. If so, replace any defective cards.
D) Replace the node motherboard.

Diagnostic:
A) Depending on the specific error above, check for stuck or floating pins on the Cluster Manager's connection to the appropriate PCI bus.
B) Depending on the specific error above, check for stuck or floating pins on CIOB's (RCC South Bridge) connection to the appropriate PCI bus.

Fatal error: Code 10, subcode 0xd (zz)
PCI_FAILURE "PCI Failure"
*** Vendor xxxx device yyyy on motherboard not yet qualified.
*** Vendor xxxx device yyyy in slot zz not yet qualified.

This is an error indicating that the device found is not recognized by the BIOS as a 3PAR-qualified device. This may be because the board is a new generation or that there was a PCI error in communicating with the device. In the former case, it is probably safe to press ^C to ignore this error.
In the latter case, it is possible that part of the board has become non-functional to the point where the BIOS may not be able to determine whether the rest of the board will continue to function. If you need to override this feature, enter Whack at this point by pressing ^W. Enter the following command:
   Whack> set perm pci_unqual_ok
If the data field is non-zero, it indicates the BIOS discovered the problem is a card in a particular PCI slot. The specific codes are as follows:
   * 30 is PCI Slot 0
   * 31 is PCI Slot 1
   * 32 is PCI Slot 2
   * 33 is PCI Slot 3
   * 34 is PCI Slot 4
   * 35 is PCI Slot 5

Resolution:
A) Swap out the PCI card for a qualified card.
B) Replace the node motherboard.

Diagnostic:
A) If the card is a QLogic, use the Whack command "pci probe 1077" to find the device and display its device ID. You may need to press ^W first if the BIOS is still at the fatal error. There are several currently qualified PCI cards. Some include the QLogic 2200, 2300, and 2312. More will be qualified in the future.
B) The PCI probe should have shown the bus.dev.func specifier you need to display card information directly using Whack. Use the Whack "d pci" command, giving it the "<bus>.<dev>.<func>" as a parameter. You should see a standard PCI header present.
C) Try the same or a different card in a different PCI slot to see if the slot has failed.

Fatal error: Code 10, subcode 0xe (0)
PCI_FAILURE "PCI Failure"
PCI bus scan and allocation completed in 21 passes.
*** Error: PCI scan required too many passes. Bad PCI interaction.

This error indicates the PCI scanning code was unable to lay out a valid PCI address table mapping within 21 passes. The cause of this error is possibly due to either defective hardware or BIOS firmware.

Resolution:
A) Remove all PCI cards.
If the error goes away, attempt to find the failed card by process of elimination (put back half of the cards and try to boot again).
B) Replace the node motherboard.

Diagnostic:
A) Observe other errors that may happen at the same time as this error. Is there an indication that it is a board ASIC which is failing? In general, some other error should trigger before this one, since device limits are verified.
B) Contact BIOS engineer for debug assistance.

Fatal error: Code 10, subcode 0x10 (0)
PCI_FAILURE "PCI Failure"
*** Error: IMB.A isn't turned on

This error indicates a possible hardware failure on the board. The bus which connects the CMIC (P4 North Bridge) to CIOB A failed to initialize properly.

Resolution:
A) Cycle power on the node. The problem may occur with random chance on a bad board.
B) Replace the node motherboard.

Diagnostic:
A) Verify CIOBX2 is receiving a valid clock.
B) Look at PCI device 0.0.2.f8 for CIOB A, or PCI device 0.0.1.f8 for CIOB B. The BIOS observes bit 0 of this register to tell if the IMB initialized (0 indicates success).

Fatal error: Code 10, subcode 0x11 (0)
PCI_FAILURE "PCI Failure"
*** Error: Expected to see device xx.yy.zz `uuuu' but it is not responding: Vendor vvvv, Device wwww.
*** Error: Failure occurred during PCI device allocation.

The BIOS checks for specific onboard PCI devices (such as bridges) which are known to be on a particular node board. If a device listed in the BIOS table is not found on the board, then this error will result.

Resolution:
A) Cycle power on the node.
B) Remove PCI cards and see if error disappears.
C) Replace the node motherboard.

Diagnostic:
A) The error should indicate which device is missing. Observe to see if there is another unknown onboard device which has appeared in its place. This could be the device, masked behind a PCI bus problem.
B) Verify the PCI ASIC is functional by checking clocks and PCI data lines to the device.

Fatal error: Code 10, subcode 0x12 (0)
PCI_FAILURE "PCI Failure"
*** Error: The following device is not listed in the hardwired PCI descriptor table: Vendor xxxx, Device yyyy
*** Error: Failure occurred during PCI device allocation.

Onboard PCI devices (such as bridges) are well known by the BIOS to appear at specific bus addresses. If this device is not known by the BIOS, but it is configured on a bus which is not externally exposed (PCI slot), then you will see this error. Since the node board is a closed solution, this error might occur if an onboard device is failing and does not report a correct device vendor/ID, or corrupts the device vendor/ID reported by another device on the bus.
See Code 10, sub-code 0x11 for resolution information.

Fatal error: Code 10, subcode 0x13 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Was yyyy but is now zzzz
*** Error: Failure occurred during PCI device allocation.

The PCI header is re-read on multiple passes of the PCI initialization. If a mismatch is found with a previous read of the PCI bus, then this error will result. This is a strong indicator of a flaky device or bus. If the BIOS is in Diagnostic mode (press ESC at the initial memory test), at this point, the following will also be displayed:
   Starting infinite PCI read loop...
In Diagnostic mode, once a failure is detected, this test is then repeated until manual intervention.
See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x14 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Invalid 64-bit size: yyyy
*** Error: Failure occurred during PCI device allocation.

During PCI initialization, a 64 bit window was found on the PCI bus which is outside the 36 bit range imposed by the CPU.
See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x15 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Allocation size is zero
*** Error: Failure occurred during PCI device allocation.

During PCI initialization, a window was found on the PCI device with a size of zero. This fatal error may indicate that the BIOS is not able to properly communicate with the PCI device.
See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x16 (slot)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.

During PCI initialization, each memory or I/O window present on each device found on the bus is programmed with a CPU memory bus address so that it may be accessed by further BIOS initialization, tests, and the main operating system. The BIOS verifies that the address it programs for each window was correctly programmed (by reading back the value just written). If they do not match, this error is generated. The slot number is an ASCII value represented as hexadecimal. If the slot value is 0, then the failure occurred on a node motherboard device. If PCI Slot 0 was involved, then slot is 30. PCI Slot 1 is 31; PCI Slot 2 is 32; PCI Slot 6 is 36, etc.
See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x17 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.

See Code 10, sub-code 0x16 for information on this error.

Fatal error: Code 10, subcode 0x18 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Wrote yyyy but read zzzz
*** Error: Failure occurred during PCI device allocation.

See Code 10, sub-code 0x16 for information on this error.
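The slot value reported with Code 10, sub-code 0x16 is the ASCII code of the slot digit, so 0x30 is the character '0'. The following Python sketch shows one way to translate that value when reading a fatal error log. The function name is hypothetical, and treating 0x30 through 0x39 as the valid digit range is an assumption based on the ASCII encoding described above.

```python
def decode_slot_value(slot_value: int) -> str:
    """Translate the slot value of a Code 10 data field into a readable location."""
    if slot_value == 0x00:
        # A slot value of 0 means the failure was on a node motherboard device.
        return "node motherboard"
    if 0x30 <= slot_value <= 0x39:
        # ASCII '0'..'9': subtracting 0x30 recovers the slot number.
        return f"PCI Slot {slot_value - 0x30}"
    return f"unrecognized slot value 0x{slot_value:02x}"


for value in (0x00, 0x30, 0x31, 0x36):
    print(f"0x{value:02x} -> {decode_slot_value(value)}")
```

Values outside the digit range are reported rather than guessed at, because the source does not define them.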
Fatal error: Code 10, subcode 0x19 (0)
PCI_FAILURE "PCI Failure"
*** Error: uu.vv.ww.xx: Invalid allocation size: yyyy (Must be a power of 2)
*** Error: Failure occurred during PCI device allocation.

During PCI initialization, each memory or I/O window present on each device found on the bus is programmed with a CPU memory bus address. The size of the window required is provided by the specific PCI device. It is required that this window is a power of 2 in size (1 KB, 2 KB, 4 KB, ... 32 MB, 64 MB, etc). This is a consistency check the BIOS performs to ensure it is properly communicating with the PCI device.
See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x1a (0)
PCI_FAILURE "PCI Failure"
*** Error: Device does not fit into address space, skipping: attempted addr xxxx, size yyyy
*** Error: Failure occurred during PCI device allocation.

During PCI initialization, the entire PCI bus is walked as a tree and device registers are initialized and mapped into processor address space using this tree. The bus structure is then ordered and summarized into a table so that software can later find specific devices for high level initialization. This specific error indicates the PCI scan attempted to map a PCI device into the CPU's 32-bit address space, but failed due to no more available space. Verify that NVRAM flags such as "pci_base" and "mem_max" are not set to unusual values.
See Code 10, sub-code 0x3 for resolution information.

Fatal error: Code 10, subcode 0x1b (0)
PCI_FAILURE "PCI Failure"
*** Error: IMB.B isn't turned on

This error indicates a possible hardware failure on the board. The bus which connects the CMIC (P4 North Bridge) to CIOB B failed to initialize properly.
See Code 10, sub-code 0x10 for resolution information.
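The power-of-2 requirement behind Code 10, sub-code 0x19 can be checked by hand when a reported window size looks suspicious. The following Python sketch uses a common bit test for this. It is an illustration with a hypothetical function name, not the check as implemented in the BIOS.

```python
def is_power_of_two(size: int) -> bool:
    """Return True when size is a positive power of 2 (1 KB, 2 KB, 4 KB, and so on)."""
    # A power of 2 has exactly one bit set, so clearing the lowest set bit leaves zero.
    # Zero is rejected here; a zero-size window is reported separately as sub-code 0x15.
    return size > 0 and (size & (size - 1)) == 0


for size in (0x400, 0x1000, 0x1800, 0x4000000):
    print(f"0x{size:x}: {'ok' if is_power_of_two(size) else 'not a power of 2'}")
```

A size such as 0x1800 (4 KB plus 2 KB) fails the test, which is the kind of value that could point to a stuck or floating bit in the size read back from the device.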
Fatal error: Code 10, subcode 0x1c (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI CIOB Primary www MHz (xxx), Secondary yyy MHz (zzz)

This error indicates a possible hardware failure on the board. The CIOB (which connects the North Bridge to the I/O system) has an incorrect clock speed.

Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Diagnostic:
A) Eagle nodes should run CIOB at 66 MHz on both the primary and secondary sides. Review fatal error output to determine if the primary side or secondary side is affected.
B) Verify clock with scope. Check strapping resistors which select CIOB bus clock speed on reset.

Fatal error: Code 10, subcode 0x1d (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI bridge has bad secondary speed: v.w.x.y = zzzz

This error indicates one of the PCI bridges on the board has a bad speed selection set, which could indicate an incorrect type of PCI card has been installed or that bridge mode select strappings are bad.

Resolution:
A) Pull all PCI cards one at a time to determine the failed card.
B) Replace the node motherboard.

Diagnostic:
A) Check Intel 31154 mode select strapping resistors to ensure PCI-X mode is selected. Refer to Ironman rework instructions to correct this.
B) PCI offset 0xf2 in the 31154 indicates, among other things, the mode selected. Bits 6-8 should have the value 010 for proper operation (100 MHz secondary PCI bus speed).
C) Some pre-production Ironman nodes have not been reworked to correct this defect.
To ignore this error, set the "pci_speed_any" NVRAM flag by pressing ^W to enter Whack and entering:
   Whack> set perm pci_speed_any

Non-fatal error: Code 10, sub-code 0x1e (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe x.y.z: Invalid port configuration strappings (xxx).

The indicated PLX switch chip has incorrect hardware configuration strappings.

Resolution: Replace the node motherboard.

Non-fatal error: Code 10, sub-code 0x1f (YYYYYYxx)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected link width detected (xx).

This error indicates that the device found is not running at the correct PCIe link width. If the "xx" portion of the data field is non-zero, it indicates a problem with a particular PCI slot. The specific codes for "xx" are as follows:
   30 is PCI Slot 0
   31 is PCI Slot 1
   32 is PCI Slot 2
   33 is PCI Slot 3
   34 is PCI Slot 4
   35 is PCI Slot 5
   36 is PCI Slot 6
   37 is PCI Slot 7
   38 is PCI Slot 8
To ignore this error, enter Whack by pressing ^W and entering:
   Whack> set perm pci_speed_any

Resolution:
A) Replace indicated card (if "xx" is non-zero).
B) Replace node motherboard.

Non-fatal error: Code 10, sub-code 0x20 (YYYYYYxx)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected link speed detected (xxx).

This error indicates that the device found is not running at the correct PCIe link speed.
See Code 10, sub-code 0x1f for resolution information.

Non-fatal error: Code 10, sub-code 0x21 (xxx)
PCI_FAILURE "PCI Failure"
*** Error: Slot xxx indicates no HBA present, but PCI device found

This error indicates that a PCI device was found in a slot which was expected to be empty. The likely cause of this failure is an HBA which is not fully seated.
If this is an expected failure, you can set "pci_missing_ok" to override this check.

Resolution:
A) Reseat or replace the indicated HBA.
B) Replace node motherboard.

Non-fatal error: Code 10, sub-code 0x22 (xxx)
PCI_FAILURE "PCI Failure"
*** Error: Slot xxx indicates HBA present, but no PCI device found

This error indicates that no PCI device was found in a slot which was expected to be populated (HBA present). The likely cause of this failure is an HBA which has failed. If this is an expected failure, you can set "pci_missing_ok" to override this check.

Resolution:
A) Reseat or replace the indicated HBA.
B) Replace node motherboard.

Non-fatal error: Code 10, sub-code 0x23 (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI device bb.dd.ff (slot ss) hung during previous error scan

This error indicates that during a previous PCI scan, the CPU hung. The most probable cause of this error is a defective HBA. The data field provides several details about the suspect device. The low byte indicates which PCI slot, if known. Value 0x30 corresponds to PCI Slot 0, 0x31 is PCI Slot 1, ..., 0x38 is PCI Slot 8. Byte 2 and byte 1 correspond to the PCI bus.dev.func. Byte 3 indicates whether the failure occurred during a PCI error scan, and whether this is a repeat failure.

Decode table for data:
   bits 0..7     PCI Slot (0x00=MB, 0x30..0x38=PCI Slot 0..8)
   bits 8..10    PCI func
   bits 11..15   PCI dev
   bits 16..23   PCI bus
   bits 24..28   Reserved (0)
   bit 29        Repeat flag (1=repeat -- fatal error)
   bit 30        Hang during (0=PCI scan, 1=PCI error scan)
   bit 31        Reserved (1)

Example (data=c00a0a35): The 0x35 value implicates PCI Slot 5. The 0a0a value is bus.dev.func 0a.01.02. The c0 value indicates that the hang occurred during a PCI error scan.
Example (data=a0090831): The 0x31 value implicates PCI Slot 1. The 0908 value is bus.dev.func 09.01.00.
The a0 value indicates a repeated hang during the PCI scan.

Resolution:
A) Replace HBA if PCI Slot is indicated.
B) Convert to PCI bus.dev.func and match with the suspect PCI device from previous BIOS messages. If this is an onboard device, replace the node motherboard.

Fatal error: Code 10, subcode 0x24 (data)
PCI_FAILURE "PCI Failure"
*** Error: PCI device bb.dd.ff (slot ss) hung during previous scan
Hang occurred multiple times.

This error indicates that during a previous PCI scan, the CPU hung repeatedly. Other than this being a fatal error, this code is identical to that of sub-code 0x23. Note that if this fatal error is seen without a preceding non-fatal sub-code 0x23, then the failure is likely to be the node motherboard. If the non-fatal error is not logged, then a PCI scan hung earlier in the PCI tree than a previous hang. Unless both hangs happened on the same HBA, the cause is likely a shared device on the node motherboard.
See Code 10, sub-code 0x23 for resolution information.

Fatal error: Code 10, subcode 0x25 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Serial EEPROM is not present.

This error indicates that the PCI device does not have an EEPROM attached.

Resolution: Replace node motherboard.

Fatal error: Code 10, subcode 0x26 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Unable to write Serial EEPROM.

This error indicates that the EEPROM failed to be programmed.

Resolution: Replace node motherboard.

Fatal error: Code 10, subcode 0x27 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe bb.dd.ff: Unable to read Serial EEPROM.
*** Error: PCIe bb.dd.ff: Serial EEPROM index XX value 0xXXXXXXXX != expected 0xXXXXXXXX.

This error indicates that BIOS was unable to verify the EEPROM contents after programming or that the data was successfully written but did not persist.

Resolution: Replace node motherboard.
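The data field shared by Code 10, sub-codes 0x23 and 0x24 can be decoded by hand, but a short script avoids bit-counting mistakes. The following Python sketch decodes the two worked examples given for sub-code 0x23. The names are hypothetical and this is not a 3PAR tool. The device number is read as the upper five bits of byte 1 (bits 11..15), which is the layout that reproduces both worked examples.

```python
from typing import NamedTuple


class PciHangInfo(NamedTuple):
    slot: str                # decoded from bits 0..7
    bus: int                 # bits 16..23
    dev: int                 # bits 11..15 (assumption, consistent with the worked examples)
    func: int                # bits 8..10
    repeat: bool             # bit 29: 1 = repeat failure
    during_error_scan: bool  # bit 30: 0 = PCI scan, 1 = PCI error scan

    def bdf(self) -> str:
        """Format bus.dev.func the way the BIOS messages print it."""
        return f"{self.bus:02x}.{self.dev:02x}.{self.func:02x}"


def decode_pci_hang_data(data: int) -> PciHangInfo:
    """Split a 32-bit sub-code 0x23/0x24 data value into its documented fields."""
    slot_byte = data & 0xFF
    if slot_byte == 0x00:
        slot = "motherboard"
    elif 0x30 <= slot_byte <= 0x38:
        slot = f"PCI Slot {slot_byte - 0x30}"
    else:
        slot = f"unknown (0x{slot_byte:02x})"
    return PciHangInfo(
        slot=slot,
        bus=(data >> 16) & 0xFF,
        dev=(data >> 11) & 0x1F,
        func=(data >> 8) & 0x07,
        repeat=bool((data >> 29) & 1),
        during_error_scan=bool((data >> 30) & 1),
    )


for value in (0xC00A0A35, 0xA0090831):
    info = decode_pci_hang_data(value)
    phase = "PCI error scan" if info.during_error_scan else "PCI scan"
    print(f"{value:08x}: {info.slot}, {info.bdf()}, hang during {phase}, repeat={info.repeat}")
```

The decoded bus.dev.func can then be matched against the suspect device in the earlier BIOS messages, as described in resolution step B for sub-code 0x23.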
Fatal error: Code 10, subcode 0x30 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe b.d.f Link Width incorrect size. Found xx, s/b yy
This error indicates that the device found is not running at the correct PCIe link width. xx is the actual PCIe link width and yy is the expected PCIe link width. This error may be logged with some HBA cards with x4 PCIe lanes. To ignore this error, enter Whack by pressing ^W and enter:
Whack> set perm pci_speed_any
Resolution:
A) This error can be ignored if it is related to an HBA card with x4 PCIe lanes.
B) Replace the indicated card.

Fatal error: Code 10, subcode 0x31 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCI b.d.f (vvvv.dddd) in slot ss: Unexpected link width detected (xx).
This error indicates that the Harrier2 ASIC device found is not running at the correct PCIe link width.
Resolution:
A) Power cycle the node.
B) Replace node motherboard.

Fatal error: Code 10, subcode 0x32 (0)
PCI_FAILURE "PCI Failure"
*** Error: PCIe b.d.f indicates HBA present, but no PCI device found
This error indicates that the PCI device was not found.
Resolution:
A) Reseat the card.
B) Replace the indicated card.

Fatal error: Code 11, subcode yyyy (0)
UNRECOVERABLE_TRAP "Unrecoverable Trap"
*** Error: CPU exception detected: Stopping execution.
The BIOS installs an interrupt handler to catch spurious (unexpected) interrupts and exceptions during initialization and testing of the node hardware. During initialization, the BIOS even tests to verify a generated interrupt is delivered correctly. This is a serious condition and should not be ignored by pressing ^C. The specific interrupt received is the sub-code displayed. The interrupt number will be less than 0x20.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Review previous output lines to determine whether interrupts were just enabled (it follows the CPU identification). You should see a message:
--- This interrupt was expected
If this is not present, then most likely the interrupt or exception occurred immediately after being enabled.
B) Using Whack, you can manually enable and disable interrupts with the "cpu interrupt enable" and "cpu interrupt disable" commands. You can also use the "cpu interrupt <num>" command to generate an interrupt. If interrupts are enabled, you should see a message upon generating an interrupt. One of:
--- This interrupt was expected
or
*** Error: Expected interrupt xxxx but got yyyy
or
*** Error: CPU exception detected: Stopping execution.
The first two messages will only occur if the BIOS is still expecting an interrupt to be delivered. The last message will only be displayed if the interrupt is numbered 0x20 or higher.

Fatal error: Code 12, subcode 0x0 (0)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
PIII or P4 node:
--- SMI: No known cause (# zz)
GPE status: yyyyyy, GPE input: zzzzzz
An SMI is a System Management Interrupt, an interrupt generated by the node hardware for the BIOS to service a particular failure. This error indicates the BIOS was unable to determine the cause of the SMI delivered by hardware.
See Code 11 for resolution information.

Fatal error: Code 12, subcode 0x0 (0)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
Ironman, Tinman, Titan, or Atlas nodes:
CPU0 SMI: Bootstrap
CPU0 SMI: Updating
CPU0 SMI: Updated
--- SMI: No known cause (# 1) on CPU6
SMSCS[0] = 0x00000000
...
ALT_GP_SMI_EN = 0xbfbf
ALT_GP_SMI_STS = 0x0000
TMP_STS= 0x00000000:88380000
TMP_INT= 0x00000000:00000001
This fatal error indicates the BIOS received an SMI, but wasn't able to determine which device caused the interrupt.
In this example, the "Bootstrap," "Updating," and "Updated" messages suggest the BIOS firmware was updated.
Resolution:
A) Reboot the node.
B) Replace the node motherboard.

Fatal error: Code 12, subcode 0x1 (yyyy)
UNEXPECTED_INTERRUPT "Unexpected Interrupt"
*** Error: Expected interrupt xxxx but got yyyy
During initialization, the BIOS installs an interrupt handler to verify interrupts are delivered reliably. It then generates an expected interrupt. If an interrupt is delivered which is not the same as the one expected, this error is displayed. The interrupt number, yyyy, represents which interrupt occurred.
See Code 11 for resolution information.

Fatal error: Code 13, subcode 0x0 (yyyy)
INTERRUPT_FAILURE "Interrupt Failure"
*** Error: Interrupt 0x20 could not be generated.
or
*** Error: Interrupt 0xff could not be generated.
During initialization, the BIOS installs an interrupt handler to verify interrupts are delivered reliably. It then generates a few expected interrupts. If the specific interrupt is not delivered, this error is displayed. The interrupt number, yyyy, represents which interrupt should have been generated.
See Code 11 for resolution information.

Fatal error: Code 14, subcode 0x0 (0)
ECC_FAILURE "Control Cache ECC Failure"
The Whack "mem test ecc" command performs an ECC test over the main memory to ensure ECC memory error correction is functioning. If this test fails, this message is displayed, together with other messages giving details.
Note: Running the "mem test ecc" command destroys some memory locations in the range of [0 .. 512 KB] and [1 MB .. just below the top of SDRAM]. Hence, executing this command once Linux has booted will cause Linux to fail if it is reentered.
If you see this failure often during BIOS initialization, then the cause is likely a hardware problem.
Specifically, the error tells you that the hardware ECC error mechanism is not working correctly. Changing CPU memory DIMMs may solve the problem, but it's more likely a board failure.
Resolution:
A) Ensure the North Bridge heatsink is firmly attached.
B) Replace CPU DIMMs.
C) Replace bootstrap CPU.
D) Replace the node motherboard.

Fatal error: Code 14, subcode 0x1 (1)
ECC_FAILURE "Control Cache ECC Failure"
*** Error: Missing ECC SMI [80] <= 1, data 0 0. Copy 0 Now 0 mode 0
00 - 0f: 00 00 00 00 00 00 00 00 | 00 00 00 00 00 40 0c 00
10 - 1f: 01 ff 00 00 00 00 00 ff | ff ff ff ff ff ff ff ff
20 - 2f: 04 09 08 09 20 09 10 09 | 18 09 00 09 00 00 59 8e
30 - 3f: aa aa 0a 02 a8 00 00 00 | 00 00 00 c0 7b df ff ff
This error indicates the BIOS ECC hardware test could not get the hardware to generate an ECC SMI in response to a corrupted memory address. It possibly indicates a failing DIMM or memory controller, or that memory timings are too fast for the DIMMs present in the node.
See Code 14, sub-code 0x0 for resolution information.

Fatal error: Code 15, subcode 0x0 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
mailbox register xxxx changed inappropriately (yyyy) != expected (zzzz)
register test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards on the Node Board. The slots are numbered 0-6 from left to right when looking at the front of the P4 Eagle and Ironman Nodes. The slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for functionality. The HBA cards sometimes require a firmware download for full capability. POST does not have access to this firmware and will only test basic register access and functionality. If the Register Test fails, POST will indicate this error.
If the user continues past this error (^C), software will log the error and continue testing the other PCI cards (if present).
Resolution:
A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the CM PCI XCB test passed, replace the PCI Fibre Adapter.
C) Replace the node motherboard.
Diagnostic:
A) Whack "fibre" and "pci" commands communicate with each PCI Fibre Card. Refer to the slot that produced the error for further diagnostic information and procedure.

Fatal error: Code 15, subcode 0x1 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
controller memory xxxx value (yyyy) != expected (zzzz)
memory test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards on the Node Board. The slots are numbered 0-6 from left to right when looking at the front of the P4 Eagle and Ironman Nodes. The slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for functionality. The HBA cards sometimes require a firmware download for full capability. POST does not have access to this firmware and will only test basic functionality. If the Onboard Memory Test fails, POST will indicate this error. If the user continues past this error (^C), software will log the error and continue testing the other PCI cards (if present).
Resolution:
A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the CM PCI XCB test passed, replace the PCI Fibre Adapter.
C) Replace the node motherboard.
Diagnostic:
A) Whack "fibre" and "pci" commands communicate with each PCI Fibre Card. Refer to the slot that produced the error for further diagnostic information and procedure.
Fatal error: Code 15, subcode 0x2 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
data bits possibly float: Bitxxxx-Bityyyy.
PCI walking bits: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards on the Node Board. The slots are numbered 0-6 from left to right when looking at the front of the P4 Eagle and Ironman Nodes. The slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top three will depend on which slot the node is in.
During POST, all present FCAL adapters are tested for functionality. The HBA cards sometimes require a firmware download for full capability. POST does not have access to this firmware and will only test basic functionality. If the PCI Fibre Card Bus Test fails, POST will indicate this error. If the user continues past this error (^C), software will log the error and continue testing the other PCI cards (if present).
Resolution:
A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the CM PCI XCB test passed, replace the PCI Fibre Adapter.
C) Replace the node motherboard.
Diagnostic:
A) Whack "fibre" and "pci" commands communicate with each PCI Fibre Card. Refer to the slot that produced the error for further diagnostic information and procedure.

Fatal error: Code 15, subcode 0x3 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
data bits possibly float: Bitxxxx-Bityyyy.
CM0 walking bits: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards on the Node Board. The slots are numbered 0-6 from left to right when looking at the front of the P4 Eagle and Ironman Nodes.
The slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top three will depend on which slot the node is in.
This test indicates a problem was observed with the fibre channel card talking with the Cluster Manager. If the "fibre test pci" test passed, then this problem is likely in the interface to the CM or CM memory.
Resolution:
A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the CM PCI XCB test passed, replace the PCI Fibre Adapter.
C) Replace the node motherboard.
Diagnostic:
A) Whack "fibre" and "pci" commands communicate with each PCI Fibre Card. Refer to the slot that produced the error for further diagnostic information and procedure.

Fatal error: Code 15, subcode 0x4 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
PCIe EYE test: FAIL
(slot) = PCI slot number
There are 6 or 9 PCI slots available to insert PCI adapter cards on the Node Board. The slots are numbered 0-6 from left to right when looking at the front of the P4 Eagle and Ironman Nodes. The slots are numbered 0-2, 3-5, 6-8 on Titan and Atlas and the top three will depend on which slot the node is in.
If the "fibre test cm" test passed, then this problem is likely in the PCIe-to-PCIe link between the card and the switch.
Resolution:
A) Reseat the failing PCI Fibre Adapter.
B) Analyze other failures in the system. If the CM PCI XCB test passed, replace the PCI Fibre Adapter.
C) Replace the node motherboard.
Diagnostic:
A) Whack "fibre" and "pci" commands communicate with each PCI Fibre Card. Refer to the slot that produced the error for further diagnostic information and procedure.
Fatal error: Code 15, subcode 0x10 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
BIOS cannot make the LSI card go into Operational state.
Resolution:
A) Replace card. Send failed card back for FA.

Fatal error: Code 15, subcode 0x11 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
HBA card register test failure.
Resolution:
A) Replace card. Send failed card back for FA.

Fatal error: Code 15, subcode 0x13 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
LSI card register memory copy test failure.
Resolution:
A) Replace card. Send failed card back for FA.

Fatal error: Code 15, subcode 0x14 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
LSI card register memory copy test failure.
Resolution:
A) Replace card. Send failed card back for FA.

Fatal error: Code 15, subcode 0x15 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Firmware rev xxxx not supported. Upgrade to yyyy
The LSI card does not contain 3PAR-approved firmware. If you need to run with an LSI card which has older firmware (engineering only), you can set the "lsi_downrev" flag in the BIOS. Example:
Whack> set perm lsi_downrev
Resolution:
A) Replace card. Send failed card back for upgrade.

Fatal error: Code 15, subcode 0x16 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Unable to get firmware rev
Attempting to get the firmware version from the LSI card failed.
Resolution:
A) Cycle power on the node.
B) Replace card. Send failed card back for FA.

Fatal error: Code 15, subcode 0x17 (slot)
PCI_FIBRE_FAILURE (<slot>) "PCI Fibre Failure"
Manufacturing test for E200 node only. This error occurs when the onboard LSI chips are not found. They are expected to be in slots 0 and 3, with two devices on each slot.
Resolution:
A) Cycle power on the node.
B) Replace motherboard.
Fatal error: Code 17, subcode 0x0 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed its internal self test.
Resolution:
A) Replace the IDE or SATA boot drive.
B) Replace the IDE or SATA cable.
C) Replace the node motherboard.
Diagnostic:
A) Whack "ide test" commands may be used to individually execute IDE tests.

Fatal error: Code 17, subcode 0x1 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed to perform a self test.
See Code 17, sub-code 0x0 for resolution information.

Fatal error: Code 17, subcode 0x2 (0)
IDE_FAILURE "Internal Drive Failure"
IDE register xx value (yyyy) != expected (zzzz)
The IDE register test failed during a pattern test.
See Code 17, sub-code 0x0 for resolution information.

Fatal error: Code 17, subcode 0x3 (0)
IDE_FAILURE "Internal Drive Failure"
IDE register xx value (yyyy) != expected (zzzz)
The IDE register test failed during a walking bit test.
See Code 17, sub-code 0x0 for resolution information.

Fatal error: Code 17, subcode 0x4 (0)
IDE_FAILURE "Internal Drive Failure"
There was an IDE failure in data requested by the operating system bootstrap. It is possible that data on the disk has become corrupt to the point the operating system will not successfully load.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x5 (0)
IDE_FAILURE "Internal Drive Failure"
Communication with the IDE interface timed out. This error indicates the drive is not responding to commands within an acceptable amount of time.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x6 (0)
IDE_FAILURE "Internal Drive Failure"
IDE reported a failure in read verify command.
Resolution: Replace the IDE or SATA boot drive.
Fatal error: Code 17, subcode 0x7 (0)
IDE_FAILURE "Internal Drive Failure"
A timeout (10 seconds) was detected while performing a DMA operation.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x8 (0)
IDE_FAILURE "Internal Drive Failure"
An error condition was detected while performing a DMA operation.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x9 (xx)
IDE_FAILURE "Internal Drive Failure"
IDE power up: Unknown error
ERROR : 80
SECCNT: 80
SECNUM: 80
CYLLOW: 80
CYHIGH: 80
DEVSEL: 80
ALT_STATUS: 80
Drive: BUSY
The IDE drive had a failure at power-on reset which prevents it from communicating with the chipset IDE controller.
Resolution:
A) Cycle power on the node.
B) Reseat drive cable on both node and drive.
C) Replace the IDE or SATA boot drive.
D) Replace the node motherboard.
Diagnostic:
A) Try using "ide reset" followed by "ide init" to clear the error.
B) The I/O address of the register which could trigger this error at "ide init" is 0x1f1. Try using "io inb 1f1" and "io outb 1f1 <value>" to diagnose further.

Non-fatal error: Code 17, sub-code 0x10 (data)
IDE_FAILURE "Internal Drive Failure"
A disk SMART threshold was triggered. This would indicate an imminent boot drive failure.
Resolution: Replace the IDE or SATA boot drive.
Diagnostic: The data value may be used to determine the specific SMART field which caused the alert.
Examples:
0 - Unknown
1 - Raw Read Error Rate
2 - Throughput
3 - Spinup Time
4 - Start / Stop Count
5 - Reallocate Sector Count
6 - Read Channel Margin
7 - Seek Error Count
8 - Seek Time
9 - Poweron Hours
10 - Spin Retry Count
11 - Calibration Retry Count
12 - Power Cycle Count
192 - Poweroff Retract Count
193 - Load Cycle Count
194 - Temperature Celsius
195 - Hardware ECC Recovered
196 - Reallocate Event Count
197 - Current Pending Count
198 - Offline Scan UE Count
199 - UDMA CRC Error Count
200 - Write Error Count
201 - Off Track Error Count
202 - DAM Error Count
203 - Run Out Cancel
204 - Raw Read Error Count
205 - Thermal Asperity Count
207 - Spin High Current Count
208 - Spin Buzz Count
209 - Offline Seek Performance
The "ide smart status" command may be used to display the current SMART status fields.

Fatal error: Code 17, subcode 0x11 (0)
IDE_FAILURE "Internal Drive Failure"
IDE SMART self-test failed. The drive failed to finish a built-in self-test.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x12 (0)
IDE_FAILURE "Internal Drive Failure"
Drive failed to collect SMART data. The data is vital for the drive to determine a SMART trigger.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x13 (0)
IDE_FAILURE "Internal Drive Failure"
Drive refused to accept SMART commands.
Resolution: Replace the IDE or SATA boot drive.
Diagnostic: Use "ide smart enable" to turn on SMART before issuing more SMART commands.

Fatal error: Code 17, subcode 0x14 (0)
IDE_FAILURE "Internal Drive Failure"
The SMART command issued to the drive has incorrect syntax.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x15 (0)
IDE_FAILURE "Internal Drive Failure"
The SMART commands failed to write or read attributes.
Resolution: Replace the IDE or SATA boot drive.
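When triaging Code 17, sub-code 0x10 from logs, the data value can be mapped back to the SMART field names listed above. The following Python lookup is a minimal sketch. The names `SMART_FIELDS` and `smart_field_name` are hypothetical, and the table contains only the example values from this guide, so any other value is reported as unlisted rather than guessed.

```python
# Example SMART field numbers from the Code 17, sub-code 0x10 entry in this guide.
SMART_FIELDS = {
    0: "Unknown",
    1: "Raw Read Error Rate",
    2: "Throughput",
    3: "Spinup Time",
    4: "Start / Stop Count",
    5: "Reallocate Sector Count",
    6: "Read Channel Margin",
    7: "Seek Error Count",
    8: "Seek Time",
    9: "Poweron Hours",
    10: "Spin Retry Count",
    11: "Calibration Retry Count",
    12: "Power Cycle Count",
    192: "Poweroff Retract Count",
    193: "Load Cycle Count",
    194: "Temperature Celsius",
    195: "Hardware ECC Recovered",
    196: "Reallocate Event Count",
    197: "Current Pending Count",
    198: "Offline Scan UE Count",
    199: "UDMA CRC Error Count",
    200: "Write Error Count",
    201: "Off Track Error Count",
    202: "DAM Error Count",
    203: "Run Out Cancel",
    204: "Raw Read Error Count",
    205: "Thermal Asperity Count",
    207: "Spin High Current Count",
    208: "Spin Buzz Count",
    209: "Offline Seek Performance",
}


def smart_field_name(data: int) -> str:
    """Return the SMART field name for a sub-code 0x10 data value."""
    # Values not in the example list are flagged instead of being guessed.
    return SMART_FIELDS.get(data, f"Unlisted SMART field {data}")


if __name__ == "__main__":
    for value in (5, 194, 206):
        print(value, "-", smart_field_name(value))
```

On the node itself, the "ide smart status" command remains the authoritative way to view the current SMART fields.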
Non-fatal error: Code 17, sub-code 0x16 (0)
IDE_FAILURE "Internal Drive Failure"
No IDE device was found.
Resolution:
A) Install or replace the IDE or SATA drive.
B) Replace the node motherboard.

Fatal error: Code 17, subcode 0x18 (0)
IDE_FAILURE "Internal Drive Failure"
The IDE controller failed the BIOS interrupt test, possibly due to a bad drive.
See Code 17, sub-code 0x0 for resolution information.

Non-fatal error: Code 17, sub-code 0x19 (0)
IDE_FAILURE "Sequential DMA read timed out"
DMA xfer error code xxxx
The drive DMA test failed due to a timeout. Although each sequential DMA read operation is succeeding, the total test time was exceeded. The likely cause of this failure is a drive which is having to perform a large number of relocations due to failed sectors, or a drive interface failure which only shows up under stress.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x20 (0)
IDE_FAILURE "Internal Drive Failure"
Drive did not return status to host after a command within a reasonable amount of time.
Resolution: Replace the IDE or SATA boot drive.

Fatal error: Code 17, subcode 0x21 (rpm)
IDE_FAILURE "Internal Drive Failure"
*** Error: Boot drive is not a Solid State Disk (SSD).
This error occurs when the disk drive for a Harrier system is not an SSD.
Resolution:
A) Replace the SATA drive with an SSD.

Fatal error: Code 17, subcode 0x22 (disk size)
IDE_FAILURE "Internal Drive Failure"
*** Error: Disk Size (XXX.X GB) is less than 128 GB.
This error occurs when the node has 32 GB or less of cluster memory and the disk drive is smaller than 128 GB. The disk is not large enough for the memory dumps if the node panics.
Resolution:
A) Replace the SSD with a drive of at least 128 GB.
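The disk-size checks behind sub-code 0x22 above and sub-code 0x23 (the next entry) amount to a simple sizing rule. The following Python sketch restates that rule under stated assumptions. The function names are hypothetical, and the thresholds are taken only from the two entries in this guide, not from BIOS source.

```python
from typing import Optional


def required_boot_disk_gb(cluster_memory_gb: float) -> int:
    """Minimum boot disk size implied by sub-codes 0x22 and 0x23."""
    # 32 GB or less of cluster memory needs 128 GB; more than 32 GB needs 256 GB.
    return 128 if cluster_memory_gb <= 32 else 256


def disk_size_subcode(cluster_memory_gb: float, disk_gb: float) -> Optional[int]:
    """Return 0x22 or 0x23 if the disk is too small, otherwise None."""
    required = required_boot_disk_gb(cluster_memory_gb)
    if disk_gb >= required:
        return None
    return 0x22 if required == 128 else 0x23


if __name__ == "__main__":
    for memory, disk in ((32, 120), (64, 200), (64, 256)):
        code = disk_size_subcode(memory, disk)
        print(memory, disk, "ok" if code is None else hex(code))
```

The sketch only illustrates which sub-code a given configuration would be expected to raise; it does not model how the BIOS measures cluster memory or disk capacity.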
Fatal error: Code 17, subcode 0x23 (disk size)
IDE_FAILURE "Internal Drive Failure"
*** Error: Disk Size (XXX.X GB) is less than 256 GB.
This error occurs when the node has more than 32 GB of cluster memory and the disk drive is smaller than 256 GB. The disk is not large enough for the memory dumps if the node panics.
Resolution:
A) Replace the SSD with a drive of at least 256 GB.
B) Reduce cluster memory to 32 GB or less.

Fatal error: Code 17, subcode 0x30 (0)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
Resolution: Replace the IDE or SATA boot drive.

Non-fatal error: Code 17, sub-code 0x40 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port Status register, for lab debug
Resolution: TODO

Non-fatal error: Code 17, sub-code 0x41 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port Error register, for lab debug
Resolution: TODO

Non-fatal error: Code 17, sub-code 0x42 (xxxxxxxx)
IDE_FAILURE "Internal Drive Failure"
Drive returned an error status after command execution.
xxxxxxxx, AHCI Port TFD register, for lab debug
Resolution: TODO

Non-fatal error: Code 18, sub-code zzzz (0)
BIOS_INT_UNIMPLEMENTED "BIOS Int Unimplemented"
*** Real-mode BIOS interrupt: xxxx(error: yyyy)
This error most commonly indicates a bad or missing boot area of the USB disk.
Customer Service node-disks or node spares (FRUs) might not be shipped with an operating system. Attempting to boot from one of these disks without first installing the system software might produce this error message. From the Whack prompt, use the "boot net install" command to install the system software.
In order for Linux to boot, LILO must load the kernel image. It needs assistance from the BIOS in order to perform this task. Linux also acquires some information from the BIOS using 16 bit BIOS interrupts. CBIOS automatically accepts and emulates traditional 16 bit BIOS interrupts to support these methods. If LILO or Linux triggers an interrupt which is not supported by CBIOS, this possibly fatal error will result. There are many obsolete BIOS facilities which are not supported by CBIOS. In some cases, the system boot may be able to continue after this error.
The sub-code and minor code indicate the specific BIOS interrupt called and the eax register parameter value. This information may be useful to Engineering.
Resolution:
A) Reboot. Attempt to reproduce the problem.
B) Reinstall system software on the disk. This may require a "boot net install" in order to reinstall the operating system.
C) There may be a bug in the OS you are using or it has been misconfigured. Confirm this version of the OS has been verified to work on a 3PAR node board. Or, temporarily swap system disks with a known good system disk.
D) Replace the boot drive and reinstall the system software.
E) Replace the node motherboard.
Diagnostic:
A) Look up the displayed Real-mode BIOS interrupt number in a BIOS index to determine the facility the software is requesting. This may provide you a clue as to the cause.
B) Use the Arium to single step through the return from the 16 bit handler to the originating code to determine what code is involved with the unimplemented BIOS operation.

Fatal error: Code 19, subcode 0x0 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
Booting from SATA IDE...
No IDE or USB drives present or boot sector is invalid.
or
Booting from SATA IDE (bootdev)...
No IDE drive present or boot sector is invalid.
or
Booting from PATA IDE...
No IDE drive present or boot sector is invalid.
or
Booting from USB...
No USB drive present or boot sector is invalid.
The IDE (PATA or SATA) or USB Flash disk is used for booting the operating system. This error indicates that either no drive was found during a hardware probe, or a drive was found but was not bootable.
Resolution:
A) Cycle power on the node.
B) Verify disk power and data cables are connected to both the drive and the motherboard. The red stripe on the IDE data cable must be oriented closest to the power connector on the drive.
C) Replace the disk power cable and/or data cable.
D) Replace the drive.
E) Replace the node motherboard.
Diagnostic:
A) Reset and enter Whack with ^W after the PCI bus scan but before the IDE probe. You should be able to use the "ide init" command to probe for a disk. Minimal output should include drive Capacity and Geometry (C/H/S: cylinder/head/sector).
B) If the above information is available, use the "ide read" command to read a sector into CPU memory and verify it was read. Example:
Whack> ide read 1000 0 1
Whack> d 1000 200
You should see the contents of sector 0, which (with a previously initialized node disk) will include the string "LILO" starting at byte offset 6.
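If sector 0 has been captured to a file for offline review, the "LILO" signature check described in Diagnostic B can be expressed as a short Python sketch. The function name `looks_like_lilo_boot_sector` is hypothetical, and the check covers only the signature at byte offset 6 mentioned above. It does not validate the rest of the boot sector.

```python
def looks_like_lilo_boot_sector(sector: bytes) -> bool:
    """Check for the string "LILO" starting at byte offset 6 of sector 0."""
    return len(sector) >= 10 and sector[6:10] == b"LILO"


if __name__ == "__main__":
    # Synthetic 512-byte sectors for illustration only (not real disk contents).
    initialized = bytes(6) + b"LILO" + bytes(502)
    blank = bytes(512)
    print(looks_like_lilo_boot_sector(initialized))
    print(looks_like_lilo_boot_sector(blank))
```

A sector that fails this check on a disk expected to be initialized points toward reinstalling the system software or replacing the drive, as listed in the resolution steps.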
Fatal error: Code 19, subcode 0x1 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE TIMEOUT waiting for DRDY
The IDE disk is used for booting the operating system. This error indicates there was a problem communicating with the IDE controller, most likely due to a missing IDE hard drive, a disconnected cable, or a failed IDE hard drive.
See Code 19, sub-code 0x0 for resolution information.

Fatal error: Code 19, subcode 0x2 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE TIMEOUT waiting for DRQ
The IDE disk is used for booting the operating system. This error indicates that a command was issued to the IDE disk (read sectors) but the drive controller did not report back with the data within a reasonable amount of time. This may be caused by a failed sector or IDE controller failure.
See Code 19, sub-code 0x0 for resolution information.

Fatal error: Code 19, subcode 0x3 (0)
CANT_READ_BOOT_BLOCK "Can't Read Boot Block"
IDE ERROR reading sector xxxx
The IDE disk is used for booting the operating system. This error indicates that a command was issued to the IDE disk (read sectors) but the drive controller reported that there was an error in reliably retrieving the requested sectors. This error may be caused by a failed sector or IDE controller failure.
See Code 19, sub-code 0x0 for resolution information.

Fatal error: Code 20, subcode 0x0 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Failed to deliver startup message to CPU xxxx
or
*** Error: Errors in APs starting up.
If a board has more than a single CPU, only one CPU comes out of power-on executing code. The other waits in a halted state for an AP message from the bootstrap processor.
Every MP-capable Pentium processor has an onboard Advanced Programmable Interrupt Controller called the Local APIC (there is a complementary component called the IOAPIC located on the motherboard). Once the bootstrap processor has completed all node board initialization and testing, it starts up each application processor (which in Intel terms is defined as any processor other than the initial bootstrap processor). Each AP then does a brief identify, verify, and microcode update.
In the above case, if the Local APIC fails to deliver an AP startup to the other processor within a reasonable amount of time, this error will result. In a single CPU system this error should not occur because an earlier probe should identify that no AP processor is present. If the Local APIC cannot reliably deliver a message over the IOAPIC, then it is probably not safe to ignore this error by pressing ^C.
Resolution:
A) Reseat both processors in their sockets.
B) Replace each processor individually. Do not bother with downgrading to a single processor system since this is a multiprocessor startup issue. The problem processor will not be apparent with a single processor configuration.
C) Replace the node motherboard.
Diagnostic:
A) Use Arium as bootstrap processor and verify that the APIC message is being delivered to the bus.
B) Use Arium as application processor and verify that the APIC message is delivered from the IOAPIC on the motherboard. The application processor should then start executing code at the default APIC address of 0x30000 (FIRST_SMM_BASE).

Fatal error: Code 20, subcode 0x1 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Startup message successfully sent to CPU xxx, no response
After an AP startup message has been delivered to the application processor through the IOAPIC, the bootstrap processor waits for an indication the AP has started.
If the indication is not received before a reasonable timeout, this error is given. It should be OK to ignore this message by pressing ^C and continue with further BIOS diagnostics.
See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 20, subcode 0x2 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: CPU xxxx failed to complete initialization.
Once the application processor (AP) has started initialization, it sets a flag that the bootstrap processor can use to determine when the application processor has completed. If the AP remains in the AP_INIT_START state too long, this fatal error is displayed. It is probably not safe to resume after this error since the AP may be off executing errant code or interfering with bootstrap processor bus cycles.
See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 20, subcode 0x3 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: POST failure on CPU xxxx: yyyy
*** Error: CPU xxxx initialization failure.
The application processor (AP) previously failed to complete a Built In Self Test (BIST). This is likely due to a bad processor.
Resolution:
A) Replace the application processor.

Fatal error: Code 20, subcode 0x4 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: Invalid CPU for CPU xxxx, error code: yyyy
*** Error: CPU xxxx initialization failure.
During application processor (AP) initialization, the CPU model, stepping, and clock multiplier of the processor being initialized are verified against those of the bootstrap processor. If they do not match, this error will result.
Resolution:
A) Since the processors are possibly mismatched, remove the heatsink on both and verify that the CPU model and stepping are identical.
See Code 20, sub-code 0x0 for more resolution information.
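The matching rule behind sub-code 0x4 can be illustrated with a short Python sketch that compares the three identity values named above. The `CpuIdentity` type, the `mismatched_fields` function, and the sample values are hypothetical and are not taken from BIOS source.

```python
from typing import List, NamedTuple


class CpuIdentity(NamedTuple):
    """Identity values compared between the bootstrap processor and an AP."""
    model: int
    stepping: int
    clock_multiplier: int


def mismatched_fields(bootstrap: CpuIdentity, ap: CpuIdentity) -> List[str]:
    """Return the names of the identity fields that differ between the two CPUs."""
    return [
        name
        for name in CpuIdentity._fields
        if getattr(bootstrap, name) != getattr(ap, name)
    ]


if __name__ == "__main__":
    # Illustrative values only.
    bsp = CpuIdentity(model=0x2, stepping=0x7, clock_multiplier=18)
    ap = CpuIdentity(model=0x2, stepping=0x9, clock_multiplier=18)
    print(mismatched_fields(bsp, ap))
```

An empty result corresponds to matched processors. Any listed field is a reason to inspect the CPU markings as described in the resolution step.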
Fatal error: Code 20, subcode 0x5 (0)
AP_INIT_FAILURE "AP Init Failure"
*** Error: More than wwww CPUs in system.
*** Error: CPU xxxx initialization failure.

The currently supported node board hardware configuration is a maximum of two physical processors. The BIOS uses this knowledge to limit the possibility of repeat initialization of the application processor (AP). If this message occurs, it may be due to a variety of hardware problems, but the application processor is the most likely suspect. See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 21, subcode 0x0 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: Not expecting to install a vector on CPU xxxx

Intel processors support an interrupt level called SMI (System Management Interrupt), which is used for hardware management (usually by the BIOS). Events such as power management and hardware errors usually trigger an SMI. When an SMI is triggered, the system enters SMM (system management mode). In a multiprocessor system, both processors are usually triggered by an SMI at the same time. Since both processors may attempt to service an SMI at the same time, each processor must have a unique stack area in which to save processor context. SMI setup configures each processor individually with a unique stack address for SMI handling. This particular error indicates that the SMI setup handler has detected a stack setup SMI, yet one was not expected (because one had already been set up or CPU initialization had not yet reached the point of SMI setup). The bootstrap CPU delivers the setup SMI to itself and to the application processor. This error could be caused by a faulty CPU or motherboard. The CPU which reports the setup error may not be the one at fault.

Resolution:
A) Pull one processor at a time to determine if the problem is reproducible with a single CPU.
B) Swap CPUs to see if the exact problem moves with the CPU. If not, it may be the motherboard.
C) Individually replace both CPUs.
D) Replace the node motherboard.

Diagnostic:
A) Use Arium as bootstrap processor and verify that the SMI is being delivered.

Fatal error: Code 21, subcode 0x1 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: CPU xxxx not found in CPU table

During SMI setup, each processor in turn receives an SMI and then performs stack initialization. Prior to the SMI setup, all application processors wait in a halted state for an APIC message to identify and download microcode. If the processor performing an SMI setup detects that it had not previously executed and added its CPU ID to the system table, this fatal error is displayed. See Code 20, sub-code 0x1 for resolution information.

Fatal error: Code 21, subcode 0x2 (0)
SMI_SETUP_ERROR "SMI Setup Failure"
*** SMI setup error: CPU xxxx did not respond

During SMI setup, each processor in turn receives an SMI and then performs stack initialization. This error indicates that the bootstrap processor issued an SMI through the APIC and it was not processed by the targeted processor. Either SMIs are not being delivered properly, or the targeted processor may be defective. See Code 20, sub-code 0x1 for resolution information.

Fatal error: Code 22, subcode 0x0 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `cbios_to_os_message' test, expected xx but got yy

CBIOS provides service to the 3PAR kernel through a special command queue. Responses are returned to the OS through another queue, which is tested during BIOS initialization. Sub-code 0x0 indicates that the CBIOS to OS queue did not pass the built-in test.

Resolution:
A) Pull one processor at a time to determine if the problem is reproducible with a single CPU.
B) Swap SDRAM with good SDRAM.
C) Update CBIOS to the latest version.
D) Replace the node motherboard.

Fatal error: Code 22, subcode 0x1 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test', failed to read message

This error indicates that the CBIOS to OS queue test failed to acquire a message it previously sent. See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 22, subcode 0x2 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test': expected: uuuu vv `ww' but got: xxxx yy `zz'

This error indicates that the CBIOS to OS queue test failed because the message received did not match the message sent. See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 22, subcode 0x3 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: In `os_read_message_test', expected no more data

This error indicates that the CBIOS to OS queue test failed because there were more items in the queue than those sent. See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 22, subcode 0x4 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: Couldn't send simulated message from OS to CBIOS, code == xx

This error indicates that the OS to CBIOS queue test failed. The minor code indicates to an engineer what went wrong. See Code 20, sub-code 0x0 for resolution information.

Fatal error: Code 22, subcode 0x5 (0)
CBIOS_OS_QUEUE_ERROR "CBIOS OS Queue Failure"
*** Error: Inconsistent queue: queue_base == ww, queue_limit == xx queue_inp = yy, queue_otp = zz

This error indicates that the CBIOS to OS queue test failed because the queue pointers became corrupt. See Code 20, sub-code 0x0 for resolution information.
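The Code 22 sub-codes describe a send-then-read-back self test and a pointer sanity check. The sketch below illustrates that idea in Python under stated assumptions: the half-open range `[queue_base, queue_limit)` and the use of a `deque` as the queue are illustrative choices, not the actual CBIOS data structures. Only the names `queue_base`, `queue_limit`, `queue_inp`, and `queue_otp` and the meanings of sub-codes 0x1 to 0x3 come from the entries above.

```python
from collections import deque


def pointers_consistent(queue_base, queue_limit, queue_inp, queue_otp):
    """Return True if both pointers lie inside [queue_base, queue_limit).

    Assumption: the valid range is half-open; the real CBIOS rule is not
    documented in this guide.
    """
    if queue_base >= queue_limit:
        return False
    return all(queue_base <= p < queue_limit for p in (queue_inp, queue_otp))


def loopback_self_test(queue, messages):
    """Send messages, read them back, and return the failing sub-code or None.

    0x1: a sent message could not be read back
    0x2: the message read did not match the message sent
    0x3: more items were in the queue than were sent
    """
    for message in messages:
        queue.append(message)
    for expected in messages:
        if not queue:
            return 0x1
        if queue.popleft() != expected:
            return 0x2
    if queue:
        return 0x3
    return None


print(loopback_self_test(deque(), [b"ping", b"pong"]))
print(pointers_consistent(0x1000, 0x2000, 0x2000, 0x1000))
```

A queue that already holds a stale entry before the test starts is reported as a mismatch (0x2) in this sketch, because the first item read back is not the first item sent.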
Non-fatal error: Code 23, sub-code 0x0 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
CRC mismatch for failsafe CBIOS

Upon startup, CBIOS computes a strong CRC over all executable code and data stored in the flash. This guards against flash corruption and helps ensure reliable system initialization and testing. This specific sub-code indicates that a CRC error was detected in the failsafe component of CBIOS. The majority of the failsafe is only executed if corruption is detected in the main CBIOS.

Resolution:
A) Try pressing ^C to resume. Perform a flash update as soon as possible. If flash updating under Linux, make sure to specify the 'failsafe' option to update the failsafe area as well.
B) If the flash update is successful, but you still get a CRC error, verify that your flash image is intact. The Linux flash utility does this automatically using the same strong CRC algorithm as the BIOS uses.
C) Replace the node motherboard.

Diagnostic:
A) Use the Whack "net tftp" command to download an image identical to the one in flash. Use the Whack "mem compare" command to locate bytes which differ so that you may examine those values with "d <addr>".
B) If Whack is not available, use the Arium to look at flash address space for defects. It may be a stuck, floating, or bridged address or data line.
C) Replace the flash part.

Non-fatal error: Code 23, sub-code 0x1 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Invalid entry point for full CBIOS
Boot with clustering disabled and update flash immediately!

Prior to starting up the non-failsafe (full diagnostic) CBIOS image, the failsafe CBIOS performs some consistency checks over the image. This error indicates corruption was detected in the entry point to the main routine of the full CBIOS.
If you have recently installed a new CBIOS which is larger than the previous one, it is possible to get this error because the failsafe BIOS present cannot properly verify the larger BIOS.

Resolution:
A) Try pressing ^C to resume. Perform a flash update as soon as possible. Boot with clustering disabled by typing "tpd nokmod" at the LILO prompt. Once the node has booted, log in as root and use the flash command.
Example:
# flash /opt/tpd/bios/bios-1.9.4
Upon completion of the flash update, reboot and observe console messages to ensure the CRC error no longer occurs.
B) If the flash update is successful, but you still get this error, verify that your flash image is intact. The Linux flash utility does this automatically using the same strong CRC algorithm as the BIOS uses.
C) Replace the node motherboard.

Diagnostic:
A) If Whack is not available, use the Arium to look at flash address space for defects. It may be a stuck, floating, or bridged address or data line.
B) Replace the flash part.

Fatal error: Code 23, subcode 0x2 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Invalid magic for full CBIOS

Prior to starting up the non-failsafe (full diagnostic) CBIOS image, the failsafe CBIOS performs some consistency checks over the image. This error indicates the failsafe BIOS could not find a proper header record for the full CBIOS. See Code 23, sub-code 0x1 for resolution information.

Fatal error: Code 23, subcode 0x3 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
CRC mismatch for full CBIOS

Prior to starting up the non-failsafe (full diagnostic) CBIOS image, the failsafe CBIOS performs a strong CRC over the full CBIOS image to verify the image's integrity. This error indicates the full CBIOS had a CRC failure. See Code 23, sub-code 0x1 for resolution information.
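The Code 23 entries combine two ideas: a CRC over the flash image to detect corruption, and a byte-by-byte comparison against a known-good image to localize it. The Python sketch below illustrates both. It is only an approximation: the guide does not name the "strong CRC" algorithm, so `zlib.crc32` is used purely as a stand-in, and the helper names and sample data are hypothetical.

```python
import zlib


def image_crc(image: bytes) -> int:
    # Stand-in only: the CBIOS "strong CRC" algorithm is not specified in this guide.
    return zlib.crc32(image) & 0xFFFFFFFF


def differing_bytes(expected: bytes, actual: bytes):
    """Yield (offset, expected_byte, actual_byte, xor) for each differing byte.

    The xor column shows which bits differ, which can hint at a stuck or
    bridged data line when the same bit is wrong at many offsets.
    """
    for offset, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            yield offset, e, a, e ^ a


good = bytes(range(16))
corrupt = bytearray(good)
corrupt[5] |= 0x80  # simulate data bit 7 reading high at offset 5
corrupt = bytes(corrupt)

if image_crc(good) != image_crc(corrupt):
    for offset, e, a, x in differing_bytes(good, corrupt):
        print(f"offset {offset:#06x}: expected {e:02x}, got {a:02x}, xor {x:02x}")
```

This mirrors the diagnostic flow of downloading an identical image and comparing it with what is in flash. The comparison only covers the overlapping length of the two images.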
Fatal error: Code 23, subcode 0x4 (0)
FLASH_CRC_ERROR "Flash CRC Failure"
Failsafe CBIOS is now enabling the full CBIOS ...

The full CBIOS either detected an error or received user input (the 'f' key) which forced it to return to the failsafe BIOS. If the user did press the 'f' key, press ^C to resume startup under the failsafe BIOS. If the user did not press the 'f' key, browse prior messages to learn of a failure which may have caused this error.

Resolution:
A) If the error was not the result of a keystroke, try pressing the 'n' key at BIOS startup to clear any initialization skips. A setting to skip the full BIOS and always execute the failsafe may be recorded in NVRAM.
See Code 23, sub-code 0x1 for more resolution information.

Non-fatal error: Code 23, sub-code 0x10 (bbxxyyzz)
FLASH_CRC_ERROR "EOS: Repairing Main BIOS"

The EOS Main BIOS image in SPI has failed to boot and the FPGA watchdog has reset the node to boot from the failsafe BIOS. The failsafe BIOS has detected a bad CRC in the main BIOS region of flash and is attempting to automatically re-flash that region from disk. The data field contains the build (bb) and version (xx.yy.zz) of the Main BIOS that failed to boot.

Fatal error: Code 23, subcode 0x11 (bbxxyyzz)
FLASH_CRC_ERROR "EOS: Main BIOS Corrupt"

The EOS Main BIOS image in SPI has failed to boot and the FPGA watchdog has reset the node to boot from the failsafe BIOS. The failsafe BIOS has detected a bad CRC in the main BIOS region of flash. The failsafe BIOS has also detected five or more attempts to automatically recover the Main BIOS within the past two hours and has stopped attempting automatic recovery. The data field contains the build (bb) and version (xx.yy.zz) of the Main BIOS that failed to boot.
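The bbxxyyzz data field of sub-codes 0x10 and 0x11 can be split into its parts with simple shifts and masks. The sketch below assumes each of bb, xx, yy, and zz occupies one byte of a 32-bit value, most significant byte first, as the notation suggests. The function name is hypothetical, and whether the console prints the parts in hex or decimal is not stated in this guide, so plain integers are returned.

```python
def decode_main_bios_data(data: int):
    """Split a 32-bit bbxxyyzz data field into (build, (xx, yy, zz)).

    Assumption: one byte per field, most significant byte first.
    """
    if not 0 <= data <= 0xFFFFFFFF:
        raise ValueError("data field must fit in 32 bits")
    build = (data >> 24) & 0xFF
    version = ((data >> 16) & 0xFF, (data >> 8) & 0xFF, data & 0xFF)
    return build, version


# Illustrative value only, not taken from a real log.
build, version = decode_main_bios_data(0x01030201)
print(build, ".".join(str(part) for part in version))
```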
Fatal error: Code 24, subcode 0x0 (ptr)
TURD_EXCEEDED_LIMIT "TURD Exceeded Limit"
*** Error: MP turd exceeded 0x100000

The BIOS presents to the operating system a set of tables which describe the hardware present in the system. These tables have a rigid structure for each type of device. If the CBIOS configuration structure becomes corrupt, this error may result when the TURD structures are initialized for the operating system. A consistency check ensures the TURD area does not go beyond 1 MB (which is the base address where the operating system normally begins using main memory). The data for this error (ptr) is the pointer address reached, and will be greater than 0x100000.

Resolution:
A) Remove cards from all PCI slots. If the error no longer occurs, it may be a hardware failure on one of the cards.
B) Replace the node motherboard.

Diagnostic:
A) Look at memory starting at 0x000f0000. 0x5f504d5f is the magic number of the first TURD (the MP Configuration table).
B) Turn on PRINTING_TURD and DEBUG_APIC compile flags.

Fatal error: Code 24, subcode 0x1 (0)
TURD_EXCEEDED_LIMIT "TURD Checksum Failure"
*** Error: MP table checksum failed - stopping table build

The BIOS presents to the operating system a set of tables which describe the hardware present in the system. In this case, the BIOS detected that one of the tables had a bad checksum.

Resolution:
A) Remove cards from all PCI slots. If the error no longer occurs, it may be a hardware failure on one of the cards.
B) Replace the node motherboard.

Fatal error: Code 24, subcode 0x2 (0)
TURD_EXCEEDED_LIMIT "TURD Exceeded Limit"
*** Error: Too many MP table entries - stopping table build

The BIOS presents to the operating system a set of tables which describe the hardware present in the system.
In this case, the BIOS detected that it had added too many entries to the table, likely because too many PCI devices are present in the system. This error is likely due to an earlier PCI failure.

Resolution:
A) Remove cards from all PCI slots. If the error no longer occurs, it may be a hardware failure on one of the cards.
B) Replace the node motherboard.

Fatal error: Code 25, subcode 0x0 (0)
PROM_FAILURE "PROM Failure"

The node board has two different Serial EEPROM devices used for storing persistent board information. One PROM device is located on the I2C bus. It stores node board manufacturing, assembly, serial number, and error message log information. The second PROM device is connected through the Intel 82559ER ethernet controller. It stores ethernet controller information such as initialization state and the hardware MAC address.

PROM checksum: FAIL

The PROM which stores node board manufacturing, assembly, serial number, and error message log information does not have a valid checksum. If the PROM has not yet been initialized or if it has become corrupt, you may see this error.

Resolution:
A) Press ^W to enter Whack and use either "prom init" or "prom edit" to correct this error.
B) If the information looks correct with "prom id", try using "prom checksum" to rewrite the checksum.
C) Replace the node motherboard.

Diagnostic:
A) Use the Whack "d prom <addr>" command to display PROM contents. Use the Whack "c prom <addr>" command to change PROM contents. Look for a pattern in order to determine if the error is due to the device's connection with the motherboard or a hardware failure within the Serial PROM.
Fatal error: Code 25, subcode 0x1 (0)
PROM_FAILURE "PROM Failure"
Ethernet 0 PROM checksum: FAIL

The PROM which stores ethernet controller information does not have a valid checksum. If the PROM has not yet been initialized or if it has become corrupt, you may see this error.

Resolution:
A) Press ^W to enter Whack and use "prom id" to verify the other PROM is valid. If not, first use "prom init" or "prom edit" to set the PROM information. If the PROM information appears valid, use "prom mac" to reprogram the Ethernet MAC address and checksum.
B) Try flushing out a correct checksum. Note: You must first select the device with an error using the "eth dev" command.
Example:
Whack> eth dev 1
Whack> eth checksum
C) Replace the node motherboard.

Diagnostic:
A) Try programming a custom MAC address. Example: "prom mac 00:02:AC:00:00:43"
B) Use the Whack "d eth <addr>" command to display PROM contents. Use the Whack "c eth <addr>" command to change PROM contents. Look for a pattern in order to determine if the error is due to the device's connection with the motherboard or a hardware failure within the Ethernet PROM.

Non-fatal error: Code 25, sub-code 0x2 (0)
PROM_FAILURE "PROM Failure"
Ethernet MAC xx:xx:xx:xx:xx:xx mismatches PROM: yy:yy:yy:yy:yy:yy
The "prom mac" command may fix this.

This error indicates the MAC address stored in the onboard Ethernet controller's PROM does not match the address computed from the board revision and serial number stored in the node's PROM. This mismatch suggests that one or the other PROM may contain corrupt contents. If the ethernet MAC address was purposely set to an address (see the "prom mac" command), this check may be overridden by setting the NVRAM "oddmac" flag.
Example:
Whack> set perm oddmac

Resolution:
A) Look for a prior message indicating an invalid board type, or check the banner to ensure the board type and serial number are correct for this node. If either is not correct, use the "prom edit" command to repair the corruption.
B) Use the "prom mac" command to reprogram the MAC address in the ethernet controller's PROM.
C) Replace the node motherboard.

Diagnostic:
A) Determine if the cause is a failing node PROM or ethernet controller PROM. Use the "db prom 0 20" command to display PROM contents and compare with expected values.
Example:
Whack> dbz8 prom 0 20
prom 0000: 00 04 09 20 10 03 04 35 . ...5
prom 0008: 30 53 4f 4c 01 10 00 00 0SOL..
prom 0010: 00 76 ff ff ff ff ff ff v......
prom 0018: ff ff ff ff c1 1f a4 5e .......^
Replace the node PROM if it is defective.
B) Use the "db eth 0 20" command to display ethernet PROM contents and compare with expected values.
Example:
Whack> dbz8 eth 0 20
eth 0000: 00 02 ac 14 00 76 03 01 ... v..
eth 0008: ff ff 01 00 01 07 00 00 ... ..
eth 0010: 10 00 04 03 40 48 00 00 . ..@H
eth 0018: 86 80 00 00 ff ff ff ff .. ....
Replace the ethernet PROM if it is defective.

Non-fatal error: Code 25, sub-code 0x3 (0)
PROM_FAILURE "PROM Failure"

During initialization, CBIOS checks the PROM for a magic number. This non-fatal error is logged when the magic number check fails.

Resolution:
A) On EOS platforms, the midplane, node type, and slot ID may need to be reconstructed with "prom edit". The Ethernet MAC and PROM magic number may also need to be reconstructed. (Bug 82094)
B) Previous platforms should be reinitialized and reconstructed automatically.

Diagnostic:
A) Use "db i2c 2.a6.0 100" to view the contents of this region. Typically only the first 32 bytes are affected.
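When comparing the ethernet PROM dump from the Code 25, sub-code 0x2 diagnostic against the MAC address in the console message, it can help to format the leading PROM bytes as a MAC address. The Python sketch below assumes the first six bytes of the ethernet PROM hold the MAC address, which is consistent with the 00:02:AC prefix seen in the example dump but is not stated explicitly in this guide. The helper names are hypothetical.

```python
def mac_from_prom(prom: bytes) -> str:
    """Format the first six PROM bytes as an upper-case, colon-separated MAC.

    Assumption: the MAC address occupies PROM offsets 0 through 5.
    """
    if len(prom) < 6:
        raise ValueError("need at least six PROM bytes")
    return ":".join(f"{b:02X}" for b in prom[:6])


def macs_match(prom: bytes, reported_mac: str) -> bool:
    """Compare the PROM-derived MAC with a MAC string, ignoring letter case."""
    return mac_from_prom(prom) == reported_mac.strip().upper()


# First row of the example "dbz8 eth 0 20" dump above.
first_row = bytes.fromhex("00 02 ac 14 00 76 03 01")
print(mac_from_prom(first_row))
print(macs_match(first_row, "00:02:ac:14:00:76"))
```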
Non-fatal error: Code 25, sub-code 0x4 (aabbccdd)
PROM_FAILURE "PROM Failure"
Board Spin value is invalid. The "prom edit" command may fix this.

This error indicates the board spin value in the PROM record is not in the proper range. The range of the board spin byte is 0x01 to 0x16. If the board spin number is out of this range, this error occurs.
NOTE: On Tinman, the board spin field is not used as board spin, so this field will always be 0x17. On Tinman, this is NOT flagged as an error.
If the board spin field is not valid, the BIOS uses the board revision field. This is a two-character field that must be "01" to "09", then "A0" to "A9", then "B0" to "B9", and so on. If a letter (A-Z) is in the second byte or a nonzero digit (1-9) is the first character, this is an error. In the data field, aa is the board spin value, bb is the calculated board revision, cc is the first character in the rev field, and dd is the second character in the rev field.

Resolution:
A) Use "prom edit" to fix/verify the board spin field.
B) Use "prom edit" to fix the board revision field.
C) Replace the node motherboard.

Non-fatal error: Code 25, sub-code 0x5 (0)
PROM_FAILURE "PROM Failure"
EOS node prom value is invalid. The "prom edit" command may fix this.

This error indicates the EOS node PROM value in the PROM record is not in the proper range. The node type and midplane type values in the PROM should be programmed correctly with the "prom edit" command.

Resolution:
A) Use "prom edit" to fix/verify the midplane field.
B) Use "prom edit" to fix/verify the node type field.

Non-fatal error: Code 25, sub-code 0x6 (0)
PROM_FAILURE "PROM Failure"
EOS Node ID in Prom and Slot ID do not match. The "prom edit" command may fix this.

This error indicates the EOS Node ID value in the PROM record does not match the Slot ID read from the FPGA.
The Node ID value in the PROM should be programmed correctly with the "prom edit" command.

Resolution:
A) Use "prom edit" to fix/verify the Node ID field.

Non-fatal error: Code 26, sub-code 0x1 (ethdev)
ETH_FAILURE "Ethernet Failure"
eth0 device self test: FAIL All tests: xxxx (timeout)

During initialization, CBIOS has the ethernet controller perform an internal test to verify correct operation. If the ethernet controller does not respond within a reasonable amount of time, this error is displayed. "ethdev" indicates the PCI slot in which the failed ethernet device is located. This is an ASCII value, so 0x30 indicates PCI slot 0. If the ethernet device is located on the node motherboard, ethdev will have a value of 0x00.

Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Diagnostic:
A) Verify the 82559ER shows up in a PCI scan. Use the Whack "pci find 8086" command. It should display the 82559ER Ethernet controller.
B) Use the Whack "eth test" command to repeat the test. Make sure that CBIOS initialization has passed the point of PCI scan. Use Whack "loop ffff eth test" to repeat in a loop.

Non-fatal error: Code 26, sub-code 0x2 (ethdev)
ETH_FAILURE "Ethernet Failure"
eth0 device self test: FAILxxxx yyyy

If the ethernet controller fails its internal test, this error is displayed. Since this is an internal test, it is likely the ethernet controller itself which has failed.

Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Diagnostic:
A) Use the Whack "eth test" command to repeat the test. Make sure that CBIOS initialization has passed the point of PCI scan. Use Whack "loop ffff eth test" to repeat in a loop.
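The ethdev data value described for Code 26, sub-codes 0x1 and 0x2 mixes two encodings: 0x00 for the onboard device and an ASCII digit for a PCI slot. A small Python sketch of that decoding follows. The function name is hypothetical, and treating any ASCII digit 0x30 to 0x39 as a slot number is an extrapolation from the single documented example (0x30 means PCI slot 0).

```python
def describe_ethdev(ethdev: int) -> str:
    """Translate an ethdev data value into a human-readable location."""
    if ethdev == 0x00:
        return "node motherboard"
    if 0x30 <= ethdev <= 0x39:  # ASCII '0' through '9' (assumed range)
        return f"PCI slot {chr(ethdev)}"
    raise ValueError(f"unrecognized ethdev value {ethdev:#04x}")


print(describe_ethdev(0x30))
print(describe_ethdev(0x00))
```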
Non-fatal error: Code 26, sub-code 0x3 (0)
ETH_FAILURE "Ethernet Failure"
No ethernet devices available for loopback test

This error indicates that no ethernet devices could be found or initialized on the node. This is possibly the result of a hardware failure.

Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Diagnostic:
A) Use the Whack "eth test" command to make sure that the low-level test passes.
B) Try using "net dhcp" in an environment that has a DHCP server to see if the node can send and receive packets. If so, this error is likely caused by incorrect BIOS code.

Non-fatal error: Code 26, sub-code 0x4 (0)
ETH_FAILURE "Ethernet Failure"
No loopback connections were found. An external loopback plug is required if this node has only one ethernet port. A crossover cable is required if this node has more than a single ethernet port.

Resolution:
A) Make sure the ethernet loopback plug is in the ethernet connector (you should see link status lights illuminated). In the case of a node having two ethernet ports, make sure a crossover cable is connected between the ethernet ports.
B) Cycle power on the node.
C) Replace the node motherboard.

Diagnostic:
A) This problem is most likely caused by a bad connector or bad connection to the loopback plug. Make sure TX+ makes a circuit to RX+ and TX- makes a circuit to RX- on the PHY.
B) Try plugging into a normal ethernet network to see if the node can talk to a DHCP server using "net dhcp".
C) Try using "net loopback" to test the ethernet port using the internal PHY loopback.

Non-fatal error: Code 26, sub-code 0x5 (slotid)
ETH_FAILURE "Ethernet Failure"
eth2 loopback PHY internal: FAIL

This error indicates that the internal loopback of the PHY did not correctly loop back packets. If the device being tested is onboard the node (82559ER or 82551ER), then this is a failure.
Some plug-in PCI boards (such as 82557) do not fully support PHY loopback. Those devices will cause the following warning:
eth2 loopback PHY internal: Unavailable
No error stop will occur in the case of a PHY not supporting internal loopback.

Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Diagnostic:
A) Use the "pci probe" command to identify which of the ethernet devices has failed.
B) Try using an external loopback to see if the results are the same. If they are, try debugging using a scope. If the external loopback works, it may be that the PHY loopback just does not work in this device.

Non-fatal error: Code 26, sub-code 0x6 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 sends to eth1 but cannot receive from it

This is an unusual error in that one ethernet device is able to reliably receive packets from the other, but the opposite is not true.

Resolution:
A) Run the test again. If the nodes are attached to a hub, the failure may be due to another ethernet node flooding the network.
B) Cycle power on the node.
C) Ensure that there is no switch between the ethernet ports. A switch may prevent the test from functioning properly if the MAC address of an interface is in use elsewhere or the switch is really an IP router.

Diagnostic:
A) Test against a plug-in PCI ethernet card to isolate which ethernet interface is not functioning.
Non-fatal error: Code 26, sub-code 0x7 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: FAIL - receive timeout (xx seconds)

This error indicates the ethernet device did not successfully receive the loopback pattern sent to test the ethernet device's transceiver. The failure to receive a loopback pattern usually means the ethernet device has failed. "ethdev" indicates the PCI slot in which the failed ethernet device is located. This is an ASCII value, so 0x30 indicates PCI slot 0. If the ethernet device is located on the node motherboard, ethdev will have a value of 0x00.
The following are normal test results that you would expect to see. If this error occurs, then one of the following has not happened:
eth0 loopback All zeros: PASS
eth0 loopback All ones: PASS
eth0 loopback Walking ones: PASS
eth0 loopback Walking zeros: PASS
eth0 loopback Random pattern: PASS
This error indicates that within 100 packets successfully transmitted, there were no packets successfully received.

Resolution:
A) Cycle power on the node.
B) Unplug the network cable and run the test again. If the node is attached to a hub, the failure may be due to another ethernet node flooding the network. This is not very likely.
C) If the ethernet device is located in a PCI slot, replace the card.
D) Replace the node motherboard.

Diagnostic:
A) Test against a plug-in PCI ethernet card to isolate which ethernet interface is not functioning.

Non-fatal error: Code 26, sub-code 0x8 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: Packet transmit failed

This error indicates that the ethernet device was not able to successfully transmit packets. This is a serious failure, since the ethernet code will not fail to transmit under any condition unless the ethernet device failed to initialize.

Resolution:
A) Use "eth reset" to reset the ethernet device.
B) Cycle power on the node.
C) Replace the node motherboard if the failed ethernet device is on the node.

Non-fatal error: Code 26, sub-code 0x9 (slotid)
ETH_FAILURE "Ethernet Failure"
eth0 loopback wwwww: FAIL - miscompare stuck high=xxxx stuck low=yyyy toggle=zzzz

This error is displayed if one of the ethernet tests detects a mismatch between the packet sent and the data received. It also includes a diagnostic line which shows in what way the data is different.

Resolution:
A) Use "eth reset" to reset the ethernet device.
B) Cycle power on the node.
C) Replace the node motherboard if the failed ethernet device is on the node.

Diagnostic:
A) You can get complete packet dumps if you wish to manually compare how the data was corrupted. To do this, use "net loopback vv" (double verbose).
B) If a single bit (or a small number of bits) is failing, observe whether the bits are pulled high or low. This may assist you in debugging where the hardware is failing, if it is external to the ethernet IC.

Non-fatal error: Code 26, sub-code 0xa (slotid)
ETH_FAILURE "Ethernet Failure"
ethxxx device registers:FAIL Onboard ethernet device did not read valid config from EEPROM. A powercycle might clear this failure if this is a new node.

This error indicates the ethernet device failed to initialize properly, probably because it read invalid content from the attached EEPROM device. If this is an onboard GigE on the 5000P chipset (Tx00, Fx00, Vx00, Gx00), it is likely this is the first time the node has ever been powered on. Once the BIOS writes a configuration to the SPI EEPROM attached to the GigE, the board must be power cycled before the GigE device is usable. If the board is not new and you see this failure, it is likely a component on the node motherboard has failed.

Resolution:
A) Power cycle the node.
B) Replace the node motherboard if the failed ethernet device is on the node.

Non-fatal error: Code 27, sub-code 0x0 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"

Each node board has multiple temperature, voltage, and fan RPM sensors which monitor the environment to ensure the temperature, voltage, and fan RPM are within operating tolerances. This directly results in increased reliability of the product. If a reading falls outside a programmed tolerance level, CBIOS alerts the user to this condition. The sub-code displayed reflects the type of (the first) error detected. The data value is a count of the number of temperature/voltage/fan problems detected.
A sub-code value of 0x0 indicates a fan RPM problem.
A sub-code value of 0x1 indicates a temperature problem.
A sub-code value of 0x2 indicates a voltage problem.
This particular sub-code (0x0) indicates a fan RPM reading is outside its programmed limit.

Resolution:
A) Cycle power on the node. If it is a temperature-related problem, verify the system is getting adequate ventilation.
B) Verify the limit settings are reasonable. Use the Whack "i2c env" command. The Whack "i2c env defaults" command resets all defaults.
C) Verify both power supply fans are spinning freely and that the supply amber failure light is not illuminated. If only a single supply is installed, make sure the second slot either has a fan or is covered.
D) Replace the power supply.
E) If it is CPU temperature, verify the heatsink is conducting heat well.
F) If it is CPU voltage, try swapping out the CPU voltage regulators.
G) Replace the node motherboard.

Diagnostic:
A) Use a voltage probe at appropriate vias to verify correct voltage levels.
B) Verify the LM87 external temperature sensor line is well connected to the CPU's thermal diode.
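The way Code 27 reports its sub-code (type of the first problem) and data value (count of all problems) can be illustrated with a short Python sketch. Everything here other than the 0x0/0x1/0x2 mapping is an assumption: the `Reading` type, the inclusive low/high limit check, and the sample values are hypothetical and do not reflect the actual CBIOS sensor tables.

```python
from typing import List, NamedTuple, Optional, Tuple

# Mapping taken from the Code 27, sub-code 0x0 description above.
SUBCODE = {"fan": 0x0, "temperature": 0x1, "voltage": 0x2}


class Reading(NamedTuple):
    kind: str      # "fan", "temperature", or "voltage"
    value: float   # measured value
    low: float     # hypothetical programmed lower limit
    high: float    # hypothetical programmed upper limit


def summarize(readings: List[Reading]) -> Optional[Tuple[int, int]]:
    """Return (sub-code of first problem, count of problems), or None if all pass."""
    problems = [r for r in readings if not (r.low <= r.value <= r.high)]
    if not problems:
        return None
    return SUBCODE[problems[0].kind], len(problems)


readings = [
    Reading("voltage", 3.3, 3.0, 3.6),       # within limits
    Reading("temperature", 80.0, 0.0, 70.0),  # too hot
    Reading("fan", 500.0, 2000.0, 9000.0),    # too slow
]
print(summarize(readings))
```

In this sample the temperature reading is the first one out of range, so the sketch reports sub-code 0x1 with a count of 2.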
Non-fatal error: Code 27, sub-code 0x1 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates a programmed temperature limit has been exceeded.
See Code 27, sub-code 0x0 for resolution information.

Non-fatal error: Code 27, sub-code 0x2 (#)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates a programmed voltage limit has been exceeded.
See Code 27, sub-code 0x0 for resolution information.

Non-fatal error: Code 27, sub-code 0x3 (0)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates a sensor interrupt test failed.
See Code 27, sub-code 0x0 for resolution information.

Fatal error: Code 27, subcode 0x4 (0)
TEMP_VOLTAGE_FAILURE "Temp/Voltage Failure"
This sub-code indicates that a CPU has asserted its THERMTRIP_N signal. This could mean that it has reached its case temperature, that a VRM has failed, or that there is a problem with the FPGA.
Resolution:
A) Check the environmentals.
B) Replace the node.

Non-fatal error: Code 27, sub-code 0x5 (Shutdown Code =1 or =2)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Boot Pause"
For ShutdownCode = 1: In a system-wide over-temperature condition, the OS will shut down the system and reboot the nodes. The BIOS will pause the boot in a low power state until the over-temperature condition has been cleared for 30 minutes. When in this state, the BIOS samples critical temperature sensors periodically and displays their current state on the system console every few minutes. This delay can be cleared early by a node power cycle. This log entry indicates the start of the BIOS boot pause.
For ShutdownCode = 2: This shutdown code indicates an over-temperature failure of a single node. TPD will flag this failure and shut down that node. The node will not complete the boot until the unit has been repaired and any issues cleared.
To clear the boot halt, reboot the node and use the Whack command "unset tshutdown" before the POST reaches step 35.

Non-fatal error: Code 27, sub-code 0x6 (0)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Boot Resume"
This sub-code indicates that the critical temperature sensors have been below their thresholds for at least 30 minutes and the BIOS is resuming the boot process.
See sub-code 5 for more temperature shutdown information.

Non-fatal error: Code 27, sub-code 0x7 (0)
TEMP_VOLTAGE_FAILURE "Temp Shutdown Override"
This sub-code indicates that the BIOS is skipping a critical temperature boot pause due to a node power cycle.
See sub-code 5 for more temperature shutdown information.

Non-fatal error: Code 27, sub-code 0x8 (count)
TEMP_VOLTAGE_FAILURE "No Response"
This sub-code indicates that a defined I2C sensor failed to respond on the I2C bus. 'Count' indicates the number of I2C device failures.

Non-fatal error: Code 27, sub-code 0x9 (index)
TEMP_VOLTAGE_FAILURE "High Limit Error"
This sub-code indicates that the BIOS detected a mathematical overflow of the 8-bit upper limit register, and measurements on the sensor indicated by 'index' may be incorrect. The voltage or temperature limit could not be converted to and stored as an 8-bit value. Contact Engineering for a HW fix.

Fatal error: Code 28, subcode 0x0 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Cluster Memory ASIC not found for CM Init.
The Eagle/Osprey/Harrier ASICs are the Cluster Managers which are used for high speed communication between nodes of a cluster. These devices are critical for the correct operation of the node software, and hence for operation of the whole cluster. The CM exists on all PCI buses in the node.
If the CM cannot be found on any of the required PCI buses, this is a serious problem. Subcode 0x0 with a data value of 0 indicates the PCI bus scan did not locate the Cluster Manager.
Resolution:
A) Cycle power on the node.
B) Pull all PCI cards and cycle power on the node.
C) Replace the node motherboard.
Diagnostic:
A) Use "pci find 1590" at the Whack prompt to see if the CM can be located. Since the same data structure is used, it should not show up there either. Use "pci init", which will scan the PCI bus again. If the CM appears now (with "pci find 1590"), it may be a transient problem.
B) Examine the output of "pci probe" to determine if other onboard PCI devices are missing. This may help to determine where the failure occurs. For example, if the four PCI bridges do not show, it may be the CIOB at fault.

Fatal error: Code 28, subcode 0x0 (1)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMMs did not compare identical
DIMM and SPD comparison Failed
Not all required CMA DIMMs were found or are exact matches (cma_dimm_unmatched is set)
The Harrier2 ASICs are the Cluster Managers which are used for high speed communication between nodes of a cluster. These devices are critical for the correct operation of the node software, and hence for operation of the whole cluster. Subcode 0x0 with a data value of 1 indicates that one or more DIMMs attached to the Harrier2 did not match DIMM0.0.0.
Resolution:
A) Install the same DIMM type in all CM DIMM slots.
B) If your intention was to test with different DIMMs, "set perm cma_dimm_unmatched" and "reset" the node.

Fatal error: Code 28, subcode 0x0 (2)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Harrier2 #0 not found for CM Init.
The Harrier2 ASICs are the Cluster Managers which are used for high speed communication between nodes of a cluster.
These devices are critical for the correct operation of the node software, and hence for operation of the whole cluster. The CM exists on all PCI buses in the node. If the CM cannot be found on any of the required PCI buses, this is a serious problem. Subcode 0x0 with a data value of 2 indicates the PCI bus scan did not locate the Cluster Manager.
Resolution:
A) Cycle power on the node.
B) Pull all PCI cards and cycle power on the node.
C) Replace the node motherboard.
Diagnostic:
A) Use "pci find 1590" at the Whack prompt to see if the CM can be located. Since the same data structure is used, it should not show up there either. Use "pci init", which will scan the PCI bus again. If the CM appears now (with "pci find 1590"), it may be a transient problem.
B) Examine the output of "pci probe" to determine if other onboard PCI devices are missing. This may help to determine where the failure occurs. For example, if the four PCI bridges do not show, it may be the CIOB at fault.

Fatal error: Code 28, subcode 0x0 (3)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Harrier2 #1 not found for CM Init.
See Code 28, sub-code 0x0 (2) for resolution information.

Non-fatal error: Code 28, sub-code 0x0 (xx04)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMM 0: Unsupported Raw Card Type in SPD byte 62 = xx, Using rdimm_control_words[0][].
Here xx is the hex value that was read from DIMM0 SPD Byte 62.
Byte 62 of the DIMM SPD indicates which JEDEC reference design raw card was used as the basis for the module assembly, if any. Bits 4 ~ 0 describe the raw card and bits 6 ~ 5 describe the revision level of that raw card. The special reference raw card indicator, 1F, is used when no JEDEC standard raw card reference design was used as the basis for the module design. Preproduction modules should be encoded as revision 0 in bits 6 ~ 5.
The reference card is looked up in rdimm_control_words to determine the index into the rdimm_control_words table. If the value in Byte 62 is not found in the table, this error is reported.
Resolution:
A) Replace DIMM with a supported Raw Card Type.

Non-fatal error: Code 28, sub-code 0x0 (xx05)
CM_MEMORY_FAILURE "Cluster Memory Failure"
DIMM 1: Unsupported Raw Card Type in SPD byte 62 = xx, Using rdimm_control_words[0][].
Here xx is the hex value that was read from DIMM1 SPD Byte 62.
Resolution:
A) Replace DIMM with a supported Raw Card Type.

Non-fatal error: Code 28, sub-code 0x1 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Pairwwww DIMMxxxx: Bad checksum. Got yyyy, SPD said zzzz
The memory DIMMs located on the CM riser are called cluster memory. This memory is used to store data destined for the disks (dirty data) as well as data previously read from the disks (cache data). It is also used for communication among the nodes in the cluster. This memory is not required to boot the operating system, but is required for the node to participate in the cluster.
Even before the memory is thoroughly tested for proper operation, it must be configured to appear in CM addressable space. Each memory DIMM has a small embedded serial EEPROM which holds DIMM configuration information such as the number of rows, columns, and banks, as well as memory timing. If this serial EEPROM becomes corrupt, the DIMM configuration data stored in it cannot be trusted. For this reason, the EEPROM also contains a checksum which the BIOS verifies before configuring the DIMM. If this checksum does not match the checksum the BIOS computes across the DIMM, this error will result.
You should look at prior output to determine if there were I2C errors. These errors suggest a problem with riser installation.
The DIMM number is logged in the Data field of the Fatal Error.
Resolution:
A) Reseat Cluster Memory riser card(s).
B) Reseat Cluster Memory DIMMs.
C) Replace Cluster Memory DIMMs in pairs to ensure replacement parts are matched.
P4-Eagle and PIII-Eagle DIMM pairs are always located four riser positions apart. For example, if you number the slots from the top:
Pair 0 is at position 3 and position 7 (top).
Pair 1 is at position 0 (bottom) and position 4.
Pair 2 is at position 2 and position 6.
Pair 3 is at position 1 and position 5.
Ironman (Tclass) and Tinman (Fclass) DIMMs are always in sets of three. The DIMMs are labeled "DIMM C.S", that is, channel then set. There are two riser cards, one for channel 0 and one for channels 1 and 2.
Set 0 is DIMM 0.0, 1.0, 2.0
Set 1 is DIMM 0.1, 1.1, 2.1
Set 2 is DIMM 0.2, 1.2, 2.2
Titan and Atlas have 4 DIMM sets on the motherboard.
Set 0: DIMM 0.0 and 1.0
Set 1: DIMM 0.1 and 1.1
Set 2: DIMM 2.0 and 3.0
Set 3: DIMM 2.1 and 3.1
D) Replace the Cluster memory riser(s).
E) Replace the node motherboard.
Diagnostic:
A) The Cluster Memory DIMMs appear on the I2C bus at 2.a0 through 2.ae. Use the Whack "d i2c" command to display the DIMM serial EEPROM contents to determine if there is a pattern.
Example (DIMM 5):
Whack> d i2c 2.aa.0

Fatal error: Code 28, subcode 0x2 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Pairww DIMMxx (yyyy): 'zzzz' read failed
Here zzzz is one of: row address, column address, module rows, cas latency3, refresh, banks, cas latency2, cas latency1, ras precharge, act_to_rw, act_to_deact, ras cycle, write_to_deact, density, frequency, DIMM type
This error indicates that a Cluster Memory DIMM was detected but that the Serial EEPROM present on the DIMM could not be reliably read. The DIMM number is logged in the Data field of the Fatal Error.
See Code 28, sub-code 0x1 for resolution information.
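Two SPD details from the preceding entries lend themselves to a short worked example: the I2C address of a DIMM's serial EEPROM (2.a0 through 2.ae, with DIMM 5 at 2.aa) and the bit layout of SPD byte 62 (bits 4-0 raw card, bits 6-5 revision, 1F meaning no JEDEC reference design). The sketch below is illustrative only. The function names are hypothetical, and the even spacing of two addresses per DIMM is an assumption inferred from the address range and the DIMM 5 example, not something the guide states.

```python
def dimm_spd_i2c_address(dimm: int) -> str:
    """Return the Whack-style 'bus.address' of a Cluster Memory DIMM EEPROM.

    Assumption: DIMMs 0-7 sit at 0xa0, 0xa2, ... 0xae on I2C bus 2,
    which matches the range 2.a0-2.ae and the DIMM 5 -> 2.aa example.
    """
    if not 0 <= dimm <= 7:
        raise ValueError("this sketch only covers DIMM numbers 0-7")
    return f"2.{0xA0 + 2 * dimm:02x}"


def decode_spd_byte_62(value: int) -> dict:
    """Split SPD byte 62 into the fields described for sub-code 0x0 (xx04)."""
    raw_card = value & 0x1F          # bits 4-0: reference raw card
    revision = (value >> 5) & 0x3    # bits 6-5: raw card revision
    return {
        "raw_card": raw_card,
        "revision": revision,
        # 0x1F is the special "no JEDEC reference design" indicator
        "no_reference_design": raw_card == 0x1F,
    }
```

With the address in hand, the display command from the diagnostic takes the form "d i2c 2.aa.0" for DIMM 5.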
Non-fatal error: Code 28, sub-code 0x3 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairqq DIMMtt (uuuu): vv != DIMMww (xxxx): yy zzzz
This error indicates the BIOS detected that the SDRAM DIMMs in the cluster memory bank pair are of a different type. One DIMM number of the mismatched pair will be logged in the Data field of the Fatal Error.
Resolution:
A) Ensure both DIMMs in the pair are identical. Note that two DIMMs may have the same capacity but a different number of rows, columns, or banks. The DIMM configuration must match exactly. If the DIMMs have similar markings and capacity, they are probably identical.
Diagnostic:
A) The Serial EEPROM information in each pair of DIMMs should be identical or nearly identical.
See Code 28, sub-code 0x1 for more resolution information.

Fatal error: Code 28, subcode 0x4 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairww DIMM xx (yyyy): di_module_rows is not 1 or 2! zzzz
This error indicates the Cluster Memory DIMMs reported an odd (and unsupported) number of rows. Usually the number of rows reported by a DIMM corresponds to the number of sides of the DIMM which are populated by memory. One DIMM number of the failing pair will be logged in the Data field of the Fatal Error.
See Code 28, sub-code 0x3 for resolution information.

Fatal error: Code 28, subcode 0x5 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
No Cluster Memory Installed
This error indicates that no memory was found in the Cluster memory riser. Since cluster memory is needed for proper node operation within the cluster, this condition must be resolved. You should look at prior output to determine if there were I2C errors. These errors suggest a problem with riser or DIMM installation.
See Code 28, sub-code 0x1 for resolution information.
Fatal error: Code 28, subcode 0x6 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Pairww DIMM xx (yyy): RAS cycle time > 10. got zzz/10 We
This error indicates the Serial EEPROM on the DIMM reports a value which is outside tolerance for the memory controller. One DIMM number of the failing pair will be logged in the Data field of the Fatal Error.
See Code 28, sub-code 0x1 for resolution information.

Fatal error: Code 28, subcode 0x7 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Cluster Memory not responding.
DIMM uuu (vvv): Expected = (xxxx) Actual = (yyyy) Addr (zzzz)
*** Error: Cluster Memory FAILURE - too many mismatches.
Before ECC initialization of Cluster memory (scrub), a small region must be tested and configured by the CPU to set up the ECC scrub of the remainder. If an error occurs during this test (such as a memory read that does not match the value just written), then this error will be reported. The DIMM number is logged in the Data field of the Fatal Error.
Diagnostic:
A) Compare the expected and actual values for a pattern such as a bit stuck high or stuck low.
Example (bit 31 stuck low):
Expected = (0xf1f1f1e5) Actual = (0x71f1f1e5)
Expected = (0x92929285) Actual = (0x12929285)
Expected = (0xb3b3b3a5) Actual = (0x33b3b3a5)
Expected = (0xd3d3d3c5) Actual = (0x53d3d3c5)
See Code 28, sub-code 0x1 for resolution information.

Fatal error: Code 28, subcode 0x8 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Found errors during scrub. Eagle Error Status: xxxx
*** Error: Found errors during scrub. Osprey Error Status: xxxx
During the ECC initialization of Cluster memory, the Cluster Manager records any memory errors it encounters. If any were recorded, this error will be displayed.
See Code 28, sub-code 0x1 for resolution information.
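The sub-code 0x7 diagnostic asks the technician to compare expected and actual values and look for a bit stuck high or low. The comparison can be done mechanically, as in the following minimal sketch. The function name is hypothetical, the input is assumed to be (expected, actual) pairs copied by hand from the console output, and the result is only a hint about candidate bits, not a hardware diagnosis.

```python
def find_stuck_bits(pairs, width=32):
    """Return (stuck_low, stuck_high) candidate bit numbers.

    A bit is a stuck-low candidate if it read back 0 in every sample and
    mismatched at least once; stuck-high is the mirror case.
    """
    mask = (1 << width) - 1
    always_zero = mask   # bits that read 0 in every sample so far
    always_one = mask    # bits that read 1 in every sample so far
    mismatched = 0       # bits that differed in at least one sample
    for expected, actual in pairs:
        always_zero &= ~actual
        always_one &= actual
        mismatched |= (expected ^ actual) & mask
    low = always_zero & mismatched
    high = always_one & mismatched
    stuck_low = [bit for bit in range(width) if (low >> bit) & 1]
    stuck_high = [bit for bit in range(width) if (high >> bit) & 1]
    return stuck_low, stuck_high


# Values from the "bit 31 stuck low" example in the sub-code 0x7 entry.
EXAMPLE_PAIRS = [
    (0xF1F1F1E5, 0x71F1F1E5),
    (0x92929285, 0x12929285),
    (0xB3B3B3A5, 0x33B3B3A5),
    (0xD3D3D3C5, 0x53D3D3C5),
]
```

Calling find_stuck_bits(EXAMPLE_PAIRS) on the example values reports bit 31 as the only stuck-low candidate and no stuck-high candidates.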
Fatal error: Code 28, subcode 0x9 (0)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: CM DIMM programmed address > top of memory
For each Cluster memory DIMM, there is a register in the Eagle / Osprey memory controller which specifies where the DIMM maps into CM physical memory. These mapping registers are configured during the Cluster memory probe and should not change under normal circumstances. Since this is an internal CM register, it is unlikely that reseating memory will correct this problem.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Replace the node motherboard.
Diagnostic:
A) The memory controller registers are part of the CM register set which is mapped into CPU memory for access. Use the Whack "pci find 1590" command to find the CM on the PCI bus. The base address in PCI space for the configuration and status registers (CSRs) is Window 0.
Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:90200000 00:00000400 3PAR (ASIC) LPC#
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x90200000 above). This is the base address of the CM Memory Control Register Block. Refer to the Scaffold System Architecture Reference for information on register programming.

Fatal error: Code 28, subcode 0xa (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)
*** Error: Uncorrectable ECC
The Cluster memory controller detected an Uncorrectable ECC error. Eagle / Osprey identifies the failing bank and address with the error as well as the error syndrome. The BIOS will convert the information into the failing DIMM and Riser Slot numbers. There may be multiple Uncorrectable errors.
In this case, the CM will save the address/syndrome for the most recent error. The DIMM number is logged in the Data field of the Fatal Error.
Eagle nodes (S-Series and E-Series): There are 8 DIMMs maximum on the S-Series Cluster Memory Riser Card. If the DIMM number is not between 0-7 (inclusive), then the failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series): There are 6 DIMMs on T-Series and 3 DIMMs on F-Series. The data field encodes the DIMM number in the lower 4 bits of the field and the channel number in the upper 4 bits. So a data value of 12 indicates DIMM 1.2 is at fault.
Harrier nodes (V-Series, Atlas, Minime1 & 2): There are 8 DIMMs on V-Series between two different Harrier ASICs; two memory controllers with 2 DIMMs each. The data field encodes which memory channel encountered the uncorrectable error. A data value of 10 means channel one is at fault; a value of 0 means channel zero is at fault.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM(s).
D) Replace the failing Cluster Memory DIMM(s).
E) Replace the node motherboard.
Diagnostic:
A) The memory controller registers are part of the CM register set which is mapped into CPU memory for access. Use the Whack "pci find 1590" command to find the CM on the PCI bus. The base address in PCI space for the configuration and status registers (CSRs) is Window 0.
Example:
Whack> pci find 1590
... Win Baseaddr Basesize Identity
... [0] 00:60200000 00:00000400 3PAR Eagle
... [1] 00:20000000 00:20000000
... [2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above). This is the base address of the CM Memory Control Register Block. Refer to the Scaffold System Architecture Reference for information on register programming.
Window 1 is the small cluster memory offset.
If the error address is in the first 512 MB of Cluster memory, use Whack to read/write this location and confirm the error. The CM Central Error register must be reset prior to error reproduction. If the error address is greater than 512 MB, then XCBs may be used to reproduce the error. Type "xcb help" to get more information on using XCBs.

Fatal error: Code 28, subcode 0xb (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: CM DIMMxx (Jyyyy): Address (zz:zzzzzzzz)
*** Error: Correctable ECC
The Cluster memory controller detected a correctable ECC error. The CM identifies the failing bank and address with the error as well as the error syndrome. The BIOS will convert the information into the failing DIMM and Riser Slot numbers. The DIMM number is logged in the Data field of the Fatal Error.
Eagle nodes (S-Series and E-Series): There are 8 DIMMs maximum on the Cluster Memory Riser Card. If the DIMM number is not between 0-7 (inclusive), then the failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series): There are 6 DIMMs on T-Series and 3 DIMMs on F-Series. The data field encodes the DIMM number in the lower 4 bits of the field and the channel number in the upper 4 bits. So a data value of 12 indicates DIMM 1.2 (channel 1, DIMM 2) is at fault.
Harrier nodes (V-Series, Atlas, Minime1 & 2): This should not occur on Harrier.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM.
D) Replace the failing Cluster Memory DIMM.
E) Replace the node motherboard.
Diagnostic:
A) The memory controller registers are part of the CM register set which is mapped into CPU memory for access. Use the Whack "pci find 1590" command to find the CM on the PCI bus.
The base address in PCI space for the configuration and status registers (CSRs) is Window 0.
Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:60200000 00:00000400 3PAR Eagle
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above). This is the base address of the CM Memory Control Register Block. Refer to the Scaffold System Architecture Reference for information on register programming.
Window 1 is the small cluster memory offset. If the error address is in the first 512 MB of Cluster memory, use Whack to read/write this location and confirm the error. The CM Central Error register must be reset prior to error reproduction. If the error address is greater than 512 MB, then XCBs may be used to reproduce the error. Type "xcb help" to get more information on using XCBs.

Fatal error: Code 28, subcode 0xc (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
*** Error: Addr (zzzzzzzz) Wrote (wwwwwwww) Read (yyyyyyyy)
or
*** Error: Data Miscompare in Final Block offset zzzzzzzz
*** Error: Expected (wwwwwwww) Actual (yyyyyyyy)
or
*** Error: CM DIMM5 (Jxxxx): Address (uu:uuuuuuuu)
CM DECODE TEST miscompare at (1) (vvvvvvvvvvvvvvvv)
Expected: (wwwwwwww) Actual: (yyyyyyyy) Offset: (zzzzzzzz)
or similar to above
The CBIOS runs Cluster Memory Tests as part of POST in both normal operation and manufacturing test. If any test fails due to a data miscompare, the test will generate this fatal error code with sub-code '0xc'.
CBIOS runs the following tests:
Walking 1/0 across data
Walking 1/0 across address (512 MB Small Memory Window)
Walking 1/0 using XCB (64 bytes) across segment boundaries
Any test failure will result in a fatal error.
The DIMM number is logged in the Data field of the Fatal Error.
Eagle nodes (S-Series and E-Series): There are 8 DIMMs maximum on the Cluster Memory Riser Card. If the DIMM number is not between 0-7 (inclusive), then the failing DIMM cannot be identified.
Osprey nodes (T-Series and F-Series): There are 6 DIMMs on T-Series and 3 DIMMs on F-Series. The data field encodes the DIMM number in the lower 4 bits of the field and the channel number in the upper 4 bits. So a data value of 12 indicates DIMM 1.2 (channel 1, DIMM 2) is at fault.
Harrier nodes (V-Series, Atlas, Minime1 & 2): This should not occur on Harrier.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM.
D) Replace the failing Cluster Memory DIMM.
E) Replace the node motherboard.
Diagnostic:
A) The memory controller registers are part of the CM register set which is mapped into CPU memory for access. Use the Whack "pci find 1590" command to find the CM on the PCI bus. The base address in PCI space for the configuration and status registers (CSRs) is Window 0.
Example:
Whack> pci find 1590
Win Baseaddr Basesize Identity
[0] 00:60200000 00:00000400 3PAR Eagle
[1] 00:20000000 00:20000000
[2] 02:00000000 02:00000000
Add offset 0xc0 to that address (0x60200000 above). This is the base address of the CM Memory Control Register Block. Refer to the Scaffold System Architecture Reference for information on register programming.
Window 1 is the small cluster memory offset. If the error address is in the first 512 MB of Cluster memory, use Whack to read/write this location and confirm the error. The CM Central Error register must be reset prior to error reproduction. If the error address is greater than 512 MB, then XCBs may be used to reproduce the error. Type "xcb help" to get more information on using XCBs.
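The arithmetic in the ECC and miscompare entries above (the Osprey data field split into two 4-bit fields, the 0xc0 offset from the Window 0 base address, and the 512 MB boundary that decides between Whack and XCBs) can be sketched as follows. All function names are hypothetical. Two assumptions are made and marked in the comments: the data value is treated as hexadecimal, and the "hh:llllllll" address notation is read as the high and low 32-bit halves of one 64-bit address.

```python
SMALL_WINDOW_BYTES = 512 * 1024 * 1024   # first 512 MB of Cluster memory
CM_MEM_CTRL_OFFSET = 0xC0                # offset given in the diagnostic text


def parse_window_address(text: str) -> int:
    """Parse 'hh:llllllll' as shown by "pci find 1590".

    Assumption: the two hex fields are the high and low 32-bit halves.
    """
    high, low = text.split(":")
    return (int(high, 16) << 32) | int(low, 16)


def cm_memory_control_base(window0_baseaddr: str) -> int:
    """Window 0 base address plus 0xc0, per the diagnostic step."""
    return parse_window_address(window0_baseaddr) + CM_MEM_CTRL_OFFSET


def decode_osprey_data(data: int) -> tuple:
    """Return (channel, dimm) from an Osprey data field.

    Upper 4 bits are the channel, lower 4 bits the DIMM number.
    Assumption: the logged value (for example 12) is hexadecimal.
    """
    return (data >> 4) & 0xF, data & 0xF


def reproduction_method(error_address: int) -> str:
    """Pick the tool suggested by the text for reproducing an error.

    Offsets are counted from 0, so an address of exactly 512 MB is
    already outside the first 512 MB in this sketch.
    """
    return "whack" if error_address < SMALL_WINDOW_BYTES else "xcb"
```

For the example output above, the Window 0 base of 00:60200000 gives a CM Memory Control Register Block base of 0x602000c0.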
Fatal error: Code 28, subcode 0xd (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Pairwwww DIMMxxxx: Illegal SPD value <name of value> <value>
This error indicates that a Cluster Memory DIMM was detected but that the Serial EEPROM present on the DIMM reported an illegal or unsupported value for our memory controller. The DIMM number is logged in the Data field of the Fatal Error.
Example: Density (SPD byte 31) has more than 1 bit set (i.e., 0x30), which indicates a non-standard part.
See Code 28, sub-code 0x1 for resolution information. Most likely, the DIMM is not qualified for use in our Node Board.

Fatal error: Code 28, subcode 0xe (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
If there was a problem mapping the CM Small Cluster memory window into CPU 32-bit space, this error may result when attempting to initialize Cluster memory. The initialization problem could be due either to a hardware failure or to a special NVRAM variable setting that eliminates the address space normally reserved for CM memory windows. One example is setting "mem_max" to a value above 2496. Another example is setting "pci_base" above 0xa0000000.
Resolution: Contact 3PAR technical support.

Fatal error: Code 28, subcode 0xf (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: Bank (xx) CM DIMMyy (Jzzzz)
*** Error: CM DIMMs with ECC errors
The Cluster memory controller detected a memory error in a specific DIMM bank. The CM memory error status register is logged in the Data field of the Fatal Error.
See Code 28, sub-code 0xb for resolution information.

Fatal error: Code 28, subcode 0x10 (mm)
CM_MEMORY_FAILURE "CMA Failure"
H1 LPC0 HW ERR ST [00000004]: dataq_parity
H1 LPC0 ERR Stat [00000006]: EP-Error-Rpt Fatal-Error
H1 LPC0 ERR ID [80000000]: HW-Err
The Cluster memory controller detected a hardware error.
This error is printed as shown above. The data value mm is decoded as follows: bits 31-28 represent the LPC number and bits 27-0 are the error bits as set in the hardware error status register. The hardware error means that the Harrier ASIC is non-functional.
Resolution:
A) Cycle power on the node.
B) Replace the node.

Fatal error: Code 28, subcode 0x20 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM data lines with walking 1
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may directly access CM cluster memory by performing a walking 1's test on all data lines. If any line fails, this error will result. The data value (mm) could be in the form 0x00XXYYZZ, where XX is the DIMM number (0-11), YY is the return code (RC_??), and ZZ is the number of errors found.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic:
A) Use the Whack command line to attempt to access CM memory manually to determine if data line bits are stuck.

Fatal error: Code 28, subcode 0x21 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM data lines with walking 0
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor may directly access cluster memory by performing a walking 0's test on all data lines. If any line fails, this error will result.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x22 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
ZERO CM problem at addr xxxx
Between PCI bus tests, a small portion of cluster memory is cleared. If errors in clearing the memory are detected, this error will result.
See Code 28, sub-code 0x20 for resolution information.
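The two packed data values described above (sub-code 0x10: LPC number in bits 31-28 and error bits in bits 27-0; sub-code 0x20: the 0x00XXYYZZ form) can be unpacked with a few shifts and masks. This is a sketch with hypothetical function names. The guide says the 0x20 data value "could be" in the 0x00XXYYZZ form, so the second decoder should be treated as a best-effort reading, and the meaning of individual return codes is not given in this section.

```python
def decode_hw_error(mm: int) -> dict:
    """Sub-code 0x10: bits 31-28 are the LPC number, bits 27-0 the error
    bits from the hardware error status register."""
    return {
        "lpc": (mm >> 28) & 0xF,
        "error_bits": mm & 0x0FFFFFFF,
    }


def decode_walking_test_data(mm: int) -> dict:
    """Sub-code 0x20: data in the form 0x00XXYYZZ.

    XX = DIMM number (0-11), YY = return code, ZZ = number of errors.
    """
    return {
        "dimm": (mm >> 16) & 0xFF,
        "return_code": (mm >> 8) & 0xFF,
        "error_count": mm & 0xFF,
    }
```

A DIMM field above 11 suggests the value is not in the 0x00XXYYZZ form and should not be trusted as a DIMM number.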
Fatal error: Code 28, subcode 0x23 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM address lines with walking 1 (first 512 MB only)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the processor may directly access cluster memory by performing a walking 1's test on all address lines. If any line fails, this error will result.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x24 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM address lines with walking 0 (first 512 MB only)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the processor may directly access cluster memory by performing a walking 0's test on all address lines. If any line fails, this error will result.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x25 (mm)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM segment decode boundaries
This test verifies that memory decoding at all CM DIMM pairs is working correctly. It does so by writing a unique 128 bytes at each memory decode boundary location. It then verifies the values were written correctly and looks for corruption of other addresses.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x26 (eecd)
CM_MEMORY_FAILURE (<DIMM>) "Cluster Memory Failure"
Testing CM with random XOR (all Cluster Memory)
ee = number of XOR errors.
c = Channel Number where the error took place.
d = DIMM number where the error took place.
HW error during XCB transfer CM -> CM
or
HW error during XCB transfer CM -> PCI 1 (xxxx)
or
*** Error: Data Miscompare in Final Block offset xxxx
Expected (yyyy) Actual (zzzz)
This function performs a random data test on all cluster memory attached to the CM to verify memory under stress with random patterns. This test also exercises the CM XOR engine, as several sources are used simultaneously throughout the cluster memory test.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x27 (0)
CM_MEMORY_FAILURE (<DIMM>) "DQS Training Failed"
This error occurs when the DQS training fails to find working values for the DQS enable, DQS out skew, and DQS in skew.
See Code 28, sub-code 0x20 for resolution information.

Fatal error: Code 28, subcode 0x30 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM ECC lines with walking 1
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor may directly access CM cluster memory by performing a walking 1's test on all ECC lines. If any line fails, this error will result.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic:
A) Use the Whack command line to attempt to access CM memory manually to determine if data line bits are stuck.
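The sub-code 0x26 data value is written as "eecd" in the entry above: an error count, a channel number, and a DIMM number. A possible way to split it is sketched below. The function name is hypothetical, and the layout is an assumption: each letter is taken to be one hexadecimal digit of a 16-bit value, which the guide implies by its notation but does not state outright.

```python
def decode_xor_test_data(value: int) -> dict:
    """Split a sub-code 0x26 data value written as 'eecd'.

    Assumed layout (one hex digit per letter):
      ee -> bits 15-8, number of XOR errors
      c  -> bits 7-4,  channel number
      d  -> bits 3-0,  DIMM number
    """
    return {
        "error_count": (value >> 8) & 0xFF,
        "channel": (value >> 4) & 0xF,
        "dimm": value & 0xF,
    }
```

If the logged value is wider than 16 bits, the assumed layout does not apply and the value should be checked against the console output instead.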
Fatal error: Code 28, subcode 0x31 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM ECC lines with walking 0
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor can directly access cluster memory by performing a walking 0's test on all ECC lines. If any ECC line fails, this error will result.
See Code 28, sub-code 0x30 for resolution information.

Fatal error: Code 28, subcode 0x32 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM Op Codes
The CM Op Code test verifies that the processor can execute each of the available operations for this cluster manager ASIC. This error means that a particular opcode is not supported. If any op code fails, this error will result.
Resolution:
A) Replace the node motherboard.

Fatal error: Code 28, subcode 0x33 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM Source Interrupts
The CM Source Interrupts test will test that an interrupt is generated for each CMA data path, from processor, companion CMA, or to either processor memory to local CMA. On systems with only one CMA, the companion tests are not done.
Resolution:
A) Replace the node motherboard.

Fatal error: Code 28, subcode 0x34 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM I2C communication test
The CM I2C communication test will read and write to various safe CMA registers or CMA memory and verify that the expected values are read. A failure means either a bad DIMM or a bad CMA.
See Code 28, sub-code 0x30 for resolution information.

Fatal error: Code 28, subcode 0x35 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Stopped on an Uncorrectable Error
The scan for errors found an uncorrectable error in one of the CMAs. The system stopped during a BIOS test when this error was discovered.
See Code 28, sub-code 0x30 for resolution information.
Fatal error: Code 28, subcode 0x36 (data)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Stopped on a Correctable Error
The scan for errors found a correctable error in one of the CMAs. The system stopped during a BIOS test when this error was discovered.
See Code 28, sub-code 0x30 for resolution information.

Fatal error: Code 28, subcode 0x40 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM MMW data lines with walking 1
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 bits test verifies that the processor can directly access cluster memory by performing a walking 1's test on all data lines. This test uses the Medium Memory Window (MMW). If any data line fails, this error will result.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat Cluster Memory DIMMs.
D) Replace the node motherboard.
Diagnostic:
A) Use the Whack command line to attempt to access CM memory manually to determine if data line bits are stuck.

Fatal error: Code 28, subcode 0x41 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM MMW data lines with walking 0
Addr (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 bits test verifies that the processor can directly access cluster memory by performing a walking 0's test on all data lines. This test uses the Medium Memory Window (MMW). If any data line fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.

Fatal error: Code 28, subcode 0x42 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
ZERO CM problem at addr xxxx
Between PCI bus MMW tests, a small portion of cluster memory is cleared. If errors in clearing the memory are detected, this error will result.
See Code 28, sub-code 0x40 for resolution information.
Fatal error: Code 28, subcode 0x43 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 1 (MMW)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the processor can directly access cluster memory by performing a walking 1's test on all address lines using the medium memory window. If any address line fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.

Fatal error: Code 28, subcode 0x44 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 0 (MMW)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the processor can directly access cluster memory by performing a walking 0's test on all address lines using the medium memory window. If any address line fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.
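The address-line tests above report conditions such as "Short to Ground - Data same as at Addr 0", which is the symptom of an address line that never goes high, so that two different addresses select the same cell. The Python sketch below shows one common way to detect that kind of aliasing. It is an illustration under stated assumptions, not the BIOS algorithm: `AliasedMemory`, `walking_one_address_test`, and the `0xAAAAAAAA` base pattern are all hypothetical.

```python
class AliasedMemory:
    """Hypothetical memory model in which one address line can be stuck low."""

    def __init__(self, addr_bits, stuck_low_bit=None):
        self.addr_bits = addr_bits
        self.stuck_low_bit = stuck_low_bit
        self.cells = {}

    def _decode(self, addr):
        # A stuck-low address line makes the hardware ignore that bit.
        if self.stuck_low_bit is not None:
            addr &= ~(1 << self.stuck_low_bit)
        return addr

    def write(self, addr, data):
        self.cells[self._decode(addr)] = data

    def read(self, addr):
        return self.cells.get(self._decode(addr), 0)


def walking_one_address_test(mem):
    """Return the address bits whose power-of-two address aliases address 0."""
    base_pattern = 0xAAAAAAAA
    probe_pattern = ~base_pattern & 0xFFFFFFFF
    bad_bits = []
    for bit in range(mem.addr_bits):
        mem.write(0, base_pattern)          # known value at address 0
        mem.write(1 << bit, probe_pattern)  # distinct value at the probe address
        if mem.read(0) != base_pattern:     # address 0 was overwritten: aliasing
            bad_bits.append(bit)
    return bad_bits
```

A healthy model returns an empty list; a model with address line 5 stuck low returns `[5]`, because the write to address `1 << 5` lands on address 0.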
Fatal error: Code 28, subcode 0x45 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 1 (RMW)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 1 address bits test verifies that the processor can directly access cluster memory by performing a walking 1's test on all address lines using the remote memory window. If any address line fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.

Fatal error: Code 28, subcode 0x46 (mm)
CM_MEMORY_FAILURE "Cluster Memory Failure"
Testing CM address lines with walking 0 (RMW)
*** Error: Failed to write: Address (xxxx) Data (yyyy)
or
*** Error: Short to Ground - Data same as at Addr 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: Short to Address - Data should be 0
*** Error: Addr (xxxx) Read(yyyy)
or
*** Error: could not write data to this address
*** Error: Write At (xxxx) Wrote(yyyy) Read(zzzz)
The CM walking 0 address bits test verifies that the processor can directly access cluster memory by performing a walking 0's test on all address lines using the remote memory window. If any address line fails, this error will result.
See Code 28, sub-code 0x40 for resolution information.

Fatal error: Code 28, subcode 0x47 (wwxxyyzz)
CM_MEMORY_FAILURE "Cluster Memory Failure"
*** Error: MTB Granularity error!
SPD Byte 10, expected: ww, actual: yy
SPD Byte 11, expected: xx, actual: zz
All the MTB (Medium TimeBase) calculations in the software leveling code are based on an MTB granularity of 0.125 ns (SPD Byte 10 = 0x01 and Byte 11 = 0x08). These bytes define a value in nanoseconds that represents the fundamental timebase for medium grain timing calculations. This value is typically the greatest common divisor for the range of clock frequencies (clock periods) supported by a particular SDRAM. This value is used as a multiplier for formulating subsequent timing parameters. The medium timebase (MTB) is defined as the medium timebase dividend (byte 10) divided by the medium timebase divisor (byte 11).
Resolution:
A) Replace CM DIMM.

Fatal error: Code 29, subcode 0x0 (data)
CM_LINK_FAILURE "Cluster Link Failure"
Link 0 did not come up (0xac000000) error = (0x002022ff)
(data = link number)
CM Links are high speed connections between all of the node boards in a cluster via the center panel. During Manufacturing test, nodes are connected to a special Manufacturing Center panel that connects each link's transmitter to its own receiver (external loopback). When the node senses that it is in this special Center Panel, it will initialize all of the links and perform loopback tests. If any link fails to initialize, this sub-code will be reported.
Resolution:
A) Cycle power on the node.
B) Verify that the node is securely mated with the Center Panel.
C) Turn off power, re-seat the node into the center panel, and turn power back on.
D) Replace the node motherboard.
Diagnostic:
A) Use the Whack "eagle link" commands to run more diagnostic tests on the links. The CM requires both that the PCI scan has completed and that Cluster Memory is present and initialized.
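For the MTB granularity check described under Code 28, subcode 0x47 above, the arithmetic is simply byte 10 divided by byte 11, which gives 1/8 ns = 0.125 ns for the expected values. The Python sketch below works through that calculation and unpacks a (wwxxyyzz) data value into its expected and actual bytes. The function names are hypothetical, and the byte order (ww as the most significant byte) is an assumption based on the order of the letters in the data field.

```python
from fractions import Fraction


def mtb_ns(dividend, divisor):
    """Medium timebase in nanoseconds: SPD byte 10 divided by SPD byte 11."""
    if divisor == 0:
        raise ValueError("MTB divisor (SPD byte 11) must be non-zero")
    return Fraction(dividend, divisor)


def decode_mtb_data(data):
    """Split a 32-bit wwxxyyzz value into (expected, actual) byte pairs.

    Assumed layout: ww = expected byte 10, xx = expected byte 11,
    yy = actual byte 10, zz = actual byte 11, with ww most significant.
    """
    ww = (data >> 24) & 0xFF
    xx = (data >> 16) & 0xFF
    yy = (data >> 8) & 0xFF
    zz = data & 0xFF
    return (ww, xx), (yy, zz)


def granularity_ok(data):
    """True when the actual SPD bytes match the expected ones."""
    expected, actual = decode_mtb_data(data)
    return expected == actual
```

For example, a hypothetical data value of 0x01080110 would decode to expected bytes (0x01, 0x08) and actual bytes (0x01, 0x10), that is, a DIMM reporting an MTB of 1/16 ns instead of the 1/8 ns the leveling code assumes.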
Fatal error: Code 29, subcode 0x1 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM Link Initialization failed
(data = LLRR) where LL is the link bit pattern: 01 is link 0, 02 is link 1, 04 is link 2, and 08 is link 3. RR is the failure reason: E4 is Hardware error, F0 is user abort.
CM Links are high speed connections between all of the node boards via the center panel. During Manufacturing test, nodes are connected to a special Manufacturing Center panel that connects each link's transmitter to its own receiver (external loopback). When the node senses that it is in this special Center Panel, it will initialize the links and run a special test to verify the operation of the transmitter/receivers of each link. If any link fails, the test will report this sub-code.
See Code 29, sub-code 0x0 for resolution information.

Fatal error: Code 29, subcode 0x2 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM# LinkXOR test: Link [0]..[FAIL] (1)
(data = the link bit pattern. Bit 0 is link 0, bit 1 is link 1, bit 2 is link 2, and bit 3 is link 3.)
CM Links are high speed connections between all of the node boards via the center panel. During Manufacturing test, nodes are connected to a special Manufacturing Center panel that connects each link's transmitter to its own receiver (external loopback). When the node senses that it is in this special Center Panel, it will initialize the links and run a special test to verify the operation of the transmitter/receivers of each link. If any link fails, the test will report this sub-code.
See Code 29, sub-code 0x0 for resolution information.

Fatal error: Code 29, subcode 0x3 (data)
CM_LINK_FAILURE "Cluster Link Failure"
CM# Link INT???
test: Link [0]..[FAIL] (1)
(data = the link bit pattern. Bit 0 is link 0, bit 1 is link 1, bit 2 is link 2, and bit 3 is link 3.)
The CM Link INT test verifies that setting either of the two interrupt flags (DEST, SRC) in the XCB does actually generate an interrupt to the processor.
See Code 29, sub-code 0x0 for resolution information.

Fatal error: Code 29, subcode 0x4 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT Link 1 XCB ASync failed (Send)
(data = link number)
The CM Link Round Trip Test failed due to an XCB failure. CM XCB failed during link DMA. Use the "eagle status" command for more information on the type of error. This test checks the CM link status at multiple times during the test. The "(Send)" part of the message indicates which stage failed. Another possible value is "(Receive)".
See Code 29, sub-code 0x0 for resolution information.

Fatal error: Code 29, subcode 0x5 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT (Receive) Link 1 Length = 0
or
*** Error RTT Offset = xxxxx Expected = yyyyy Returned = zzzzz
or
*** Error RTT (Return) Link 1 Length mismatch
or
*** Error RTT (Return) Link 1 Timestamp mismatch.
(data = link number)
The CM Link Round Trip Test failed due to data miscompare. All packets have a length check and timestamp check. Payload compare is optional. Use the "eagle status" command to check for Uncorrectable ECC errors.
See Code 29, sub-code 0x0 for resolution information.

Fatal error: Code 29, subcode 0x6 (data)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error RTT (Return) Timeout waiting for packet Link 1
(data = link number)
The CM Link Round Trip Test failed due to packet timeout. A packet was sent and not received in a reasonable timeout period. The Round Trip Test may not have been started on a remote node. Use the "eagle status" command to check for Uncorrectable ECC errors.
Resolution:
A) Start CM Link Round Trip Test on remote node.
B) Cycle power on the node.
C) Verify that the node is securely mated with the Center Panel.
D) Turn off power, re-seat the node into the center panel, and turn power back on.
E) Replace the node motherboard.
Diagnostic:
A) Use the Whack "eagle link" commands to run more diagnostic tests on the links. The CM requires both that the PCI scan has completed and that Cluster Memory is present and initialized.

Fatal error: Code 29, subcode 0x10 (0)
CM_LINK_FAILURE "Cluster Link Failure"
REC_EN went low. Test failed for link [x](yyyyyyyy)
The "cma link init" command is used to initialize and bring up the CM links to nodes which indicate a "Power Ok" state. If this error occurs, it is possible the remote node was transmitting BIST, but then later stopped (such as from a reset or power off).
Resolution:
A) Perform the same test again.
B) Replace the node motherboard.
Diagnostic:
A) Verify that the CM link can be brought up manually using the "eagle link set" command.

Fatal error: Code 29, subcode 0x11 (0)
CM_LINK_FAILURE "Cluster Link Failure"
*** Error CM linkxx producer / consumer mismatch
The CM has XCB engines which transfer data. Software manages the producer register and the CM hardware follows with the consumer register. If these two do not agree and the CM should be idle, then it is possible the CM has halted due to failure of some operation. This problem is likely caused by a cluster memory or link failure.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
C) Replace the link partner node.
Diagnostic:
A) Replace Eagle/Osprey/Harrier ASIC.

Fatal error: Code 30, subcode 0x0 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
*** Error: No Oxford serial chip xx found
or
*** Error: No Exar serial chip found
The Exar and Oxford serial chips are used for a secondary low speed link which directly connects all nodes in the cluster. They are primarily used in the event of a link failure to verify whether another node in the cluster has actually gone down. Since the part is integrated onto the motherboard and is on a PCI bus, a failure to locate the internal serial chips may indicate other PCI problems as well.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the Whack "pci probe" command to show all devices on the PCI bus. Look for the two Oxford device entries, or a single Exar device entry (Pentium 4 node). If they are not there, verify other board level components are present in the list in order to isolate the component failure on the board.
B) Note that a failure of a single Oxford chip may be the cause of this behavior as one bridges to the PCI bus for both.

Fatal error: Code 30, subcode 0x1 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
*** Error: Serial Port Mfg Test failed Port (3) [FAIL]
When the Node board is inserted into a Manufacturing Test Centerpanel, the internal Serial Port Manufacturing test will automatically run. This error indicates failures on all ports tested.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the Whack "pci probe" command to show all devices on the PCI bus. Look for the two Oxford device entries or a single Exar device entry (Pentium 4 node). If they are not there, verify other board level components are present in the list in order to isolate the component failure on the board.
B) Note that a failure of a single Oxford chip may be the cause of this behavior as one bridges to the PCI bus for both.
C) Whack provides internal Serial Port commands for further analysis.

Fatal error: Code 30, subcode 0x2 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
Port (4):Processed 109 bytes[FAIL]
All cluster internal serial ports go through a quick internal loopback test immediately after initialization to do a short test of proper operation. This test will run regardless of the type of centerplane to which the node is connected. This error indicates failures on all ports tested.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the Whack "pci probe" command to show all devices on the PCI bus. Look for the two Oxford device entries or a single Exar device entry (Pentium 4 node). If they are not there, verify other board level components are present in the list in order to isolate the component failure on the board.
B) Note that a failure of a single Oxford chip may be the cause of this behavior as one bridges to the PCI bus for both.
C) Whack provides internal Serial Port commands for further analysis.

Fatal error: Code 30, subcode 0x3 (0)
SERIAL_PORT_FAILURE "Serial Port Failure"
Internal UART is not functioning properly. Most likely this is due to a hardware failure related to the SuperIO.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Non-fatal error: Code 31, sub-code 0x0 (0)
GPIO_TEST_FAILURE "GPIO Failure"
FAIL (high) Port (6) Bit (4) wrote 0(0x1) Port (7) Bit (4) read 1, expected 0(0x3)
The Vitesse VSC055 2 Wire Backplane Controller chip controls interfaces to the Centerplane, LEDs, Power Supplies, Nickel battery, and PCI slots. It is connected to the I2C bus. In normal 2, 4, or 8 node centerplanes, the chip will get its ports initialized as inputs or outputs and start monitoring peripheral systems. No tests available.
When connected to a Manufacturing Centerplane, it will have selected pins routed to other pins for loopback testing. See the Manufacturing Centerplane Specification for details. During this test, proper VSC operation will be confirmed.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Whack "i2c vsc" commands can be used to peek and poke the VSC055 chip when in a Manufacturing Centerplane. In normal Centerplanes, these pins will be connected to other components and should not be modified.

Non-fatal error: Code 31, sub-code 0x1 (0)
GPIO_TEST_FAILURE "GPIO Failure"
Failed I2C VSC055 1.ce.yy write zzzz
During initialization, the VSC055 registers are programmed for proper system operation. This is done over the I2C bus. If an I2C operation fails during VSC055 initialization, this error will result.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Whack "i2c vsc" commands can be used to peek and poke the VSC055 chip. One failure seen in the past with the VSC055 is that sometimes a specific chip could not handle the first write access to the command register which causes a soft reset. It was determined the part violated the I2C protocol in ACKing the transaction before the I2C write operation completed.

Fatal error: Code 31, subcode 0x2 (0)
GPIO_TEST_FAILURE "GPIO Failure"
FPGA Scratchpad registers failed meaning bad FPGA hardware.
Resolution:
A) Cycle power on the node.
B) Replace the node.

Non-fatal error: Code 31, sub-code 0x3 (0)
GPIO_TEST_FAILURE "GPIO Failure"
FPGA Interrupt Test failed.
Resolution:
A) Cycle power on the node.
B) Replace the node.

Non-fatal error: Code 31, sub-code 0x4 (0)
GPIO_TEST_FAILURE "GPIO Failure"
NEMOE Loopback Test failed.
Resolution:
A) Cycle power on the node.
B) Replace the node.
Non-fatal error: Code 31, sub-code 0x5 (0)
GPIO_TEST_FAILURE "GPIO Failure"
During the "Board GPIO Test", the FPGA ID is not what it expects it to be.
Resolution:
A) Cycle power on the node.
B) Replace the node.

Non-fatal error: Code 31, sub-code 0x6 (0)
GPIO_TEST_FAILURE "GPIO Failure"
During the "Board GPIO Test", the FPGA Revision is not what it expects it to be.
Resolution:
A) Cycle power on the node.
B) Replace the node.

Non-fatal error: Code 31, sub-code 0x7 (0)
GPIO_TEST_FAILURE "GPIO Failure"
Titan specific. During the "Manufacturing Centerpanel GPIO Test", one or more tests have failed depending upon the output.
A) If failed during 'Testing Expanders (o/p) <--> FPGA (i/p) connections:'
For example,
FAIL (low) Port (76) Bit (1) wrote(0x00) Port (302) Bit (4) read 0xff, expected(0xef)
1) Program the I2C expander with the following command:
Whack> cb i2c 9.76.3 0
Here "76" is the reported port number. "3" is the config register offset for the expander. "0" makes all expander bits outputs.
2) Set the bit in the I2C expander.
Whack> cb i2c 9.76.1 2
Here "1" is the rdwr register offset for the expander. "2" is the reported bit 1 (1 << "1") in the expander.
3) Read a byte from the FPGA offset.
Whack> db fpga 302 1
Here "302" is the reported FPGA offset. Confirm that bit "4" in the read value is set.
Repeat steps 2) and 3) by writing 0 to I2C expander 9.76.1 and checking that bit "4" in FPGA offset 0x302 is cleared.
B) If failed during 'Testing FPGA (o/p) <--> Expanders (i/p) connections:'
For example,
FAIL (low) Port (305) Bit (4) wrote(0x00) Port (7e) Bit (7) read 0x86, expected(0x06)
1) Program the I2C expander with the following command:
Whack> cb i2c 9.7e.3 ff
Here "7e" is the reported port number. "3" is the config register offset for the expander. "ff" makes all expander bits inputs.
2) Write a byte to the FPGA offset.
Whack> db fpga 305 10
Here "305" is the reported FPGA offset. Writing 0x10 will set bit "4" in that offset.
3) Read a byte from the I2C expander.
Whack> db i2c 9.7e.0 1
Here "7e" is the reported port number. "0" is the read register offset for the expander. Confirm that bit "7" in the read value is set.
Repeat steps 2) and 3) by writing 0 to the FPGA offset and checking that bit "7" in I2C Expander 9.7e.0 is cleared.
C) For all other failure cases refer to Section # 18.2 "Manufacturing Centerplane GPIO Test Diagnostics" of the CBIOS user guide at http://engweb/twiki/bin/view/Main/TitanMfgCpFpgaGpioTestDiag

Fatal error: Code 32, subcode 0x1 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
Xor Engine Status: P0_XERR
Error Status : XOR_ERR
PCI0 Error Status:
PCI1 Error Status:
The Eagle, Osprey, and Harrier ASICs contain a DMA engine capable of XOR operations. This DMA engine is commonly referred to as the XCB engine. The XCB engine can DMA data between 14 different modules within the ASIC, each module capable of sinking or sourcing data. The XCB engine will stop all DMA if it encounters an error while transferring data. The XCB error status indicates the module that produced the error. Further details of the error can be gathered by inspecting the error registers of that module. Use the Whack command "cma status all" to get further diagnostic information. If the user continues past this error, software will attempt to reset the error and continue.
Sub-code 0x1 is specific to Osprey and indicates an uncorrectable ECC error following an attempt to zero all of cluster memory. The "chunk" value indicates the chunk where the ECC error occurred.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Whack "cma status all" command displays the status registers for each CM module. Refer to the module that produced the error for further information and diagnostic procedure.

Fatal error: Code 32, subcode 0x2 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Osprey and indicates an uncorrectable ECC error following an attempt to ECC scrub all of cluster memory. The "chunk" value indicates the chunk where the scrub error occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic information.

Fatal error: Code 32, subcode 0x6 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates an uncorrectable ECC error following an attempt to zero all of cluster memory. The "chunk" value indicates the chunk where the ECC error occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic information.

Fatal error: Code 32, subcode 0x7 (err_last)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates a general Harrier DMA error following an attempt to zero all of cluster memory. The "err_last" value represents the normalized content of the Harrier mem_common->mem_err_status register.
See Code 32, sub-code 0x1 for Resolution and Diagnostic information.

Fatal error: Code 32, subcode 0x8 (chunk)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates an uncorrectable ECC error following an attempt to ECC scrub all of cluster memory. The "chunk" value indicates the chunk where the scrub error occurred.
See Code 32, sub-code 0x1 for Resolution and Diagnostic information.

Fatal error: Code 32, subcode 0x9 (err_last)
CM_XOR_FAILURE "CM XOR Failure"
This sub-code is specific to Harrier and indicates a general Harrier DMA error following an attempt to ECC scrub all of cluster memory.
The "err_last" value represents the normalized content of the Harrier mem_common->mem_err_status register.
See Code 32, sub-code 0x1 for Resolution and Diagnostic information.

Fatal error: Code 33, subcode 0x0 (0)
SDRAM_I2C_BAD_READ "Memory I2C Bad Read"
*** Error: Unable to read from SDRAM at I2C ww.xx.yy.zz
This error indicates that an SDRAM DIMM for which information was requested is no longer available. This may be due to an intermittent I2C bus, or a hardware failure.
Resolution:
A) Cycle power on the node.
B) Replace the failing DIMM's pair.
C) Replace the node motherboard.

Fatal error: Code 34, subcode 0x1 (0xff)
PCI_BUS_ERROR "PCI Bus Failure"
This error indicates an uncorrectable error occurred on the PCI bus. In the future, the data field may indicate the PCI slot number for the device which failed. In order to determine the cause of this error, it may be useful to review either console messages or the IDE disk log. Typical messages preceding this error are likely difficult to read, but may indicate the exact cause.
Example:
--- SMI: smm_inb(0x3a) == 0x86 GPE 9 triggered
Error in PCI device 02.02.00 (PCI/PCI Bridge #0 (controls slot 1)):
PCI status register (0x06) [62b0]: Signaled system error (SERR#), Received master abort
Secondary PCI status register (0x1e) [0aa0]: Signaled target abort
Bridge P_SERR (0x6a) [80]: Delayed transaction master initiator timeout
Error in PCI device 03.01.00 (PCI Slot 1):
PCI status register (0x06) [1290]: Received target abort
Secondary PCI status register (0x1e) [0a80]: Signaled target abort
Error in PCI device 04.06.00 (inside PCI Slot 1):
PCI status register (0x06) [1230]: Received target abort
Error in PCI device 04.06.01 (inside PCI Slot 1):
PCI status register (0x06) [1230]: Received target abort
(PCI errors not cleared)
Fatal error: Code 34, subcode 0x1 (ff)
In the above case, a card in PCI Slot 1 was transferring data up to a device, likely the cluster manager, when it did not get a response. The bridge above the card received a master abort, which it then relayed to its secondary side as a signaled target abort. The bridge on the card in PCI Slot 1 then received the target abort and signaled a target abort on its secondary side. Both PCI devices then indicated they received target aborts.
Resolution:
A) Cycle power on the node.
B) Reseat all PCI cards.
C) Replace the suspected PCI card.
D) Remove PCI cards one at a time.
E) Replace the node motherboard.

Fatal error: Code 35, subcode 0x0 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
One or both DIMMs in a DIMM pair have failed. Bits 4-7 of the data value indicate the DIMM pair.
If data is 0, then DIMM pair 0 has failed.
If data is 10, then DIMM pair 1 has failed.
Example:
--- SMI: TEMPCAUT (SMALERT): 0x01 (bits reset)
Uncorrectable ECC error 0x9279a103 recorded in reg 0x98
Pair1, either DIMM1 or DIMM3 contains the error
Error in locations [0x382cd818 ..
0x382cd81f]
Uncorrectable ECC error 0x9279a101 recorded in reg 0x94
Syndrome/bit number information might not be accurate, as more than 1 error happened
Pair1, either DIMM1 or DIMM3 contains the error
Error in locations [0x382cd808 .. 0x382cd80f]
(Clearing cache line at 0x382cd800)
(Clearing cache line at 0x382cd800)
ESR == 0x0003 (expected low bit == 0)
Fatal error: Code 35, subcode 0x0 (10)
Resolution:
A) Cycle power on the node.
B) Clear dust and debris from the node.
C) Remove and reseat the specified CPU DIMM pair.
D) Replace the failed CPU DIMM pair.
E) Replace the node motherboard.
Diagnostic:
A) Verify North Bridge heatsink attachment.
B) Check DIMM clock buffers (X6200 on P4-Eagle).
C) Check DIMM termination (R5836, etc on P4-Eagle nodes).

Fatal error: Code 35, subcode 0x1 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
A single DIMM of a DIMM pair has failed. The data value indicates which DIMM. Bits 4-7 of the data value indicate which DIMM pair. Bits 0-3 of the data value indicate which DIMM within that pair.
If data is 0, then DIMM 0 of pair 0 has failed.
If data is 1, then DIMM 1 of pair 0 has failed.
If data is 10, then DIMM 0 of pair 1 has failed.
If data is 11, then DIMM 1 of pair 1 has failed.
Resolution:
A) Cycle power on the node.
B) Clear dust and debris from the node.
C) Remove and reseat the specified CPU DIMM.
D) Replace the failed CPU DIMM.
E) Replace the node motherboard.

Fatal error: Code 35, subcode 0x2 (data)
SDRAM_UC_ECC_ERROR "Control Cache ECC Uncorrectable"
This code means an ECC error was detected, but the BIOS did not completely decode the error.
See Code 35, sub-code 0x0 for resolution information.
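The data value for Code 35 packs the DIMM pair into bits 4-7 and, for sub-code 0x1, the DIMM within the pair into bits 0-3. The Python sketch below decodes that field. The function names are hypothetical, and it assumes the data value is printed in hexadecimal, so that the "10" shown in the entries above is 0x10.

```python
def decode_dimm_data(data):
    """Split a Code 35 data value into (pair, dimm) nibbles."""
    pair = (data >> 4) & 0xF  # bits 4-7: DIMM pair
    dimm = data & 0xF         # bits 0-3: DIMM within the pair
    return pair, dimm


def describe_failure(subcode, data):
    """Return a short description of the failed part for sub-codes 0x0 and 0x1."""
    pair, dimm = decode_dimm_data(data)
    if subcode == 0x0:
        # Sub-code 0x0 identifies only the pair.
        return f"DIMM pair {pair}"
    if subcode == 0x1:
        # Sub-code 0x1 identifies a single DIMM inside the pair.
        return f"DIMM {dimm} of pair {pair}"
    raise ValueError("only sub-codes 0x0 and 0x1 carry a DIMM location")
```

For instance, `describe_failure(0x1, 0x11)` yields "DIMM 1 of pair 1", matching the last case listed under sub-code 0x1.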
Fatal error: Code 36, subcode 0x0 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI: SERR# input went low
In the event of a hardware failure, it is normal to trigger a processor System Management Interrupt (SMI). If the SMI gets cleared before the BIOS has a chance to observe it (which should not happen), then this error will result.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Fatal error: Code 36, subcode 0x1 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI: Write made to ACPI PM register
In normal operation the operating system should not write to the ACPI PM register. If the BIOS detects a write took place, it will flag this as an error caused by a failing operating system or other node hardware.
Resolution:
A) Cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.

Fatal error: Code 36, subcode 0x2 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
*** Error: SMI not fully handled.
The BIOS was not able to determine the actual cause of the triggered SMI.
Resolution:
A) Cycle power on the node.
B) Reinstall the operating system.
C) Replace the node motherboard.

Fatal error: Code 36, subcode 0x3 (0)
FATAL_SMI_ERROR "Fatal SMI Error"
--- SMI: No known cause (# 4097) GPE status: 0x400000, GPE input: 0x0xfff7ff
*** Error: SMI: No known cause is too frequent
This error may result if there is an unknown hardware device triggering SMIs in the system and those SMIs are happening too frequently. Most likely the device continues to trigger an SMI because its problem has not been serviced, and no real work is possible at this point because immediately after returning from the SMI, another is triggered. The BIOS attempts to recognize this condition and stop with a fatal error rather than just continuing to display errors.
Resolution:
A) Remote reset or cycle power on the node.
B) Reinstall the operating system. C) Replace the node motherboard. Table Continued Error codes—HPE 3PAR OS 3.2.2 293 BIOS fatal error codes and error resolution—HPE 3PAR OS 3.2.2 Code Fatal error: Code 36, subcode 0x4 (0) Description FATAL_SMI_ERROR "Fatal SMI Error" *** Warning: SMI cause is too frequent; disabling SMI handling *** Error: SMI cause could not be masked This error may result if a known SMI cause is happening too frequently. In a normally functioning node, SMIs should occur infrequently, as there is a performance impact associated with handling each SMI.The BIOS will first attempt to disable known SMIs in order to mask this problem. If that is insufficient, the BIOS will stop with this fatal error. Resolution: A) Check for CPU memory DIMM correctables in the event log. Replace DIMMs if they are suspect. B) Check for hardware oscillating events in the event log (such as PS status). On some node types, board GPIO changes are reported through SMI. You may need to replace power supplies or another FRU. C) Replace the node motherboard. Diagnostic: A) Set "fatal_no_reboot" at Whack and then enter Whack at the Fatal Error.You should be able to inspect the state of the machine prior to SMI handling to see what status is asserted. Output from the following Whack commands may be helpful: 1) eagle status 2) vsc status 3) pci status 4) mem bridge Fatal error: Code 36, subcode 0x5 (0) FATAL_SMI_ERROR "Fatal SMI Error" *** Error: In SMI on CPU ww [xx], CR2 was 0xyyyy, but got changed to 0xzzzz This error will result if the BIOS inadvertently changes the contents of CR2 while processing a SMI.This should not happen in normal operation, but might happen as the result of a `whack' command. As returning from this SMI could easily cause corruption of the OS or of a user-level program, this fatal error is flagged instead. Resolution: A) Cycle power on the node. 
Table Continued 294 Error codes—HPE 3PAR OS 3.2.2 BIOS fatal error codes and error resolution—HPE 3PAR OS 3.2.2 Code Fatal error: Code 37, subcode zz (0) Description GEVENT_TRIGGERED"GEvent Triggered" Code 37 sub-codes are a bitmask of error values. This means you may find an error which will simultaneously trigger multiple GEVENTs. This event is probably one of the hardest to interpret as it often will indicate multiple board devices have detected a fatal error condition.In general, it's much more convenient to look up the decoded error in the BIOS output of the idelog rather than manually decoding this event back to indicators. Resolution: Look up each individual documented sub-code below which when OR'd together form the sub-code observed. Fatal error: Code 37, subcode 0x1 (0) GEVENT_TRIGGERED"GEvent Triggered" S-Series and E-Series (P4) nodes: --- SMI: smm_inb(0x39) == 0x01 CMIC_FATAL (GEVENT0) This error indicates the CMIC (North Bridge) had a fatal error. T-Series and F-Series (5000P) nodes: *** Error: GPE[0]: PCI2_PERR_L This error indicates either the PLX #2 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX brige #2 detected a parity error. These components manage PCI slots 0, and 1 on T-Series and Slot 0 on F-Series. V-Series, Atlas, Minime (5000P) nodes: *** Error: GPE[0]: PEX2_FATAL_ERROR This error indicates that the PLX #2 PCIe-PCIe bridge detected a fatal error.These components manage PCI slots 0, 1, and 2; Harrier 1 and 2 LPC0. Resolution: A) Cycle power on the node. B) Verify the system is getting adequate ventilation. C) Remove any recently installed PCI cards. D) Remove all PCI cards. E) Replace the node motherboard. 
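Because a Code 37 sub-code is a bitmask, an observed value can be split back into its single-bit sub-codes before each one is looked up. The following Python sketch shows one way to do that. The function name `split_subcode` and the example value 0x42 are illustrative assumptions, not BIOS output.

```python
def split_subcode(subcode: int) -> list[int]:
    """Return the single-bit values that OR together to form subcode.

    Hypothetical helper for manually decoding a Code 37 sub-code.
    """
    parts = []
    bit = 1
    while bit <= subcode:
        if subcode & bit:
            parts.append(bit)
        bit <<= 1
    return parts


observed = 0x42  # hypothetical observed sub-code, used only as an example
print([hex(part) for part in split_subcode(observed)])
```

Each value in the returned list corresponds to one documented Code 37 sub-code entry. As the Code 37 description notes, the decoded error in the idelog is usually the more convenient source.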
Fatal error: Code 37, subcode 0x2 (0)
GEVENT_TRIGGERED "GEvent Triggered"

S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x02 ALERT (GEVENT1)
Error in PCI device 00.00.00 (CMIC-LE Memory Controller/ Thin IMB): ESR (0x4c) [0004]: IMBus error (PCI errors not cleared)
The output above can be considered "typical" but may contain any of the possible CMIC (North Bridge) Memory Controller or other PCI bus errors. An IMBus error indicates a communication problem between the North Bridge and either the South Bridge or the CIOBX2. This would likely indicate a node motherboard failure. It has been observed in the field that a flaky or bad PCI socket may also cause this.

Resolution:
A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.

T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[1]: MCH Fatal Error
This error indicates the MCH (North Bridge) has detected a fatal condition. Most likely there are other error messages present in the idelog to help pinpoint the issue. Since the MCH is the top of the root complex, it is very common to see the MCH indicating a fatal error on nearly all failures.

Resolution:
A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.

Fatal error: Code 37, subcode 0x4 (0)
GEVENT_TRIGGERED "GEvent Triggered"

S-Series (PIII) nodes:
--- SMI: smm_inb(0x39) == 0x04 GPE 2 triggered THERMT_L0_OSB (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt. It is a fatal condition on Pentium III nodes, and the node will be immediately taken out of the cluster with this fatal error.

Resolution:
A) Cycle power on the node. If it is a temperature-related problem, verify the system is getting adequate ventilation.
B) Replace the node motherboard.

S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x04 GPE 2 triggered P0_PROC_HOT (GEVENT2)
The Pentium 4 CPU supports clock modulation, which reduces the core frequency when the core temperature is too high. The BIOS enables this support when starting the OS, so after the node has joined the cluster, the BIOS will asynchronously notify the OS if this event occurs but not take it out of the cluster. At the same time, the Pentium 4 processor will automatically reduce its clock speed so as to generate less heat and not reach a shutdown temperature. This message is therefore not fatal on P4 CPUs.

Resolution:
A) Cycle power on the node. If it is a temperature-related problem, verify the system is getting adequate ventilation.
B) Replace the node motherboard.

T-Series and F-Series (5000P) nodes:
*** Error: GPE[2]: PCI0_PERR_L
This error indicates either the PLX #0 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #0 detected a parity error. These components manage PCI slots 4 and 5 on T-Series and Slot 2 on F-Series. See Code 37, sub-code 0x1 for resolution information.

V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[2]: PEX0_FATAL_ERROR
This error indicates that the PLX #0 PCIe-PCIe bridge detected a fatal error. These components manage PCI slots 6, 7, and 8; Harriers 1 and 2 LPC2. See Code 37, sub-code 0x1 for resolution information.

Chimera nodes:
*** Error: GPE[2]: PEX0_FATAL_ERROR
This error indicates that the PLX 8796 #0 or #1 PCIe-PCIe bridge detected a fatal error. These components manage PCI slots 0, 1, 5 and 6; Harrier 0, LPC0 and LPC2; and Harrier 1, LPC0 and LPC2. See Code 37, sub-code 0x1 for resolution information.

Eos and Tornado nodes:
*** Error: GPE[2]: PEX_FATAL_ERROR
This error indicates that the PLX PCIe-PCIe bridge detected a fatal error. These components manage PCI slots 0, 1, and 2; Harrier LPC0 and LPC2. See Code 37, sub-code 0x1 for resolution information.

Fatal error: Code 37, subcode 0x8 (0)
GEVENT_TRIGGERED "GEvent Triggered"

S-Series (PIII) nodes:
--- SMI: smm_inb(0x39) == 0x08 GPE 3 triggered THERMT_L1_OSB (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt. See Code 37, sub-code 0x2 for resolution information.

S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x08 GPE 3 triggered P1_PROC_HOT (GEVENT2)
This indicates a thermal event triggered a GPIO interrupt. See Code 37, sub-code 0x2 for resolution information.

T-Series and F-Series (5000P) nodes:
*** Error: GPE[3]: PCI0_SERR_L
This error indicates either the PLX #0 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #0 detected a fatal error (SERR). These components manage PCI slots 4 and 5 on T-Series and Slot 2 on F-Series. See Code 37, sub-code 0x1 for resolution information.

V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[3]: PEX1_FATAL_ERROR
This error indicates that the PLX #1 PCIe-PCIe bridge detected a fatal error. These components manage PCI slots 3, 4, and 5; Harrier 1 and 2 LPC1. See Code 37, sub-code 0x1 for resolution information.

Chimera nodes:
*** Error: GPE[3]: PEX1_FATAL_ERROR
This error indicates that the PLX 8750 PCIe-PCIe bridge detected a fatal error. This component manages PCI slots 2, 3, and 4; Harrier 0, LPC1; and Harrier 1 LPC1. See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x10 (0)
GEVENT_TRIGGERED "GEvent Triggered"

S-Series (PIII) nodes:
GPE 4 triggered MIRQ (GEVENT4)
This error indicates the memory controller (CNB20HE) triggered an interrupt. The CNB20HE documentation lists the possible sources as a correctable ECC error on the memory data bus or the processor data bus. See below (P4) for resolution information.

S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x10 GPE 4 triggered P0_IERR (GEVENT4)
This error indicates that P4 CPU 0 has asserted IERR#, which is used to indicate a processor internal error event occurred. The Intel documentation indicates one cause of this error is a machine check exception when exceptions have not yet been enabled. From our experience in the field, the problem is possibly a CPU or node motherboard failure.

Resolution:
A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove any recently installed PCI cards.
D) Remove all PCI cards.
E) Replace the node motherboard.

Diagnostic:
A) Replace CPUs.
B) Replace CPU VRMs.
C) Check DIMM termination (R5836, etc. on P4-Eagle nodes).

T-Series and F-Series (5000P) nodes:
*** Error: GPE[4]: PCI1_PERR_L
This error indicates either the PLX #1 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #1 detected a parity error. These components manage PCI slots 2 and 3 on T-Series and Slot 1 on F-Series. See Code 37, sub-code 0x1 for resolution information.

V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[4]: FPGA_LPC_IRQ0_L
This error indicates an internal error. This should not occur in a V-Series system. See Code 37, sub-code 0x1 for resolution information.
Fatal error: Code 37, subcode 0x20 (0)
GEVENT_TRIGGERED "GEvent Triggered"

S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x20 GPE 5 triggered P1_IERR (GEVENT5)
This error indicates that P4 CPU 1 has asserted IERR#. See Code 37, sub-code 0x10 (P4) for resolution information.

T-Series and F-Series (5000P) nodes:
*** Error: GPE[5]: PCI1_SERR_L
This error indicates either the PLX #1 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #1 detected a fatal error (SERR). These components manage PCI slots 2 and 3 on T-Series and Slot 1 on F-Series. See Code 37, sub-code 0x1 for resolution information.

V-Series, Atlas, Minime (5000P), Eos, Tornado and Chimera nodes:
*** Error: GPE[4]: FPGA_LPC_IRQ1_L
This error indicates that NEMOE raised the FPGA SMI interrupt and it was not handled properly. See Code 37, sub-code 0x1 for resolution information.

Fatal error: Code 37, subcode 0x40 (0)
GEVENT_TRIGGERED "GEvent Triggered"

S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x40 GPE 6 triggered P_SERR (GEVENT6)
This error indicates one or more of the system's chipset devices is asserting P_SERR (primary side system error). Output is usually followed by outstanding PCI errors as indicated by chipset devices.

Resolution:
A) Identify and replace the failing PCI card based on error output. It may be necessary to contact hardware engineering with BIOS output to determine which PCI slot is at fault.
B) Remove all PCI cards.
C) Replace the node motherboard.

T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[6]: MCH Uncorrectable Error
This error indicates the MCH (North Bridge) has detected an uncorrectable error. Most likely there are other error messages present in the idelog to help pinpoint the issue. Since the MCH is the top of the root complex, it is very common to see the MCH indicating an uncorrectable error on nearly all failures.

Resolution:
A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.

Fatal error: Code 37, subcode 0x80 (0)
GEVENT_TRIGGERED "GEvent Triggered"

S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x39) == 0x80 GPE 7 triggered P_PERR (GEVENT7)
This error indicates one or more of the system's chipset devices is asserting P_PERR (primary side parity error). See Code 37, sub-code 0x40 for resolution information.

T-Series and F-Series (5000P) nodes:
*** Error: GPE[7]: PCI2_SERR_L
This error indicates either the PLX #2 PCIe-PCIX bridge or the Intel 31154 PCIX-PCIX bridge #2 detected a fatal error (SERR). These components manage PCI slots 0 and 1 on T-Series and Slot 0 on F-Series. See Code 37, sub-code 0x1 for resolution information.

V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[7]: Not connected
This error indicates an internal error. This should not occur in a V-Series system. See Code 37, sub-code 0x1 for resolution information.

Eos, Tornado, and Chimera nodes:
*** Error: GPE[7]: MCH Fatal Error
This error indicates the MCH (North Bridge) has detected a fatal condition. Most likely there are other error messages present in the idelog to help pinpoint the issue. Since the MCH is the top of the root complex, it is very common to see the MCH indicating a fatal error on nearly all failures.

Resolution:
A) Cycle power on the node.
B) Replace CPU DIMMs if no other error is indicated.
C) Replace the node motherboard.
Fatal error: Code 37, subcode 0x100 (0)
GEVENT_TRIGGERED "GEvent Triggered"

S-Series (PIII) nodes:
--- SMI: smm_inb(0x3a) == 0x01 GPE 8 triggered CPU_TEMP_INTR (GEVENT8)
This indicates a CPU temperature event triggered a GPIO interrupt. See Code 37, sub-code 0x2 for resolution information.

S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x3a) == 0x01 GPE 8 triggered S_SERR (GEVENT8)
This error indicates one or more of the system's chipset devices is asserting S_SERR (secondary side system error). See Code 37, sub-code 0x40 for resolution information.

T-Series and F-Series (5000P) nodes:
--- SMI request via EXT_SMI
This error indicates another node in the cluster has forced this node to handle an SMI. Most likely the other node is attempting to force a panic dump because the local node has stopped responding.

Resolution:
A) Inspect the core dump to determine if the cause was a software or hardware failure.
B) Replace the node motherboard if the issue recurs and cannot be identified as a software failure.

V-Series, Atlas, Minime (5000P) nodes:
*** Error: GPE[7]: Not connected
This error indicates an internal error. This should not occur in a V-Series system. See Code 37, sub-code 0x1 for resolution information.

Fatal error: Code 37, subcode 0x200
GEVENT_TRIGGERED "GEvent Triggered"

S-Series (PIII) nodes:
This error indicates one or more of the system's chipset devices is asserting SERR (system error). Output is followed by the PCI scan results, which display outstanding PCI errors of all PCI bus devices. See below (P4) for resolution information.

S-Series and E-Series (P4) nodes:
--- SMI: smm_inb(0x3a) == 0x02 GPE 9 triggered S_PERR (GEVENT8)
This error indicates one or more of the system's chipset devices is asserting S_PERR (secondary side parity error).

Resolution:
A) Identify and replace the failing PCI card based on error output. It may be necessary to contact hardware engineering with BIOS output to determine which PCI slot is at fault.
B) Remove all PCI cards.
C) Replace the node motherboard.

T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[9]: CPU0 IERR_L
This error indicates that CPU 0 has asserted IERR#, which is used to indicate a processor internal error event occurred. The Intel documentation indicates one cause of this error is a machine check exception when exceptions have not yet been enabled. From our experience in the field, the problem is possibly a CPU or node motherboard failure.

Resolution:
A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove all PCI cards.
D) Replace the node motherboard.

Diagnostic:
A) Replace CPUs.
B) Replace CPU VRMs.

Fatal error: Code 37, subcode 0x400 (0)
GEVENT_TRIGGERED "GEvent Triggered"

T-Series, F-Series, V-Series (5000P) nodes:
*** Error: GPE[10]: CPU1 IERR_L
This error indicates that CPU 1 has asserted IERR#, which is used to indicate a processor internal error event occurred. See Code 37, sub-code 0x200 for resolution information.

Chimera nodes:
*** Error: GPE[10]: CPU1_THERMTRIP_L
This indicates a thermal event on CPU1 triggered a GPIO interrupt. It is a fatal condition and the node will be immediately taken out of the cluster with this fatal error.

Resolution:
A) Cycle power on the node. If it is a temperature-related problem, verify the system is getting adequate ventilation.
B) Replace the node motherboard.
Fatal error: Code 37, subcode 0x800 (0)
GEVENT_TRIGGERED "GEvent Triggered"

Chimera nodes:
*** Error: GPE[11]: CPU0_THERMTRIP_L
This indicates a thermal event on CPU0 triggered a GPIO interrupt. It is a fatal condition and the node will be immediately taken out of the cluster with this fatal error.

Resolution:
A) Cycle power on the node. If it is a temperature related problem, verify the system is getting adequate ventilation.
B) Replace the node motherboard.

Eos and Tornado nodes:
*** Error: GPE[11]: THERMTRIP_L
This indicates a thermal event on the CPU triggered a GPIO interrupt. See the above information regarding Chimera nodes for resolution.

Fatal error: Code 37, subcode 0x2000 (0)
GEVENT_TRIGGERED "GEvent Triggered"

Eos, Tornado and Chimera nodes:
*** Error: GPE[13]: CAT_ERR_L
This error indicates that a CPU has asserted IERR#, which is used to indicate a processor internal error event occurred. The Intel documentation indicates one cause of this error is a machine check exception when exceptions have not yet been enabled. From our experience in the field, the problem is possibly a CPU or node motherboard failure.

Resolution:
A) Cycle power on the node.
B) Verify the system is getting adequate ventilation.
C) Remove all PCI cards.
D) Replace the node motherboard.

Diagnostic:
A) Replace CPUs.
B) Replace CPU VRMs.

Non-fatal error: Code 38, sub-code 0x0 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

Power Supply xx indicates invalid battery configuration: y batteries
Verify battery connection and individual battery units.

The maximum count of batteries in a string which are supported by software is 3. Any greater number will result in this non-fatal error. The data value may be decoded to determine which power supply and the battery count. The high 8 bits are a bitmask of the power supply. The lower 16 bits are the number of batteries counted. Thus, a data value of 100000c indicates PS1 had a battery count of 12. A data value of 4 indicates PS0 had a battery count of 4.

Resolution:
A) Verify no more than 3 batteries in a string are connected to any one power supply.
B) Cycle power on the node.
C) Remove batteries one at a time to determine if there is a faulty connection or battery. Replace the faulty cable or battery.
D) Replace the power supply.
E) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x1 (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"

RTC / NVRAM Battery Failure - Replace battery.

The RTC / NVRAM battery was found to have a low voltage by the built-in monitoring circuit of the RTC (TOD clock).

Resolution:
A) Replace the lithium-ion cell battery on the node.
B) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x3 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

No batteries present on power supply xx

This error indicates no batteries were found on a node power supply. This warning may be enabled by setting "warn_nobat" in NVRAM. The data value may be decoded to determine which power supply triggered this error. The high 8 bits are a bitmask of the power supply. Thus, a data value of 0 indicates PS0 is not present. A data value of 1000000 indicates PS1 is not present.

Resolution:
A) Verify there is at least one battery connected.
B) Cycle power on the node.
C) Exchange cables and batteries.
D) Replace the power supply.
E) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x4 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

Power supply missing: node power configuration is not redundant

This error indicates one of the two power supplies for a node is not present. This warning may be enabled by setting "warn_ps" in NVRAM. The data value may be decoded to determine which power supply triggered this error. The high 8 bits are a bitmask of the power supply. Thus, a data value of 0 indicates PS0 is not present. A data value of 1000000 indicates PS1 is not present.

Resolution:
A) Verify both power supplies are present and powered on.
B) Power off the missing supply, remove it, and re-insert it in the chassis.
C) Replace the power supply.
D) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x5 (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"

Battery failure on Power Supply

This error indicates that a battery on the power supply has reported a hardware error. The status light on the back of the failed battery will be amber.

Resolution:
A) Verify both power supplies are present and powered on. Verify batteries are present and powered on.
B) Power off the failed battery, remove the cable, and re-insert it in the Power Supply. Turn it back on. If that does not reset the FAILED condition, replace the battery.
C) Replace the power supply.
D) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x6 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

Powering off PSxx because it is on battery power.
This will shut down the node until AC is restored.

This message indicates that a power supply lost input AC Power and that the BIOS powered down the node to avoid draining the battery. The data value may be decoded to determine which power supply triggered this error. The low 2 bits are a bitmask of the DC power status. Bit 0 represents power supply 0 and Bit 1 represents power supply 1. If this bit is 1, then the DC output from the power supply was good when the system shut down.

Resolution:
A) Apply AC power to the node.
B) Replace the power supply.
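The data values for Code 38 sub-codes 0x0, 0x3, and 0x4 above share one layout: a power supply field in the high 8 bits and, for sub-code 0x0, a battery count in the lower 16 bits. The Python sketch below assumes a 32-bit data value, which the 100000c and 1000000 examples imply. Although the text calls the high field a bitmask, its worked examples map a field value of 0 to PS0 and 1 to PS1, so the sketch simply returns the raw field. The function name `decode_ps_data` is hypothetical.

```python
def decode_ps_data(data: int) -> tuple[int, int]:
    """Split a Code 38 data value into (power supply field, low 16 bits).

    Assumes a 32-bit data value. For sub-code 0x0 the low 16 bits are the
    battery count; for sub-codes 0x3 and 0x4 only the first field is used.
    """
    ps_field = (data >> 24) & 0xFF  # high 8 bits: power supply field
    low16 = data & 0xFFFF           # lower 16 bits: battery count (0x0 only)
    return ps_field, low16


# Worked examples from the sub-code 0x0 entry.
for value in (0x100000C, 0x4):
    ps, count = decode_ps_data(value)
    print(f"data {value:x}: PS{ps}, battery count {count}")
```

Both examples from the sub-code 0x0 entry decode as described there: 100000c gives PS1 with a count of 12, and 4 gives PS0 with a count of 4.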
Non-fatal error: Code 38, sub-code 0x7 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

Power supply xx failure: Fan Bad
or
Power supply xx failure: Fan 0 Bad
or
Power supply xx failure: Fan 1 Bad

This error indicates there is a hardware problem in one of the node power supplies. One or more of the fans may have failed. The data value may be decoded to determine which power supply (and fan) triggered this error. The low 2 bits are a bitmask of the fan status for Power Supply 0. The next 2 bits are a bitmask of the fan status for Power Supply 1. Thus:
1: PS0 had a Fan0 failure
2: PS0 had a Fan1 failure
3: PS0 had a double fan failure
c: PS1 had a double fan failure
4: PS1 had a Fan0 failure
8: PS1 had a Fan1 failure

Resolution:
A) Replace the power supply.
B) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x8 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

Power supply xx failure: Charger Overload

This error indicates there is a hardware problem in one of the node power supplies, specifically that the charger cannot handle the battery charge current draw. If you need to override this error so the node continues, you can set "ignore_chargefail" in NVRAM. The data value may be decoded to determine which power supply triggered this error. The low 2 bits are a bitmask of the charger status for the two power supplies. Thus, a value of 1 indicates PS0 had a charger overload. A value of 2 indicates PS1 had a charger overload. A value of 3 indicates PS0 and PS1 both had a charger overload.

Resolution:
A) Check battery connection.
B) Exchange cables and batteries.
C) Replace the power supply.
D) Replace the node motherboard.
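The fan status layout for Code 38, sub-code 0x7 (two bits per power supply, one bit per fan) can be decoded mechanically. The Python sketch below is only an illustration of that layout. The function name `decode_fan_failures` and the output strings are hypothetical.

```python
def decode_fan_failures(data: int) -> list[str]:
    """List the failed fans encoded in a Code 38, sub-code 0x7 data value.

    Bits 0-1 are the fan status for Power Supply 0 and bits 2-3 are the
    fan status for Power Supply 1, as described in the entry above.
    """
    failures = []
    for ps in (0, 1):
        field = (data >> (2 * ps)) & 0x3  # 2-bit fan bitmask for this supply
        for fan in (0, 1):
            if field & (1 << fan):
                failures.append(f"PS{ps} Fan{fan}")
    return failures


# The six example values listed in the entry.
for value in (0x1, 0x2, 0x3, 0xC, 0x4, 0x8):
    print(f"{value:x}: {', '.join(decode_fan_failures(value))}")
```

A value that lists both fans of one power supply corresponds to the "double fan failure" cases in the entry.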
Fatal error: Code 38, subcode 0x9 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

Both Power Supplies failed: DC Output Bad

This error indicates there is a hardware problem in one of the node power supplies. If this failure is transient, it could also be caused by turning the power supply off and then on or by a quick AC loss followed by AC being restored. If both power supplies fail simultaneously (not likely), this is a fatal error. The data value may be decoded to determine which power supply triggered this error. The low 2 bits are a bitmask of the DC Output status for the two power supplies. As a Fatal error, the value will be 3, indicating PS0 and PS1 both had a DC Output Bad.

Resolution:
A) Ensure a service operation was not taking place at the time, and that AC had not also failed.
B) Replace the power supply.
C) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0xa (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

Power supply xx failure: AC Input Bad

This error indicates that AC input power is not being supplied to one or more power supplies. The likely cause is either a real AC Failure or that the power supply has been switched to the off position. In the case of an AC Failure, the power supply will be automatically shut down to preserve batteries (if "ignore_acfail" is set then the power supply will not be shut down). The lower 2 bits of the data value may be decoded to determine which power supply lost AC power. A value of 1 indicates PS0. A value of 2 indicates PS1. A value of 3 indicates both power supplies lost AC power.

Resolution:
A) Verify AC power is present and the power supply switch is turned on.
B) Check the Power Distribution Unit (PDU) breaker.
C) Replace the power supply.
D) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0xb (0)
POWER_SUPPLY_FAILURE "Power Supply Failure"

**** Power Supplies mismatch ****
Power Supply 0: I2C accessible
Power Supply 1: I2C inaccessible

This error indicates one of the power supplies is a new style (I2C interface) and the other power supply is not responding using I2C, but has been detected as present. This is not a supported configuration. If you need to override this error, set "ignore_psdiff" in NVRAM.

Resolution:
A) Pull and re-insert the inaccessible power supply.
B) Check the Power Distribution Unit (PDU) breaker for the inaccessible power supply.
C) Replace the power supply.
D) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0xc (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

This error indicates Power Supply 0 reported a limit was exceeded while performing the power supply status test. Each power supply has integrated monitors for temperature, voltage, and current draw. The BIOS reads these sensors as part of initialization to determine if the power supply is operating within specifications. The data value may be decoded to determine the particular cause of the limit failure. Each bit represents a unique sensor. Data values may be decoded as follows:
00000001 - Temperature
00000004 - 3.3V
00000008 - 3.3V Current
00000010 - 5V
00000020 - 5V Current
00000040 - 12V
00000080 - 12V Current
00000100 - 24V
00000200 - 24V Current
00000400 - 48V
00000800 - 48V Current
00001000 - Bat0 48V
00002000 - Bat1 48V
00004000 - Bat2 48V
00008000 - Bat0 12V
00010000 - Undefined
... to ...
00400000 - Undefined
00800000 - Battery LED is Amber
01000000 - Battery Relay is Off
02000000 - PS LED is Amber
04000000 - Fan Fail
08000000 - DC Fail
10000000 - AC Fail
20000000 - Power Supply is Disabled
40000000 - Power Supply Switch is Off
80000000 - Low Limit exceeded (combined with bits above)

Resolution: Contact 3PAR technical support.

Non-fatal error: Code 38, sub-code 0xd (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

This error indicates Power Supply 1 reported a limit was exceeded while performing the power supply status test. See Code 38, sub-code 0xc for resolution information.

Non-fatal error: Code 38, sub-code 0xe (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

Each newer generation (Magnetek) power supply and battery has an I2C interface which allows the node to acquire power supply internal temperature, voltages, and current loads. The BIOS will verify these readings are within acceptable limits as part of normal initialization. This failure code indicates a limit has been exceeded on a battery attached to a power supply on the node. The data value may be decoded to determine which power supply and battery. The lower 2 bits are a bitmask of the power supply. The upper 16 bits are a bitmask of the failing battery. Thus, a data value of 10002 indicates PS1 Bat0 has exceeded a limit. A data value of 40001 indicates PS0 Bat2 has exceeded a limit.

Resolution:
A) Check battery expiration date and replace as necessary.
B) Power cycle the failing battery.
C) Replace battery cable.

Diagnostic:
A) Use the Whack "bat status" command to display power supply and battery temperatures and voltages to determine the particular failure.

Non-fatal error: Code 38, sub-code 0xf (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

I2C errors prevented completion of the power test.

Each newer generation (Magnetek) power supply and battery has an I2C interface which allows the node to acquire power supply status. This failure code indicates the BIOS was unable to read one of the Power Supply or battery status registers. The lower 2 bits of the data value may be decoded to determine which power supply failed. A value of 1 indicates PS0. A value of 2 indicates PS1. A value of 3 indicates both power supplies failed.

Resolution:
A) Power cycle the indicated power supply.
B) Replace power supply.
C) Replace all batteries attached to the power supply.
D) Replace the node motherboard.

Non-fatal error: Code 38, sub-code 0x10 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

PSwwww Batxxxx Switch Off

This failure code indicates a battery has its power switch in the off position, and is thus unable to supply backup power to the node in the case of AC Failure. The data value may be decoded to determine which power supply and battery. See Code 38, subcode 0xd for decoding information.

Resolution:
A) Turn battery on.
B) Power cycle the indicated battery.
C) Replace battery cable.
D) Replace power supply.

Fatal error: Code 38, subcode 0x11 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

PS x has down-rev firmware (x)

This failure code indicates the power supply firmware revision is not up-to-date and therefore not supported.

Resolution: Replace power supply.

Fatal error: Code 38, subcode 0x12 (data)
POWER_SUPPLY_FAILURE "Power Supply Failure"

PS x Battery has down-rev firmware (rev)

This failure code indicates the battery attached to the power supply indicated has firmware that is not up-to-date and therefore not supported.

Resolution: Replace battery.
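Two of the Code 38 data values above lend themselves to mechanical decoding: the per-sensor limit bits of sub-codes 0xc and 0xd, and the power supply and battery bitmasks of sub-code 0xe. The Python sketch below is an illustration under stated assumptions. The names `LIMIT_BITS`, `decode_limits`, and `decode_battery` are hypothetical, the table is transcribed from the sub-code 0xc entry with the "Undefined" bits omitted, and a 32-bit data value is assumed.

```python
# Bit names transcribed from the Code 38, sub-code 0xc entry.
# Bits the entry does not name (including the "Undefined" range) are omitted.
LIMIT_BITS = {
    0x00000001: "Temperature",
    0x00000004: "3.3V",
    0x00000008: "3.3V Current",
    0x00000010: "5V",
    0x00000020: "5V Current",
    0x00000040: "12V",
    0x00000080: "12V Current",
    0x00000100: "24V",
    0x00000200: "24V Current",
    0x00000400: "48V",
    0x00000800: "48V Current",
    0x00001000: "Bat0 48V",
    0x00002000: "Bat1 48V",
    0x00004000: "Bat2 48V",
    0x00008000: "Bat0 12V",
    0x00800000: "Battery LED is Amber",
    0x01000000: "Battery Relay is Off",
    0x02000000: "PS LED is Amber",
    0x04000000: "Fan Fail",
    0x08000000: "DC Fail",
    0x10000000: "AC Fail",
    0x20000000: "Power Supply is Disabled",
    0x40000000: "Power Supply Switch is Off",
    0x80000000: "Low Limit exceeded",
}


def decode_limits(data: int) -> list[str]:
    """Name every documented bit set in a sub-code 0xc or 0xd data value."""
    return [name for bit, name in LIMIT_BITS.items() if data & bit]


def decode_battery(data: int) -> tuple[list[int], list[int]]:
    """Split a sub-code 0xe data value into (power supplies, batteries).

    The lower 2 bits are a power supply bitmask and the upper 16 bits are
    a battery bitmask, as described in the sub-code 0xe entry.
    """
    supplies = [ps for ps in (0, 1) if data & (1 << ps)]
    battery_mask = (data >> 16) & 0xFFFF
    batteries = [bat for bat in range(16) if battery_mask & (1 << bat)]
    return supplies, batteries


print(decode_battery(0x10002))  # example from the sub-code 0xe entry
print(decode_battery(0x40001))  # example from the sub-code 0xe entry
```

The two `decode_battery` calls use the data values quoted in the sub-code 0xe entry and return PS1 with Bat0 and PS0 with Bat2 respectively, matching the text. For a definitive reading, the Whack "bat status" command named in the Diagnostic step remains the reference.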
Fatal error: Code 39, subcode 0x1 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for no successful OS boot (xxxx) exceeded. Type "unset cnt_no_os_boot" to clear this error
This error indicates that the BIOS has detected that the node has not successfully booted the OS and will now prohibit boots until operator intervention clears this error.
Resolution:
A) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_no_os_boot" at a Whack prompt.
B) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
C) Replace the IDE drive.

Fatal error: Code 39, subcode 0x2 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS boot with no cluster (xxxx) exceeded. Type "unset cnt_no_cluster" to clear this error
This error indicates that the BIOS has detected that the node has booted, but the cluster has failed to form several times. The BIOS will prohibit boots until operator intervention clears this error. This is to prevent cyclic node up/down caused by a hardware or software failure. This increases the reliability of the cluster by preventing the node from continuously attempting to join the cluster.
Resolution:
A) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_no_cluster" at a Whack prompt.
B) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
C) Replace the IDE drive.
Fatal error: Code 39, subcode 0x3 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS panic (xxxx) exceeded. Type "unset cnt_os_panic" to clear this error
This error indicates that the BIOS has detected that the node has booted and then caused a panic several times. When the OS causes a panic, it notifies the BIOS of this event, so the BIOS can track problems. Once a limit is exceeded, the BIOS will prohibit boots until operator intervention clears this error. This is to prevent cyclic node up/down caused by a hardware or software failure. This increases the reliability of the cluster by preventing the node from continuously attempting to join the cluster.
Resolution:
A) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_os_panic" at a Whack prompt.
B) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
C) Replace the IDE drive.

Fatal error: Code 39, subcode 0x4 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for OS cluster without shutdown (xxxx) exceeded. Type "unset cnt_no_shutdown" to clear this error
This error indicates that the BIOS has detected that the node has booted, but has not been shut down properly several times. The BIOS will prohibit boots until operator intervention clears this error. This is to prevent cyclic node up/down caused by a hardware or software failure. This increases the reliability of the cluster by preventing the node from continuously attempting to join the cluster.
Resolution:
A) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_no_shutdown" at a Whack prompt.
B) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
C) Replace the IDE drive.

Fatal error: Code 39, subcode 0x5 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for same fatal error (xxxx) exceeded. Type "unset cnt_same_fatal" to clear this error
This error indicates that the BIOS has detected that the same fatal or non-fatal error has occurred repeatedly. The BIOS will prohibit boots until operator intervention clears this error. This is to prevent cyclic node up/down caused by a hardware or software failure. This increases the reliability of the cluster by preventing the node from continuously attempting to join the cluster.
Resolution:
A) Observe other errors present in the PROM log to determine the cause of this error.
B) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_same_fatal" at a Whack prompt.
C) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
D) Replace the IDE drive.

Fatal error: Code 39, subcode 0x6 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Maximum count for errors logged (xxxx) exceeded. Type "unset cnt_log_error" to clear this error
This error indicates that the BIOS has detected that it has recorded too many fatal or non-fatal errors in the board serial PROM and that it should prohibit further boots until operator intervention clears this error. This is to prevent cyclic node up/down caused by a hardware or software failure. This increases the reliability of the cluster by preventing the node from continuously attempting to join the cluster.
Resolution:
A) Observe other errors present in the PROM log to determine the cause of this error.
B) Clear this error as suggested in the error text. You may also turn off this checking mechanism if it does not suit your application. To do this, type "unset max_log_error" at a Whack prompt.
C) Verify that a valid operating system image is installed on the node's internal disk. Reinstall the operating system if defective.
D) Replace the IDE drive.

Fatal error: Code 39, subcode 0x7 (0)
OS_STARTUP_FAILURE "OS Startup Error"
This node hit the Harrier Mismatch Error, failed ECC
This error indicates that the BIOS has detected that the Harrier ASIC's ECC logic has hit an error. It should have triggered an ECC error, but failed to do so.
Resolution: Replace the node motherboard.

Fatal error: Code 39, subcode 0x10 (0)
OS_STARTUP_FAILURE "OS Startup Error"
Invalid boot sector. Use "boot net install" to correct this.
The IDE disk is used for booting the operating system. This error indicates the boot sector which has been loaded from the disk does not have a valid signature. The most likely cause of this error is that a fresh IDE drive has been installed in the node and it needs to be field net installed.
Disk MBR does not have a valid partition table
You may also see the above line immediately following the fatal error. This message indicates the partition table in the boot sector (Master Boot Record) was also invalid, and that an "ide log" entry could not be written.
Resolution:
A) If no hardware has been replaced, first try cycling power on the node.
B) Perform a field IDE net install on the drive, or use "boot net install".
C) Use the "ide smart status" command to acquire the drive SMART status. Replace the IDE drive if a failure is reported.
D) Replace the IDE cable.
E) Replace the IDE drive.
F) Replace the node motherboard.
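
The Code 39 boot-limit sub-codes 0x1 through 0x6 each name one counter variable to clear and one limit variable that disables the check. The following Python sketch collects those pairs in a lookup table as a quick reference. The variable names are transcribed from the entries above; the table and function names are hypothetical, and the sketch only builds the command strings to type at a Whack prompt, it does not talk to a node.

```python
# Sub-code -> (counter variable, limit variable), transcribed from the
# Code 39 entries. Sub-codes 0x7 and 0x10 have no such variables.
CODE39_VARIABLES = {
    0x1: ("cnt_no_os_boot", "max_no_os_boot"),
    0x2: ("cnt_no_cluster", "max_no_cluster"),
    0x3: ("cnt_os_panic", "max_os_panic"),
    0x4: ("cnt_no_shutdown", "max_no_shutdown"),
    0x5: ("cnt_same_fatal", "max_same_fatal"),
    0x6: ("cnt_log_error", "max_log_error"),
}


def code39_whack_commands(subcode: int, disable_check: bool = False) -> list[str]:
    """Return the Whack commands that clear a Code 39 boot-limit error.

    With disable_check=True the command that turns off the checking
    mechanism is appended as well.
    """
    if subcode not in CODE39_VARIABLES:
        raise ValueError(f"Code 39 sub-code {subcode:#x} has no counter to clear")
    counter, limit = CODE39_VARIABLES[subcode]
    commands = [f"unset {counter}"]
    if disable_check:
        commands.append(f"unset {limit}")
    return commands


if __name__ == "__main__":
    print(code39_whack_commands(0x3))
    print(code39_whack_commands(0x5, disable_check=True))
```

Clearing the counter only removes the boot prohibition; the remaining resolution steps for each sub-code still apply if the underlying failure persists.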
Non-fatal error: Code 40, sub-code 0x1 (0)
CBIOS_OS_TIMEOUT "CBIOS to OS timeout"
*** Error: CBIOS to OS message communication timeout
During CPU SMI initialization, the queue facility to send messages between the BIOS and TPD is tested. If there is a problem triggering an SMI, or some other error which causes message corruption, this error will result. This error is recoverable because the OS can still come up and function at a degraded level even if the communication between the OS and BIOS is not functioning.
Resolution:
A) View the PROM log to see if this is repeatable. If not, ignore a single occurrence.
B) Cycle power on the node.
C) Replace the bootstrap CPU.
D) Replace the node motherboard.

Fatal error: Code 41, subcode 0x0 (0)
CPU_BUS_SPEED_BAD "CPU Bus Speed Bad"
*** Error: CPU speed is too slow.
The computed CPU speed is lower than the expected minimum supported in a 3PAR node. Most likely this is due to a hardware failure. Since the CPU speed computation depends upon access to the RTC, it is most likely there is a communication problem with the SuperIO containing the RTC. If you need to run with a reduced CPU speed, enter the following command on the node:
Whack> set perm cpu_slow_ok
See Code 41, sub-code 0x1 for resolution information.

Fatal error: Code 41, subcode 0x1 (0)
CPU_BUS_SPEED_BAD "CPU Bus Speed Bad"
*** Error: Memory speed is too slow.
After the CPU speed is computed, the memory bus (FSB) speed is computed. It is computed based on the CPU speed and the bus speed multiplier as reported by the CPU. If you need to run with a reduced memory bus speed, enter the following command on the node:
Whack> set perm mem_slow_ok
Resolution:
A) Cycle power on the node.
B) Replace the bootstrap CPU.
C) Replace the node motherboard.
Diagnostic:
A) Resume past fatal error and look for additional problems such as RTC failure.
Fatal error: Code 42, subcode 0x1 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed CP PROM ww.xx.yy.zz read
Centerpanel access using Manufacturing PROM: FAILURE
The centerpanel is used by the 3PAR cluster for the nodes to communicate. The CM links and backup serial links serve this purpose. There is also a diagnostic I2C bus present in the centerpanel which is used by nodes to diagnose error conditions and reset other nodes in the cluster. As part of the manufacturing process, this bus is tested by accessing the serial PROM which is present on a manufacturing centerpanel. If this test fails, it is likely the node will have a problem accessing the centerpanel I2C bus.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Use the Whack "i2c" command to access devices such as the board register directly.

Fatal error: Code 42, subcode 0x2 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed CP PROM ww.xx.yy.zz write
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.

Fatal error: Code 42, subcode 0x3 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
CP PROM node data does not match what is written: Addr xxxx
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.

Fatal error: Code 42, subcode 0x4 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
CP PROM pattern data read is incorrect Addr xx Expected yy Read zz ...
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.

Fatal error: Code 42, subcode 0x5 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed I2C access to board register x.y.z
Centerpanel access using Manufacturing PROM: FAILURE
See Code 42, sub-code 0x1 for resolution information.
Fatal error: Code 42, subcode 0x6 (0)
CP_I2C_FAILURE "Centerpanel I2C Failure"
Failed I2C access to board register x.y.z
Centerpanel access using Manufacturing PROM: FAILURE
Titan specific. The BIOS performs a read-accessibility check for extra I2C addresses while testing CP PROM 0.a0 and fails with a fatal error message if an address is not accessible. Note that if the failure is not related to CP PROM 0.a0, it will not print the "CP PROM at 0.a0:" message and will print only "Failed I2C access to board register x.yy".
See Code 42, sub-code 0x1 for resolution information.

Fatal error: Code 43, subcode 0x0 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Voltage ID indicates CPUxx present but TEMP sensor disagrees.
This error indicates either a CPU failure or onboard sensors are reading incorrect values for the specified CPU.
The VID (voltage ID sense) lines are attached to each physical CPU and used to indicate to the VRMs (voltage regulator modules) the voltage level expected by the CPU. These lines are also connected to the LM87, which uses them to determine the correct voltage which should be delivered to the CPU.
The TEMP (temperature) sensor is connected to an on-die CPU thermal diode. If its reading is out of acceptable range, the BIOS determines the sensor is not reliably connected to a CPU, or a CPU is not present.
Bits 0-1 of data indicate CPU non-presence as determined by the VID sense lines. Bits 8-9 of data indicate CPU non-presence as determined by connection to the thermal diode.
Data Value  Failure
------------------------------------------------------------
1           CPU0 does not respond to startup
2           CPU1 does not respond to startup
10          CPU0 thermal sensor/voltage ID indicates not present
20          CPU1 thermal sensor/voltage ID indicates not present
Resolution:
A) Cycle power on the node.
B) Remove physical CPU from specific socket and test with no CPU present.
B1) If error persists, replace node motherboard.
B2) If error clears, replace CPU.
C) Replace the node motherboard.
Diagnostic:
A) Use "i2c env" command to determine whether the temperature or voltage is at fault.
B) If CPU temperature shows out of range, and CPU is still functional, suspect thermal diode connection to LM87. Try swapping CPUs to see if problem moves with CPU.
C) If CPU voltage shows high or low, but VRM is emitting correct voltage by the voltage sensor, then suspect the VID lines to the LM87.

Fatal error: Code 43, subcode 0x1 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Voltage ID indicates CPUxx not present but TEMP sensor disagrees.
See Code 43, sub-code 0x0 for resolution information.

Fatal error: Code 43, subcode 0x2 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Physical CPUxx active, but thermal sensor disagrees
Bits 0-1 of data indicate CPU non-presence as determined by the running CPU APIC addresses. Bits 8-9 of data indicate CPU non-presence as determined by connection to the thermal diode.
See Code 43, sub-code 0x0 for resolution information.

Fatal error: Code 43, subcode 0x3 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Physical CPUxx not active, but thermal sensor disagrees
Bits 0-1 of data indicate CPU non-presence as determined by the running CPU APIC addresses. Bits 8-9 of data indicate CPU non-presence as determined by connection to the thermal diode.
See Code 43, sub-code 0x0 for resolution information.
Fatal error: Code 43, subcode 0x4 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Not all hyper-threads started on physical CPUxx
Bits 0-1 of data indicate logical CPU non-presence in physical CPU0 as determined by the running CPU APIC addresses. Bits 2-3 of data indicate logical CPU non-presence in physical CPU1 as determined by the running CPU APIC addresses.
See Code 43, sub-code 0x0 for resolution information.

Fatal error: Code 43, subcode 0x5 (data)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
Not all cores started on physical CPUxx
Bits 0-3 of data indicate logical CPU non-presence in physical CPU0 as determined by the running CPU APIC addresses. Bits 4-7 of data indicate logical CPU non-presence in physical CPU1 as determined by the running CPU APIC addresses.
See Code 43, sub-code 0x0 for resolution information.

Fatal error: Code 43, subcode 0x10 (xx)
CPU_PRESENCE_FAILURE "CPU Presence Failure"
CMIC heatsink disconnected: yy
The GPIOs reporting proper connection of the CMIC (North Bridge) heatsink report a loss of connection. This is a board failure which requires a lab technician to reattach the heatsink.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.
Diagnostic:
A) Visually inspect the CMIC heatsink posts to determine if it needs to be reattached.
B) Observe the reported xx value to see if it is one or both GPIOs which report the failure. These lines may be traced from VSC055 GPIOs P3.1 (J1300) and P3.2 (J1301).
C) The BIOS flag "ignore_hsfail" may be set to override checking the CMIC heatsink.

Non-fatal error: Code 44, sub-code 0x00 (xx)
NODE_FAN_FAILURE "System Fan Failure"
*** Error: One of the node fans is not present, failed, or is unintentionally running at a slower speed than expected.
The VSC055 reports tachometer inputs for both node fans, 0 and 1.
This is a single node fan failure which requires the fan to be replaced.
Resolution:
A) Cycle power on the node.
B) Replace the node fan.
Diagnostic:
A) Visually inspect the node fan.
B) Observe the fan is present and connected properly.
C) If it was misconnected, correct the connection. Otherwise, the fan needs to be replaced.

Fatal error: Code 44, subcode 0x01 (xx)
NODE_FAN_FAILURE "System Fan Failure"
*** Error: Both of the node fans are not present, failed, or are unintentionally running at speeds slower than expected.
The VSC055 reports tachometer inputs for both node fans, 0 and 1. This is a dual node fan failure which requires both of the fans to be replaced. The system may overheat.
Resolution:
A) Cycle power on both nodes.
B) Replace both node fans.
Diagnostic:
A) Visually inspect the node fans.
B) Observe the fans are present and connected properly.
C) If they are misconnected, correct the connections. Otherwise, the fans need to be replaced.

Fatal error: Code 45, subcode 0x00 (data)
QLIS_ISCSI_FAILURE "QLogic iSCSI Failure"
*** Error: QLogic iSCSI Failure
This error code indicates an error while running the QLogic iSCSI POST. Failed Test (bits 8-15), Slot (bits 4-7) and Port (bits 0-3) are packed into data.
Failed Test is one of the following:
<QLogic internal card diagnostics>
2          Test Local RAM Size
3          Test Local RAM R/W
4          Test RISC RAM
5          Test NVRAM
6          Test Flash ROM
7          Test Network Internal Loopback
8          Test Network External Loopback
9          Test DMA Transfer
240 (0xf0) Test NOP
241 (0xf1) Test Registers
242 (0xf2) Test DMA Transfer to CPU memory
243 (0xf3) Test DMA Transfer to Cluster memory
244 (0xf4) Card Initialization
Resolution:
A) Cycle power on failing node.
B) Re-seat failing iSCSI card.
C) Replace failing iSCSI card.

Fatal error: Code 46, subcode 0x1 (0)
BAD_OR_UNKNOWN_CHIPSET "Bad or Unknown Chipset"
*** Error: Unrecognized chipset (0xXXXXXXXX).
This error code indicates CBIOS does not recognize the chipset installed on the node's motherboard.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Non-fatal error: Code 46, sub-code 0x2 (0)
BAD_OR_UNKNOWN_CHIPSET "Bad or Unknown Chipset"
*** ME not in operational mode. IPMI data unavailable.
This error code indicates that the PCH Management Engine is not in the desired operational mode in the PCH chipset. IPMI temperature data is not available in this mode, so the system fans may not run at the proper speed and may not cool the enclosure.
Resolution: Contact engineering with data.

Fatal error: Code 47, subcode 0x00 (0)
SDRAM_UNAVAILABLE "Control Cache Unavailable"
No CPU SDRAM is available.
This error indicates that CBIOS has no working CPU memory available for it to continue with POST and ultimately boot the node.
Resolution:
A) Cycle power on the node.
B) Replace CPU DIMMs.
C) Replace the node motherboard.

Fatal error: Code 48, subcode 0x0 (XXXXXXXX)
UNKNOWN_BOARD "Unknown Board"
*** Error: Unrecognized board identifier (0xXXXXXXXX).
This error code indicates CBIOS does not recognize the board type for the chipset installed on the node's motherboard.
Resolution:
A) Cycle power on the node.
B) Replace the node motherboard.

Fatal error: Code 49, subcode 0x1 (data)
USB_FAILURE "USB Flash Media Failure"
Failed to find USB device handle or Inquiry Request Failed rc = xxxx
The USB controller failed to perform a self test. A data value of 0 indicates the BIOS failed to find a USB handle.
Resolution:
A) If a USB Flash drive is not expected to be present, set the "usb_nodevice_ok" NVRAM variable to override the BIOS requirement that a USB Flash drive be found.
B) Replace the USB Flash drive.
C) Replace the node motherboard.
Diagnostic:
A) Whack "usb test" commands may be used to individually execute USB tests.

Fatal error: Code 49, subcode 0x4 (0)
USB_FAILURE "USB Flash Media Failure"
There was a USB failure in data requested by the operating system bootstrap. It is possible that data on the disk has become corrupt to the point the operating system will not successfully load.
Resolution: Reinstall the operating system bootstrap with the "boot net install" command.

Fatal error: Code 49, subcode 0x6 (0)
USB_FAILURE "USB Flash Media Failure"
USB reported a failure in the read verify command.
See Code 49, sub-code 0x1 for resolution information.

Fatal error: Code 49, subcode 0x7 (0)
USB_FAILURE "USB Flash Media Failure"
USB reported a failure in the write verify command.
See Code 49, sub-code 0x1 for resolution information.

Non-fatal error: Code 49, sub-code 0x17 (0)
USB_FAILURE "USB Flash Media Failure"
No USB device was found.
Resolution: Install or replace the USB Flash drive.

Fatal error: Code 50, subcode 0x1 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Invalid control cache setup.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0x2 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Incompatible FB-DIMM installed.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0x3 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Electrically isolated FB-DIMM.
Resolution:
A) Replace DIMM.
B) Replace node.

Fatal error: Code 50, subcode 0x4 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Incompatible module installed.
Resolution: Replace DIMM.
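
The Code 45 (QLogic iSCSI) data value described earlier packs three fields: Failed Test in bits 8-15, Slot in bits 4-7, and Port in bits 0-3. The following Python sketch shows one way to unpack them. The class and function names are hypothetical, and the sketch returns only the numeric fields; it does not attempt to name the failed test.

```python
from typing import NamedTuple


class IscsiFailure(NamedTuple):
    """Fields packed into a Code 45, subcode 0x00 data value."""
    failed_test: int  # bits 8-15
    slot: int         # bits 4-7
    port: int         # bits 0-3


def decode_code45_data(data: int) -> IscsiFailure:
    """Split a Code 45 data value into failed test, slot, and port."""
    return IscsiFailure(
        failed_test=(data >> 8) & 0xFF,
        slot=(data >> 4) & 0xF,
        port=data & 0xF,
    )


if __name__ == "__main__":
    # Illustrative value only, not taken from a real PROM log.
    print(decode_code45_data(0xF123))
```

The failed_test number can then be looked up in the Failed Test list given with the Code 45 entry.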
Fatal error: Code 50, subcode 0x5 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Mismatched DIMM pair.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0x6 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Odd rank disabled.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0x7 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM branch failed to train and lockstep mode has been disabled.
Resolution:
A) Replace all DIMMs.
B) Replace node.

Fatal error: Code 50, subcode 0x9 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM northbound merge has been disabled.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0xa (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM disabled due to lockstep skew.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0xb (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM rank disabled due to Built-in Self Test failure.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0xe (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Memory interleave range limit invalid.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0xf (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
High temp disabled.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0x10 (<DIMM>)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Logical rank with CECC detected.
Resolution: Replace DIMM.

Fatal error: Code 50, subcode 0x12 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Sub-optimal FB-DIMM channel population detected.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0x13 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Mismatched AMB pair.
Resolution: Replace all DIMMs.
Fatal error: Code 50, subcode 0x14 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM branch disabled.
Resolution:
A) Replace all DIMMs.
B) Replace node.

Fatal error: Code 50, subcode 0x15 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
FB-DIMM thermal throttling has been disabled.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0x16 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
Last FB-DIMM AMB has been disabled.
Resolution: Contact 3PAR technical support.

Fatal error: Code 50, subcode 0x17 (0)
SDRAM_INIT_WARNING "Control Cache Init Failure"
The FB-DIMM memory branches do not match in size.
Resolution: Contact 3PAR technical support.

Fatal error: Code 51, subcode 0x1 (Data)
CMA_BIST_FAILURE "CM ASIC Cache BIST Failure"
The BIST (Built-in Self Test) in Harrier reported either a BAD value or a different value from what was recorded in the node PROM during MFG board assembly. (Data = Harrier BIST result)
Resolution: Replace the node. Note for OPS that Harrier BIST failed, and that the PROM should not be wiped.

Non-fatal error: Code 51, sub-code 0x2 (Data)
CMA_BIST_FAILURE "CM ASIC BIST Failure"
During Harrier initialization, the CMA BIST test failed, but for some other reason (e.g. an I2C I/O error). This error code indicates that the BIST test itself has not failed, but that an error occurred either during bookkeeping (PROM0 read/write) or because the test was not performed at all after a failure to read a Harrier register. (Data = 0x2f)
Resolution: Monitor and replace the node if the issue recurs. If the node is replaced, note for OPS that they should verify I2C to the node PROM is functional.
Non-fatal error: Code 52, sub-code 0x0
CPU_PM_FAILURE "CPU Power Management Failure"
One or more bits in the CPU's 2 General Power Management registers were set due to an abnormal power reset. The set bits are printed, describing the cause.
Resolution: Contact engineering with data.

Non-fatal error: Code 52, sub-code 0x1
CPU_PM_FAILURE "CPU TSC is over 48-bits"
The CPU timestamp counter is too big after reset. There could be a CPU reset issue.
Resolution: Contact engineering with data.

Fatal error: Code 53, subcode 0x0 (xxxx)
FPGA_FAILURE "FPGA Failure"
The CPU was unable to communicate with the FPGA.
Resolution: Replace node motherboard.

Fatal error: Code 53, subcode 0x1
FPGA_FAILURE "FPGA Failure"
FPGA revision in EOS node is old. FPGA upgrade is required.
Resolution: Upgrade FPGA to the latest revision.

Fatal error: Code 54, subcode 0x0 (xxyy)
VRM_FAILURE "VRM Failure"
A CPU VRM is missing.
or
A CPU VRM is not providing power.
Resolution:
A) Replace CPU VRM yy.
B) Replace node motherboard.

Fatal error: Code 55, subcode 0xzzzzzzzz (yyy)
UEFI_PEI_FAILURE "UEFI Failure: PEI"
UEFI failed to boot, failed during PEI due to assert.
Look up zzzzzzzz in doc/udk_hash_index.csv of the udk2010_up3 tree to determine the filename of the assert. yyy specifies the line number (in hex).
Resolution: Contact 3PAR technical support.

Fatal error: Code 56, subcode 0xaabb (yyy)
UEFI_MRC_FAILURE "UEFI Failure: Memory Training"
UEFI failed to boot, failed during Intel MRC memory training code.
aa specifies the major code, bb specifies the minor code.
* Major Code Table
0x30  Correctable error during MRC memory training
0x31  Uncorrectable error during MRC memory training
0xE8  ERR_NO_MEMORY
0xE9  ERR_LT_LOCK
0xEA  ERR_DDR_INIT
0xEB  ERR_MEM_TEST
0xEC  ERR_VENDOR_SPECIFIC
0xED  ERR_DIMM_COMPAT
0xEE  ERR_MRC_COMPATIBILITY
0xEF  ERR_MRC_STRUCT
Resolution: Contact 3PAR technical support.

Fatal error: Code 57, subcode 0xzzzzzzzz (yyy)
UEFI_DXE_FAILURE "UEFI Failure: DXE"
UEFI failed to boot, failed during DXE due to assert.
Look up zzzzzzzz in doc/udk_hash_index.csv of the udk2010_up3 tree to determine the filename of the assert. yyy specifies the line number (in hex).
Resolution: Contact 3PAR technical support.

Non-fatal error: Code 58, sub-code 0x0
HECI_FAILURE "HECI Interface Failure"
CBIOS failed to obtain the ME firmware flash unlock code through the HECI interface. This could prevent flash commands from functioning.
Resolution: Try rebooting the node.

Fatal error: Code 59, subcode 0x00 (0)
FAILSAFE_BIOS_BOOT "Failsafe Boot Halt"
The EOS Failsafe BIOS has booted without detecting a CRC error in the Main BIOS, indicating a HW initialization failure preventing the node from booting. The Failsafe BIOS has also detected five or more non-CRC failures causing boots to failsafe within the past two hours and has stopped attempting to recover automatically.

Non-fatal error: Code 59, sub-code 0x01 (bbxxyyzz)
FAILSAFE_BIOS_BOOT "Failsafe Boot Mode"
The EOS Failsafe BIOS has booted. This non-fatal entry is logged to mark the switch over from the Main BIOS to the Failsafe BIOS, which may be a different version. The data field contains the build (bb) and version (xx.yy.zz) of the Failsafe BIOS that is booting.

Non-fatal error: Code 59, sub-code 0x02 (flags)
FAILSAFE_BIOS_BOOT "Failsafe Boot Mode"
The EOS Failsafe BIOS has booted.
This non-fatal entry is logged to mark the switch over from the Main BIOS to the Failsafe BIOS, which may be a different version. The data field contains FPGA flags at the time of the boot.
Bits 0..7   - FSBC_STAT register from the FPGA. See FPGA design documentation for details.
Bits 8..15  - FPGA Revision register.
Bits 16..23 - FPGA ID register. (=4 for EOS)
Bit 24      - Flag indicating state of env var qa_force_bios_to
Bit 25      - Flag indicating state of env var qa_force_fs_to
              Flag: 1=var is set, 0=var is not set.
Bits 26..31 - Reserved, =0

Non-fatal error: Code 60, sub-code 0x00 (0)
NEMOE_FAILURE "Nemoe Failure"
The OKI Nemoe MCU has failed to boot within the specified timeout and the UEFI BIOS has reset the chip. This non-fatal entry is logged to record the boot failure and attempted restart by BIOS. No recovery action is required for this subcode.

Fatal error: Code 60, subcode 0x01 (0)
NEMOE_FAILURE "Nemoe Failure"
The OKI Nemoe MCU has failed to boot within the specified timeout. The BIOS had reset the part, and it still failed to complete its boot initialization before a timeout. The only corrective action is to replace the node.

Non-fatal error: Code 61, sub-code 0x00 (data)
AC_POWER_LOSS "AC Power Loss"
Turning off BBU because the node is on battery power. This will shut down the node until AC is restored.
This message indicates that all power supplies lost input AC power and that the BIOS powered down the node to avoid draining the battery. The data value provides a mask of power supplies which have AC good input but failed DC output.
Resolution:
A) Apply AC power to the node.
B) Replace the power supplies.

Fatal error: Code 62, subcode 0x00 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling.
Sub-code 0x00 indicates a general timeout or exhaustion of available retries during overall Write/Read/Gate leveling. The "path" value encodes the cma number, channel number, and chip-select map, according to the following bit range mapping:
|31        24|23            16|15     8|7               0|
| cma number | channel number |        | chip select map |
Resolution:
A) Cycle power on the node.
B) Reseat CM memory riser card.
C) Reseat the failing Cluster memory DIMM.
D) Replace the failing Cluster memory DIMM.
E) Replace the node motherboard.

Fatal error: Code 62, subcode 0x01 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling. Sub-code 0x01 indicates a timeout during write leveling, or an exhaustion of available retries during write leveling. The "path" value encodes the cma number, channel number, and chip-select number, according to the following bit range mapping:
|31        24|23            16|15     8|7           0|
| cma number | channel number |        | chip select |
See Code 62, sub-code 0x00 for resolution information.

Fatal error: Code 62, subcode 0x02 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling. Sub-code 0x02 indicates a timeout during read leveling, or an exhaustion of available retries during read leveling. The "path" value encodes the cma number, channel number, and chip-select number, according to the following bit range mapping:
|31        24|23            16|15     8|7           0|
| cma number | channel number |        | chip select |
See Code 62, sub-code 0x00 for resolution information.

Fatal error: Code 62, subcode 0x03 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling. Sub-code 0x03 indicates a timeout writing to a Mosys PHY register.
The "path" value encodes the channel number and the PHY CSR address, according to the following bit range mapping:
|31            16|15          0|
| channel number | CSR address |
See Code 62, sub-code 0x00 for resolution information.

Fatal error: Code 62, subcode 0x04 (path)
CM_DDR3_LEVEL_FAILURE "Level Failure"
This error code indicates a failure during Cluster Memory leveling. Sub-code 0x04 indicates a timeout reading from a Mosys PHY register. The "path" value encodes the channel number and the PHY CSR address, according to the following bit range mapping:
|31            16|15          0|
| channel number | CSR address |
See Code 62, sub-code 0x00 for resolution information.

Fatal error: Code 63, subcode 0x00 (xxyy)
LRDIMM_COMM_FAILURE "LRDIMM Communication failure"
xx = SMBUS address
yy = SMBUS read failure status
This error indicates a failure during a SMBUS read of an LRDIMM iMB register.
Resolution:
A) Use the Whack command line to re-run the CM initialization test (cma init).
B) Use the Whack command line to reset the node.
C) Cycle power on the node.
D) Reseat the appropriate Cluster Memory DIMM.
E) Replace the appropriate Cluster Memory DIMM.
F) Replace the node motherboard.

Fatal error: Code 63, subcode 0x01 (xxyy)
LRDIMM_COMM_FAILURE "LRDIMM Communication failure"
xx = SMBUS address
yy = SMBUS write failure status
This error indicates a failure during a SMBUS write to an LRDIMM iMB register.
Resolution: See Code 63 sub-code 0x00.

Fatal error: Code 63, subcode 0x02 (xxxxyyzz)
LRDIMM_COMM_FAILURE "LRDIMM iMB data mis-compare"
xxxx = iMB register
yy = Expected contents of register
zz = Actual contents of register
This is a data miscompare while verifying LRDIMM iMB register initial values.
Resolution: See Code 63 sub-code 0x00.

Fatal error: Code 64, subcode 0x00 (FFFF)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_USER_ERROR"
Could not find the variable string in the environment variable table.
Resolution:
A) The user needs to fix the illegal name in the table or in the call to the table.

Fatal error: Code 64, subcode 0x00 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_USER_ERROR"
PortNum specifies the invalid port number. The user entered an incorrect RPC or LPC port number in the "CMA PPHY..." command.
Resolution:
A) The user needs to enter the correct port number for the "CMA PPHY..." command.

Non-fatal error: Code 64, sub-code 0x01 (ack)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_RD_ACK_OUT_TIMEOUT"
ack is the current state of the PPHY control register acknowledge bit. If ack is 0, the timeout occurred waiting for the ack bit to assert. If ack is 1, the timeout occurred waiting for the ack bit to deassert.
Resolution:
A) The "CMA PPHY..." commands are a series of commands that allow the user to modify settings on PPHYs or run various tests. Therefore, the user should know what has changed and whether any failures are real or a result of the changes that were made. If you suspect the hardware is bad, continue with the following steps.
B) Use the Whack command line to reset the node.
C) Cycle power on the node.
D) Replace the node motherboard.

Non-fatal error: Code 64, sub-code 0x02 (data)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_RD_MISMATCH"
data is the actual data value from the PPHY register that did not match the expected value.
Resolution:
A) See Code 64, sub-code 0x01 resolution.

Fatal error: Code 64, subcode 0x03 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_LBERT_ERROR"
PortNum is the RPC port being tested. BERT is used to generate a pattern for the voltage margin test and the test is expected to generate a BERT error. This failure indicates that either the expected error occurred but did not clear, or the expected error did not occur.
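The "path" bit range mappings listed for Code 62 can be unpacked with plain shifts and masks. The following Python sketch is illustrative only: the function name decode_level_path and the dictionary keys are hypothetical and not part of any HPE tool, and it assumes the logged value is a 32-bit integer laid out exactly as in the Code 62 mappings.

```python
def decode_level_path(subcode: int, path: int) -> dict:
    """Split a Code 62 "path" value into its documented bit fields.

    Hypothetical helper for illustration; the key names are assumptions.
    """
    path &= 0xFFFFFFFF  # treat the value as an unsigned 32-bit word
    if subcode in (0x00, 0x01, 0x02):
        # Bits 31..24: cma number, bits 23..16: channel number,
        # bits 15..8: not used, bits 7..0: chip select
        # (a chip-select map for sub-code 0x00, a number for 0x01 and 0x02).
        return {
            "cma": (path >> 24) & 0xFF,
            "channel": (path >> 16) & 0xFF,
            "chip_select": path & 0xFF,
        }
    if subcode in (0x03, 0x04):
        # Bits 31..16: channel number, bits 15..0: PHY CSR address.
        return {
            "channel": (path >> 16) & 0xFFFF,
            "csr_address": path & 0xFFFF,
        }
    raise ValueError(f"no path layout documented for sub-code {subcode:#04x}")
```

For example, with a made-up value, decode_level_path(0x01, 0x01020003) separates the word into cma 1, channel 2, and chip select 3. Sub-codes other than 0x00 through 0x04 raise an error rather than guessing at a layout.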
Non-fatal error: Code 64, sub-code 0x04 (PortNum)
PCI_PHY_ERROR "PCI_PHY_SUBCODE_PHY_PHASE_ERROR Total Eye Margin value is over 1 UI"
PortNum is the RPC or LPC port being tested. UI is Unit Interval.
Resolution:
A) See Code 64, sub-code 0x01 resolution.

Non-fatal error: Code 65, sub-code 0x00 (xxyy)
BOOT_DISK_WARNING "Boot disk warnings"
Booting with the default boot disk (xx) failed and the default boot disk was changed to the next available boot disk (yy).
Resolution: Check boot disk (xx) for any disk failure.

Non-fatal error: Code 65, sub-code 0x01 (xxyy)
BOOT_DISK_WARNING "Boot disk warnings"
Booting with the default boot disk (xx) failed and the next available boot disk (yy) has also failed booting.
Resolution: Check boot disks (xx) and (yy) for any disk failure.

Non-fatal error: Code 65, sub-code 0x02 (0)
BOOT_DISK_WARNING "Boot disk warnings"
Reading boot disk info from PROM failed and dual boot disk configuration was skipped.
Resolution: Check PROM for any access failure.

Status: Code 127 (STAT_BIOS_DIAG) "BIOS Diag"
This code is not an error. It is a BIOS diagnostic failure which was forced by the Whack "fatal" command. It is used to test the error logging and reporting mechanisms of the BIOS and TPD software.

Status: Code 128 (STAT_BIOS_UPDATE) "BIOS Update"
This code is not an error. It indicates the BIOS determined that it had been updated. During CBIOS initialization, the BIOS looks at a value stored in NVRAM to determine if the current version is newer than the version previously booted. If so, the BIOS logs this update. The sub-code is the new BIOS version and the minor code is the old BIOS version.
Example: Code 128 (BIOS update) - Subcode 0x10204 (10201)
The above indicates CBIOS was updated from version 1.2.1 to 1.2.4.

HPE 3PAR OS fatal error codes and error resolution—HPE 3PAR OS 3.2.2

Error codes above 255 are in the domain of the OS.
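For the Code 128 status entry described earlier, the documented example (Subcode 0x10204 with minor code 10201, meaning an update from 1.2.1 to 1.2.4) suggests that each version field occupies one byte of the hexadecimal value. The Python sketch below works under that assumption, which the guide's example supports but does not state outright. The function names are hypothetical.

```python
def decode_bios_version(value: int) -> str:
    """Render a packed BIOS version such as 0x10204 as "1.2.4".

    Assumes one byte per field (major, minor, patch). This matches the
    documented Code 128 example but is otherwise an assumption.
    """
    major = (value >> 16) & 0xFF
    minor = (value >> 8) & 0xFF
    patch = value & 0xFF
    return f"{major}.{minor}.{patch}"


def describe_bios_update(subcode: int, minor_code: int) -> str:
    """Summarize a Code 128 entry: sub-code is the new version, minor code the old."""
    old = decode_bios_version(minor_code)
    new = decode_bios_version(subcode)
    return f"CBIOS updated from {old} to {new}"
```

Calling describe_bios_update(0x10204, 0x10201) reproduces the guide's example. Note that the value in parentheses in the log line is hexadecimal even though it is printed without a 0x prefix.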
Fatal error: Code 257, sub-code yyyyyyyy (xx)
PROM_EA_MEM_UERR "Uncorrectable Cluster memory"

S-Series (PIII and P4) and E-Series (P4) nodes:
Log: Uncorrectable Error in Cluster Memory
Text: UERR wwwwwwww at cluster DIMM xx, addr yyyyyyyy, syn zzzzzzzz
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Eagle) has detected an uncorrectable memory error in one or more cluster memory DIMMs (xx) at address (yyyyyyyy). The node is taken out of the cluster in response to this error.

T-Series, F-Series, and V-Series (5000P) nodes:
Log: Uncorrectable Error in Cluster Memory
Text: CM UECC Error Status [wwwwwwww]: osp: UECC: address=yy:yyyyyyyy chnl 0xww seg 0xqq synd 0xrr bank=0xss col=0xtttt row=0xuuuu DIMMww.vv Multibit
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Osprey) has detected an uncorrectable memory error in one or more cluster memory DIMMs (xx) at address (yy:yyyyyyyy). The node is taken out of the cluster in response to this error. If only the xx value is available, the DIMM number may be computed as (xx % 3).(xx / 3). For example: if xx is 2, this would refer to DIMM2.0.

Series based on the Harrier ASIC:
Log: Uncorrectable Error in Cluster Memory
Text: HAR0|1 MemCore0|1 MUERR|UERR IntStatus=wwwwwwww data xxxxxxxx:xxxxxxxx denali channel addr y:yyyyyyyyy syndrome z
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: harrier_err_interrupt() of harrierint.c
This error indicates the Cluster Manager ASIC (Harrier) has detected an uncorrectable memory error in one or more cluster memory DIMMs (xx) at address (yyyyyyyy).
The node is taken out of the cluster in response to this error.
DIMM callout (xx):
0 = DIMM0.0.0
1 = DIMM0.1.0
2 = DIMM0.0.1
3 = DIMM0.1.1
4 = DIMM1.0.0
5 = DIMM1.1.0
6 = DIMM1.0.1
7 = DIMM1.1.1

Series based on the Harrier2 ASIC:
Log: Uncorrectable Error in Cluster Memory
Text: HAR2 0|1 MemCore0|1 MUERR|UERR IntStatus=wwwwwwww data xxxxxxxx:xxxxxxxx DDR3 addr yy:yyyyyyyy syndrome z
Event: Uncorrectable Memory Error
Panic: Panic due to Uncorrectable Memory Error
Where: harrier2_err_interrupt() of harrier2int.c
This error indicates the Cluster Manager ASIC (Harrier2) has detected an uncorrectable memory error in one or more cluster memory DIMMs (xx) at address (yyyyyyyy). The node is taken out of the cluster in response to this error. See DIMM callout for Harrier ASIC.

This event is usually followed by a core dump on disk. The kernel log text in the core dump usually contains easy-to-interpret text which identifies which DIMM has failed.
Resolution:
A) Cycle power on the node.
B) Reseat Cluster Memory riser card.
C) Reseat the failing Cluster Memory DIMM(s).
D) Replace the failing Cluster Memory DIMM(s).
E) Replace the node motherboard.
Diagnostic:
A) Ensure BIOS tests are enabled using the "table skip none" command at a Whack prompt.
B) Use the "mem test cm" command to test cluster memory.
C) wwwwwwww is the CM Error interrupt status register and the syndrome is zzzzzzzz. These may be decoded using scaffold documentation.

Fatal error: Code 258, sub-code xx (yy)
PROM_EA_MEM_CERR "Correctable Cluster memory"
This error is not currently generated by a node. It is a placeholder should it be necessary to record correctable cluster memory ECC errors in the node PROM.
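The DIMM callouts given under Code 257 can be expressed as a small lookup. This Python sketch is a convenience illustration, not an HPE utility: the function names are hypothetical, the Harrier table is copied from the callout list in the Code 257 description, and the Osprey formula (xx % 3).(xx / 3) is read as integer division, which agrees with the guide's example of xx = 2 giving DIMM2.0.

```python
# Harrier and Harrier2 callout table, indexed by the xx value (0..7),
# as listed in the Code 257 description.
HARRIER_DIMM_CALLOUT = (
    "DIMM0.0.0", "DIMM0.1.0", "DIMM0.0.1", "DIMM0.1.1",
    "DIMM1.0.0", "DIMM1.1.0", "DIMM1.0.1", "DIMM1.1.1",
)


def harrier_dimm(xx: int) -> str:
    """Return the DIMM label for a Harrier or Harrier2 callout value."""
    if not 0 <= xx < len(HARRIER_DIMM_CALLOUT):
        raise ValueError(f"callout value {xx} is outside the documented 0..7 range")
    return HARRIER_DIMM_CALLOUT[xx]


def osprey_dimm(xx: int) -> str:
    """Return the DIMM label for an Osprey xx value as (xx % 3).(xx / 3).

    Integer division is assumed for the second field.
    """
    if xx < 0:
        raise ValueError("callout value must be non-negative")
    return f"DIMM{xx % 3}.{xx // 3}"
```

The kernel log text in the core dump remains the preferred source for identifying the failed DIMM; a lookup like this is only useful when the xx value is all that is available.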
Fatal error: Code 259, sub-code xx (yy)
PROM_EA_XCB_ERR "Error in the XCB engine"
This error is not currently generated by a node. It is a placeholder for CM XCB engine hardware errors.

Fatal error: Code 260, sub-code xx (yy)
PROM_EA_MEM_MUERR "Multiple uncorrectable memory"
Log: Multiple Uncorrectable Error in Cluster Memory
Text: MUERR wwwwwwww at cluster DIMM xx, addr yyyyyyyy, syn zzzzzzzz
Event: Multiple Uncorrectable Memory Error
Panic: Panic due to Multiple Uncorrectable Memory Error
Where: eagle_err_interrupt() of eagleint.c
This error indicates the Cluster Manager ASIC (Eagle or Osprey) has detected multiple uncorrectable memory errors in cluster memory DIMM (xx) at address (yyyyyyyy). The node is taken out of the cluster in response to this error. See Code 257 for error resolution information.

Fatal error: Code 261, sub-code xx (yy)
PROM_EA_HW_ERR "Cluster Manager HW error"
This error is not currently generated by a node. It is a placeholder for Cluster Manager (Eagle or Osprey) internal hardware errors.

Fatal error: Code 262, sub-code xxxxxxxx (yy)
PROM_EA_PCI_ERR "Cluster Manager PCI error"
Log: Cluster Manager PCI Error
Text: ea_pci_err: bus yy, status xxxxxxxx Call CBIOS to analyze error. ...
Event: PCI bus yy error xxxxxxxx
Panic: Panic due to Eagle PCI error: bus yy, status xxxxxxxx
Where: ea_pci_err() of eaint_hdler.c
This error indicates the Cluster Manager ASIC (Eagle or Osprey) has detected a PCI bus error while communicating with either a CPU or one of the PCI slot devices. This error is most likely caused by a card which has failed in one of the PCI slots. You may need to examine the BIOS output recorded in the crash dump to determine the true cause.
Resolution:
A) Cycle power on the node.
B) Read BIOS output to determine if a specific PCI card is implicated by the slot bridges. If so, replace the card.
C) Replace the node motherboard.
Diagnostic:
A) If BIOS messages indicate no other device is at fault, then manual BIOS tests may be performed to determine if the cause is CIOB. You may use the Whack "mem test" command with a CM memory range to generate accesses. Use "eagle status" and "eagle clear" to get and clear errors.
B) The "fibre test cluster" command is good to test access from a fibre channel card to the CM.

Websites

General websites
Hewlett Packard Enterprise Information Library
www.hpe.com/info/EIL
Single Point of Connectivity Knowledge (SPOCK) Storage compatibility matrix
www.hpe.com/storage/spock
Storage white papers and analyst reports
www.hpe.com/storage/whitepapers
For additional websites, see Support and other resources.

Support and other resources

Accessing Hewlett Packard Enterprise Support
• For live assistance, go to the Contact Hewlett Packard Enterprise Worldwide website:
  http://www.hpe.com/assistance
• To access documentation and support services, go to the Hewlett Packard Enterprise Support Center website:
  http://www.hpe.com/support/hpesc

Information to collect
• Technical support registration number (if applicable)
• Product name, model or version, and serial number
• Operating system name and version
• Firmware version
• Error messages
• Product-specific reports and logs
• Add-on products or components
• Third-party products or components

Accessing updates
• Some software products provide a mechanism for accessing software updates through the product interface. Review your product documentation to identify the recommended software update method.
• To download product updates:
  Hewlett Packard Enterprise Support Center
  www.hpe.com/support/hpesc
  Hewlett Packard Enterprise Support Center: Software downloads
  www.hpe.com/support/downloads
  Software Depot
  www.hpe.com/support/softwaredepot
• To subscribe to eNewsletters and alerts:
  www.hpe.com/support/e-updates
• To view and update your entitlements, and to link your contracts and warranties with your profile, go to the Hewlett Packard Enterprise Support Center More Information on Access to Support Materials page:
  www.hpe.com/support/AccessToSupportMaterials

IMPORTANT: Access to some updates might require product entitlement when accessed through the Hewlett Packard Enterprise Support Center. You must have an HPE Passport set up with relevant entitlements.

Customer self repair
Hewlett Packard Enterprise customer self repair (CSR) programs allow you to repair your product. If a CSR part needs to be replaced, it will be shipped directly to you so that you can install it at your convenience. Some parts do not qualify for CSR. Your Hewlett Packard Enterprise authorized service provider will determine whether a repair can be accomplished by CSR.
For more information about CSR, contact your local service provider or go to the CSR website:
http://www.hpe.com/support/selfrepair

Remote support
Remote support is available with supported devices as part of your warranty or contractual support agreement. It provides intelligent event diagnosis, and automatic, secure submission of hardware event notifications to Hewlett Packard Enterprise, which will initiate a fast and accurate resolution based on your product's service level. Hewlett Packard Enterprise strongly recommends that you register your device for remote support.
If your product includes additional remote support details, use search to locate that information.

Remote support and Proactive Care information
HPE Get Connected
www.hpe.com/services/getconnected
HPE Proactive Care services
www.hpe.com/services/proactivecare
HPE Proactive Care service: Supported products list
www.hpe.com/services/proactivecaresupportedproducts
HPE Proactive Care advanced service: Supported products list
www.hpe.com/services/proactivecareadvancedsupportedproducts

Proactive Care customer information
Proactive Care central
www.hpe.com/services/proactivecarecentral
Proactive Care service activation
www.hpe.com/services/proactivecarecentralgetstarted

Warranty information
To view the warranty for your product or to view the Safety and Compliance Information for Server, Storage, Power, Networking, and Rack Products reference document, go to the Enterprise Safety and Compliance website:
www.hpe.com/support/Safety-Compliance-EnterpriseProducts

Additional warranty information
HPE ProLiant and x86 Servers and Options
www.hpe.com/support/ProLiantServers-Warranties
HPE Enterprise Servers
www.hpe.com/support/EnterpriseServers-Warranties
HPE Storage Products
www.hpe.com/support/Storage-Warranties
HPE Networking Products
www.hpe.com/support/Networking-Warranties

Regulatory information
To view the regulatory information for your product, view the Safety and Compliance Information for Server, Storage, Power, Networking, and Rack Products, available at the Hewlett Packard Enterprise Support Center:
www.hpe.com/support/Safety-Compliance-EnterpriseProducts

Additional regulatory information
Hewlett Packard Enterprise is committed to providing our customers with information about the chemical substances in our products as needed to comply with legal requirements such as REACH (Regulation EC No 1907/2006 of the European Parliament and the Council).
A chemical information report for this product can be found at:
www.hpe.com/info/reach
For Hewlett Packard Enterprise product environmental and safety information and compliance data, including RoHS and REACH, see:
www.hpe.com/info/ecodata
For Hewlett Packard Enterprise environmental information, including company programs, product recycling, and energy efficiency, see:
www.hpe.com/info/environment

Documentation feedback
Hewlett Packard Enterprise is committed to providing documentation that meets your needs. To help us improve the documentation, send any errors, suggestions, or comments to Documentation Feedback (docsfeedback@hpe.com). When submitting your feedback, include the document title, part number, edition, and publication date located on the front cover of the document. For online help content, include the product name, product version, help edition, and publication date located on the legal notices page.