http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001596
Troubleshooting VMware High Availability (HA)
Details
You are experiencing:
- VMware High Availability (HA) failover errors:
  - HA agent on <server> in cluster <cluster> in <datacenter> has an error
  - Insufficient resources to satisfy HA failover level on cluster
- HA agent configuration errors on ESX hosts:
  - Failed to connect to host
  - Failed to install the VirtualCenter agent
  - cmd addnode failed for primary node: Internal AAM Error - agent could not start
  - cmd addnode failed for primary node:/opt/vmware/aam/bin/ft_startup failed
  - Configuration of hosts IP address is inconsistent on host <hostname> address resolved to <IP> and <IP>
- Port errors:
  - Ports not freed after stop_ftbb
- The first node in the HA cluster enables correctly but the second node fails to configure HA just after 90%.
- The network settings and HA configuration are all correct. DNS and ping tests are all okay.
- VMware Infrastructure (VI) Client displays the error:
  Internal AAM Errors - agent could not start
- In the aam logs on the ESX host, the file aam_config_util_addnode.log shows text similar to:
  11/27/09 16:20:49 [myexit ] Failure location:
  11/27/09 16:20:49 [myexit ] function main::myexit called from line 2199
  11/27/09 16:20:49 [myexit ] function main::start_agent called from line 1168
  11/27/09 16:20:49 [myexit ] function main::add_aam_node called from line 171
  11/27/09 16:20:49 [myexit ] VMwareresult=failure
Solution
This article guides you through the process of troubleshooting a VMware HA cluster. It identifies
common configuration problems and confirms that the required resources are available
on your ESX Server.
Validate that each troubleshooting step below is true for your environment. Each step provides
instructions or a link to a document, in order to eliminate possible causes and take corrective action
as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and
identify the proper resolution. Please do not skip a step.
Note: If you perform a corrective action in any of the following steps, attempt to reconfigure VMware
HA again.
1. Check the release notes for current releases to see if the problem has been resolved in a bug
fix. See the Documentation page for vSphere 4 or VMware Infrastructure 3.
2. Verify that there are enough licenses to configure VMware HA. For more information,
see Verifying that the feature is licensed (1003692).
3. Verify that name resolution is correctly configured on the ESX Server. For more information,
see Identifying issues with and setting up name resolution on ESX Server (1003735).
4. Verify that name resolution is correctly configured on the vCenter Server. For more
information, see Configuring name resolution for VMware VirtualCenter (1003713).
5. Verify that the time is correct on all ESX Servers with the date command. For more
information on setting up time synchronization with ESX Server, see Installing and
Configuring NTP on VMware ESX Server (1339).
6. Verify that network connectivity exists from the VirtualCenter Server to the ESX Server. For
more information, see Testing network connectivity with the Ping command (1003486).
7. Verify that network connectivity exists from the ESX Server to the isolation response address.
For more information, see Testing network connectivity with the Ping command (1003486).
8. Verify that all of the required network ports are open. For more information, see Testing port
connectivity with the Telnet command (1003487).
9. Determine if there is a cluster resource issue. For more information, see Advanced
Configuration options for VMware High Availability (1006421).
10. Verify that the correct version of the VirtualCenter agent service is installed. For more
information on determining agent versions and how to manually uninstall and reinstall the
HA agents on an ESX host, see Verifying and reinstalling the correct version of VMware
VirtualCenter Server agent (1003714).
11. Verify the VirtualCenter Server Service has been restarted. To restart the VirtualCenter
Server Service, see Stopping, starting, or restarting the vCenter Server service (1003895).
12. Verify that VMware HA is only attempting to configure on one Service Console. For more
information, see VMware High Availability configuration issues when an iSCSI Service
Console is on the same network (1003789).
13. Verify that the VMware HA cluster is not corrupted. To do this, you need to create another
cluster as a test. For more information, see Recreating VMware High Availability Cluster
(1003715).
14. Verify that UDP 8043 packets used for the HA backbone communications are not dropped
between the ESX hosts. For more information, see HA fails to configure after task passes 90%
"Internal AAM Error - agent could not start" (1018217).
15. Ensure that the ESXi host userworld swap option is enabled. For more information, see ESXi hosts
without swap enabled cannot be added to a VMware High Availability Cluster (1004177).
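The name resolution and port checks in steps 3, 4, and 8 can be scripted as a quick pre-check from the vCenter Server or a management workstation. The following is a minimal sketch, not a VMware tool: the function names resolved_addresses, tcp_port_open, and precheck are hypothetical, and the host names and TCP ports to test are left to the caller (take them from the linked articles for your release). It uses only the Python standard library. A host name that resolves to more than one address is flagged because of the "address resolved to <IP> and <IP>" error listed under Details. A TCP connect test cannot confirm UDP traffic, so it does not cover the UDP 8043 packets in step 14.

```python
import socket


def resolved_addresses(hostname, resolver=socket.gethostbyname_ex):
    """Return the sorted, de-duplicated IPv4 addresses for hostname.

    An empty list means the lookup failed. The resolver argument exists
    only so the lookup can be swapped out (for example, in tests).
    """
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:  # socket.gaierror and socket.herror are subclasses
        return []
    return sorted(set(addresses))


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def precheck(hostnames, tcp_ports, resolver=socket.gethostbyname_ex):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    for name in hostnames:
        addresses = resolved_addresses(name, resolver=resolver)
        if not addresses:
            findings.append(f"{name}: name resolution failed")
            continue
        if len(addresses) > 1:
            findings.append(f"{name}: resolves to multiple addresses {addresses}")
        for port in tcp_ports:
            # Only the first resolved address is probed in this sketch.
            if not tcp_port_open(addresses[0], port):
                findings.append(f"{name}: TCP port {port} not reachable")
    return findings
```

A call such as precheck(["esx01.example.com"], [443]) returns the findings as plain strings; the host name and port in that call are placeholders, not values taken from this article.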
Notes:
- If your problem still exists after trying the steps in this article:
  - Gather the VMware Support Script Data. For more information, see Collecting
    diagnostic information for VMware products (1008524).
  - File a support request with VMware Support and note this KB Article ID in the
    problem description. For more information, see How to Submit a Support Request.
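When preparing the problem description for a support request, it may help to record which [myexit ] lines appear in aam_config_util_addnode.log. The following is a rough sketch only: the function names find_myexit_lines and reported_failure are hypothetical, and the line format is assumed from the excerpt under Details rather than from any documented log specification.

```python
import re

# Assumed line format, based solely on the excerpt in this article:
# "MM/DD/YY HH:MM:SS [myexit ] message"
MYEXIT_LINE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"\[myexit\s*\]\s+(?P<message>.*)$"
)


def find_myexit_lines(log_text):
    """Return (timestamp, message) pairs for every [myexit ] line in log_text."""
    entries = []
    for line in log_text.splitlines():
        match = MYEXIT_LINE.match(line.strip())
        if match:
            entries.append((match.group("stamp"), match.group("message").strip()))
    return entries


def reported_failure(log_text):
    """Return True if any [myexit ] line reports VMwareresult=failure."""
    return any(
        message == "VMwareresult=failure"
        for _stamp, message in find_myexit_lines(log_text)
    )
```

Reading the log file into a string and passing it to find_myexit_lines yields the call chain (myexit, start_agent, add_aam_node in the excerpt above) in the order it was logged, which can be pasted into the support request.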
Keywords
vmware high availability, vmware ha, ha
This Article Replaces
1003691
Update History
05/11/2010 - Added step 15, with link to KB 1004177.