An error occurred during configuration of the HA Agent on the host

Friday, January 18, 2008 4:36:42 PM

I just received my new ESX servers at work, yay! I've been dying to play with some real VMware technology. Don't get me wrong, VMware Workstation and the free VMware Server are some sweet wares, but nothing compares to 8 cores of bare-metal virtualization with 32 Gigs of RAM (per node).

The first hurdle was licensing. I didn't know how licensing worked for ESX Server, so I wasn't expecting the license activation process or the need to set up a VirtualCenter server to act as the license server. Not a big deal: I had a server I was about ready to retire anyway, so I reloaded it with Windows Server 2003 R2 Standard and installed VirtualCenter along with the license server. Problem solved, ESX servers licensed.

The next hurdle was VMotion. I could not for the life of me figure out what was needed to set up VMotion. I kept getting the error "The VMotion interface is not configured (or is misconfigured) on the source host 'ESX01'." A little bit of googling and I discovered that I needed to set up a VMkernel interface and enable it for VMotion. After playing around with that and setting up an internal 192.168.xxx.xxx network for VMotion, I was rocking and rolling. I migrated a server in 2.25 minutes with one dropped ping, pretty frickin' sweet if you ask me.
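Before assigning VMkernel addresses on a dedicated VMotion network like the 192.168.xxx.xxx one above, it can help to sanity-check the addressing plan. Below is a minimal Python sketch using only the standard `ipaddress` module. The function name `check_vmotion_addresses` is hypothetical, and the subnet and addresses in the example are placeholders, not the actual network from this post. It only checks addressing (every address in one subnet, no duplicates, not the network or broadcast address, a private range); it knows nothing about ESX itself.

```python
import ipaddress


def check_vmotion_addresses(subnet: str, addresses: list[str]) -> list[str]:
    """Return a list of problems found in a planned set of VMotion addresses.

    An empty list means the plan passed these basic addressing checks.
    Hypothetical helper: it does not talk to ESX or VirtualCenter.
    """
    net = ipaddress.ip_network(subnet)
    problems = []
    seen = set()
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if ip not in net:
            problems.append(f"{addr} is outside {net}")
        if ip in (net.network_address, net.broadcast_address):
            problems.append(f"{addr} is the network or broadcast address of {net}")
        if ip in seen:
            problems.append(f"{addr} is assigned more than once")
        seen.add(ip)
    # An internal VMotion network would normally use a private range.
    if not net.is_private:
        problems.append(f"{net} is not a private range")
    return problems
```

For example, `check_vmotion_addresses("192.168.50.0/24", ["192.168.50.1", "192.168.50.2", "192.168.50.3"])` returns an empty list, while listing `192.168.50.1` twice reports it as assigned more than once.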

The final hurdle (for now) was setting up HA and DRS, and the real problem was HA. I configured the cluster and added my hosts, but every host reported "An error occurred during configuration of the HA Agent on the host" when I tried to reconfigure HA. After even more googling I learned that DNS was the most common culprit. I verified DNS on all of the hosts and could ping them all by name and by IP, using both the short name and the FQDN. What could possibly be the problem? Well, the problem was simple, and stupid, and in my opinion a bug and not a feature.
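The DNS checks described above (forward lookups by short name and by FQDN, plus a reverse lookup of the IP) can be scripted so that the name the resolver hands back is shown exactly as returned rather than just "it pinged." Below is a minimal Python sketch using only the standard `socket` module, which goes through the system resolver (hosts file plus DNS) the same way ping does. The function name `check_host` is hypothetical; pass in your own host names and addresses.

```python
import socket


def check_host(short_name: str, fqdn: str, ip: str) -> dict:
    """Resolve one host by short name, FQDN, and IP; report each answer verbatim.

    Hypothetical helper. Lookup failures are recorded as text instead of raised,
    so one bad host does not stop a check of the whole cluster.
    """
    report = {}
    for name in (short_name, fqdn):
        try:
            # Forward lookup: name -> IPv4 address.
            report[name] = socket.gethostbyname(name)
        except OSError as err:
            report[name] = f"lookup failed: {err}"
    try:
        # Reverse lookup: gethostbyaddr returns (hostname, aliases, addresses);
        # keep the hostname exactly as the resolver spelled it.
        report[ip] = socket.gethostbyaddr(ip)[0]
    except OSError as err:
        report[ip] = f"reverse lookup failed: {err}"
    return report
```

With the addresses from this post, that would be `check_host("corpesx01", "corpesx01.DOM.LOC", "10.10.10.1")`, run from a machine that uses the same DNS servers as the hosts.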

ESX is apparently case sensitive about hostnames. My server, listed in the hosts file as CORPESX01.DOM.LOC, needed to be listed as corpesx01.DOM.LOC in order to resolve properly. A simple nslookup of one of the IP addresses from my Windows machine confirmed this capitalization scheme. I went through the hosts file on each ESX server, making them look as follows:

10.10.10.1        corpesx01.DOM.LOC        corpesx01
10.10.10.2        corpesx02.DOM.LOC        corpesx02
10.10.10.3        corpesx03.DOM.LOC        corpesx03
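With more than a handful of hosts, the hosts-file edit can be scripted. Below is a minimal Python sketch, assuming you have a copy of each host's hosts file that you can read and write back. It lowercases only the short-name part of each hostname, matching the entries above (the DOM.LOC domain is left as written), and leaves the IP address, whitespace, and any `#` comment alone. `case_only_mismatch` flags two names that differ only by capitalization, which is exactly what a reverse nslookup exposed here. All function names are hypothetical.

```python
import re


def lowercase_short_name(name: str) -> str:
    """Lowercase the first label of a hostname, leaving any domain part as written."""
    host, dot, domain = name.partition(".")
    return host.lower() + dot + domain


def normalize_hosts_line(line: str) -> str:
    """Lowercase the short-name part of every hostname on one hosts-file line.

    The first field (the IP address), any trailing '#' comment, and the
    whitespace between fields are left untouched.
    """
    body, hash_mark, comment = line.partition("#")
    # Splitting on a captured group keeps the whitespace runs in the result.
    parts = re.split(r"(\s+)", body)
    out = []
    seen_address = False
    for part in parts:
        if not part or part.isspace():
            out.append(part)
        elif not seen_address:
            out.append(part)  # the IP address field
            seen_address = True
        else:
            out.append(lowercase_short_name(part))
    return "".join(out) + hash_mark + comment


def case_only_mismatch(expected: str, actual: str) -> bool:
    """True when two names differ only in letter case."""
    return expected != actual and expected.lower() == actual.lower()
```

For example, `normalize_hosts_line("10.10.10.1        CORPESX01.DOM.LOC        CORPESX01")` produces the first line shown above, and `case_only_mismatch("CORPESX01.DOM.LOC", "corpesx01.DOM.LOC")` is `True`.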

Life is now good. I disabled and re-enabled HA on my cluster and, voilà, everything reconfigured properly.

Hope this helps!
