After a few days, one or more of our vCLS appliances will start showing 100% CPU. Since they are managed by the cluster, you cannot reboot them.
Since we have a three-node ESXi vSphere environment, we have 3 of these vCLS appliances for the cluster.
If we ignore the issue, that ESXi host becomes slow to respond to tasks. vMotion starts failing (which makes sense), and eventually even the ability to shut down and restart VMs disappears.
Interestingly, the ESXi GUI, both on the host and in vCenter, stays quick and responsive, and all services that were running are still running. Only VM tasks start failing on that host.
The workaround is to stop the VMware ESX Agent Manager service, delete the affected vCLS appliance, and let the cluster recreate it:
1. Log in to the appliance's vCenter Server Management interface:
2. Choose Services on the left menu.
3. On the right, select the radio button next to VMware ESX Agent Manager, and click the STOP action at the top.
Leave this tab open.
4. Back in the regular vCenter Client, open the VMs and Templates tab, open the vCLS folder, right-click the affected vCLS(n), and select Delete from Disk.
5. Back in the Services tab of vCenter Server Management, with the VMware ESX Agent Manager service still selected, click START to start it back up.
6. Back in the vCenter Client, still in the vCLS folder, the replacement vCLS(n+1) is recreated automatically, typically within a few seconds.
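The important part of the steps above is the ordering: stop the service, delete the appliance, and then start the service again no matter what happened in between. A minimal Python sketch of that ordering follows. The three callables (`stop_eam`, `delete_vm`, `start_eam`) and the function name `recreate_vcls` are hypothetical hooks invented for illustration, not part of any VMware SDK; you would wire them to whatever tooling you actually use.

```python
from typing import Callable


def recreate_vcls(
    vm_name: str,
    stop_eam: Callable[[], None],
    delete_vm: Callable[[str], None],
    start_eam: Callable[[], None],
) -> None:
    """Run the stop -> delete -> start sequence from steps 3 to 5.

    All three callables are hypothetical hooks supplied by the caller.
    """
    stop_eam()  # step 3: stop VMware ESX Agent Manager
    try:
        delete_vm(vm_name)  # step 4: Delete from Disk on the stuck vCLS
    finally:
        # step 5: always restart the service, even if the delete failed,
        # so the agent manager is never left stopped by accident.
        start_eam()


# Dry run with stand-in hooks that only record what would happen.
calls = []
recreate_vcls(
    "vCLS(1)",  # example name only
    stop_eam=lambda: calls.append("stop EAM"),
    delete_vm=lambda name: calls.append(f"delete {name}"),
    start_eam=lambda: calls.append("start EAM"),
)
print(calls)
```

The `try`/`finally` is the one design choice worth keeping if you script this differently: a failed delete should not leave VMware ESX Agent Manager stopped.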
In addition, if you put a host in maintenance mode, its vCLS will not always allow itself to be vMotioned, so it stays stuck on the ESXi host you are trying to update. Stop the VMware ESX Agent Manager and delete the vCLS, and the ESXi host will go into maintenance mode. Restart the VMware ESX Agent Manager, and the vCLS will be created on another host, if there are more than 2 hosts still online.
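For the maintenance-mode case, the extra detail is that the service should only be restarted once the host has actually entered maintenance mode (or you have given up waiting). The sketch below adds a simple polling wait for that. As before, every callable (`stop_eam`, `delete_vm`, `host_in_maintenance`, `start_eam`) and the function name are hypothetical hooks, and the timeout and poll interval defaults are arbitrary assumptions, not values from VMware.

```python
import time
from typing import Callable


def evacuate_stuck_vcls(
    vm_name: str,
    stop_eam: Callable[[], None],
    delete_vm: Callable[[str], None],
    host_in_maintenance: Callable[[], bool],
    start_eam: Callable[[], None],
    timeout_s: float = 300.0,  # assumed default, tune for your environment
    poll_s: float = 5.0,       # assumed default
) -> bool:
    """Return True if the host reached maintenance mode before the timeout."""
    stop_eam()
    try:
        delete_vm(vm_name)
        deadline = time.monotonic() + timeout_s
        # Poll until the host reports maintenance mode or the deadline passes.
        while not host_in_maintenance():
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
        return True
    finally:
        # Restart the service whether or not the host made it.
        start_eam()


# Dry run: the fake host reports maintenance mode on the third poll.
events = []
states = iter([False, False, True])
ok = evacuate_stuck_vcls(
    "vCLS(1)",  # example name only
    stop_eam=lambda: events.append("stop EAM"),
    delete_vm=lambda name: events.append(f"delete {name}"),
    host_in_maintenance=lambda: next(states),
    start_eam=lambda: events.append("start EAM"),
    poll_s=0.0,
)
print(ok, events)
```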