Stuck / Hung VM Shutdown Workaround

Mindwatering Incorporated

Author: Tripp W Black

Created: 07/23/2009 at 02:43 AM

 

Category:
VMware
VM Configuration

Issue:
A Reset or Power Off of a VM occasionally does not finish. The console screen goes black, but the task never shows as completed in the ESXi events.


Workaround 1:
a. On the host console or via remote SSH (if enabled), log in and use vim-cmd to retrieve the list of VMs and their VM IDs.
[root@vmhost33:~] vim-cmd vmsvc/getallvms
<view results and locate the VM stuck and note its VM ID>
VMid Name
...
123 StuckVM
...
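
On a host with many VMs, the lookup in step a can be scripted. The following is a minimal sketch rather than a vetted tool: vmid_by_name is a hypothetical helper name, and it assumes the VM name contains no spaces, so that the name sits entirely in the second whitespace-separated column of the getallvms output.

```shell
# vmid_by_name NAME
# Reads "vim-cmd vmsvc/getallvms" output on stdin and prints the Vmid of
# every row whose Name column equals NAME exactly.
# Assumption: NAME has no spaces (only column 2 is compared).
vmid_by_name() {
    awk -v name="$1" '$1 ~ /^[0-9]+$/ && $2 == name { print $1 }'
}

# Usage on the host (StuckVM is the example name from step a):
#   vim-cmd vmsvc/getallvms | vmid_by_name StuckVM
```

If the helper prints more than one ID, or none, fall back to reading the full listing by eye.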

b. Issue the Power Off command:
[root@vmhost33:~] vim-cmd vmsvc/power.off 123
Powering off VM:
Power off failed: (vim.fault.InvalidPowerState) {
faultCause = (vmodl.MethodFault) null,
faultMessage = <unset>,
requestedState = "poweredOn",
existingState = "poweredOff"
msg = "The attempted operation cannot be performed in the current state (Powered off)."
}

c. Return to the ESXi web console or the vSphere client
- Check if the VM shows powered off.
- - If so, click Power On the VM.
- - If not, try the next option

d. It is possible the problem is due to multiple VMs having the same VM ID.
Review the output of step a again. In our case, we had 2 VMs with the ID 133.

- Confirm the problem with the Power Off is that it cannot determine which VM is 133.
[root@vmhost33:~] vim-cmd vmsvc/get.tasklist 133
(ManagedObjectReference) [
'vim.Task:haTask-88-vim.VirtualMachine.powerOn-1235432'
]

Note:
If the output lists a task, like the output above, proceed to step e below.
If you get a message like the one below instead, check whether you wrote the wrong number down.

(vim.fault.NotFound) {
faultCause = (vmodl.MethodFault) null,
faultMessage = <unset>
msg = "Unable to find a VM corresponding to "133""
}
- Having more than one VM with the same ID should not happen.
- If it does, put the host in maintenance mode and reboot the host. The host may also fail to enter maintenance mode because of this.
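
Spotting a repeated ID by eye is error-prone on a long listing. The sketch below counts the first column of the getallvms output and prints any ID seen more than once. duplicate_vmids is a hypothetical helper name, and because multi-line annotations can add extra rows to the listing, treat a hit as a hint to re-read the output rather than as proof.

```shell
# duplicate_vmids
# Reads "vim-cmd vmsvc/getallvms" output on stdin and prints each Vmid
# that appears at the start of more than one row, one ID per line.
duplicate_vmids() {
    awk '$1 ~ /^[0-9]+$/ { seen[$1]++ }
         END { for (id in seen) if (seen[id] > 1) print id }' | sort -n
}

# Usage on the host:
#   vim-cmd vmsvc/getallvms | duplicate_vmids
```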

e. Check the current status of the stuck task:
[root@vmhost33:~] vim-cmd vimsvc/task_info "haTask-88-vim.VirtualMachine.powerOn-1235432"
...
state = "running",
cancelled = false,
cancelable = false,
...

Note:
In this case the task is not cancelable (cancelable = false). If yours is cancelable, cancel it:
[root@vmhost33:~] vim-cmd vimsvc/task_cancel "haTask-88-vim.VirtualMachine.powerOn-1235432"
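
Step e can be combined into a single guarded command so that the cancel is only attempted when the task reports itself as cancelable. This is a sketch that assumes the task_info output contains a "cancelable = true" line in the same form as the excerpt above; task_is_cancelable is a hypothetical helper name.

```shell
# task_is_cancelable
# Reads "vim-cmd vimsvc/task_info <task>" output on stdin and succeeds
# (exit status 0) only if it contains "cancelable = true".
task_is_cancelable() {
    grep -Eq 'cancelable[[:space:]]*=[[:space:]]*true'
}

# Usage on the host, with the task ID found in step d:
#   task="haTask-88-vim.VirtualMachine.powerOn-1235432"
#   if vim-cmd vimsvc/task_info "$task" | task_is_cancelable; then
#       vim-cmd vimsvc/task_cancel "$task"
#   fi
```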


Workaround 3:
a. On the host console or via remote SSH (if enabled), log in and use esxcli vm process list to retrieve the list of running VMs and their World IDs.
[root@vmhost33:~] esxcli vm process list
...
StuckVM
World ID: 10120233
...

Note:
It is possible that the VM won't show up in this view if its process is already "killed" and the VM really is Powered Off.

b. If the VM displays its World ID, issue the kill command using the VM's World ID. Try soft first, and escalate to hard and then force only if the VM is still listed:
[root@vmhost33:~] esxcli vm process kill -t soft -w 10120233
or
[root@vmhost33:~] esxcli vm process kill -t hard -w 10120233
or
[root@vmhost33:~] esxcli vm process kill -t force -w 10120233
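
The two steps of this workaround can be sketched as a pair of small functions: one that pulls the World ID for a named VM out of the process listing, and one that works through the three kill types in order of severity. Both function names and the WAIT variable (seconds to pause between attempts, defaulting to 30) are hypothetical, and the parsing assumes the VM name appears alone on an unindented line with an indented "World ID:" line beneath it, as in the excerpt above.

```shell
# world_id_for NAME
# Reads "esxcli vm process list" output on stdin and prints the World ID
# found under the unindented header line that equals NAME.
world_id_for() {
    awk -v name="$1" '
        /^[^[:space:]]/ { in_vm = ($0 == name) }   # unindented line = VM name
        in_vm && $1 == "World" && $2 == "ID:" { print $3 }
    '
}

# kill_escalating WORLD_ID
# Tries the soft, hard, and force kill types in that order, and stops as
# soon as WORLD_ID no longer appears in the process listing.
kill_escalating() {
    for kill_type in soft hard force; do
        esxcli vm process kill -t "$kill_type" -w "$1"
        sleep "${WAIT:-30}"
        if ! esxcli vm process list | grep -q "World ID: $1\$"; then
            echo "stopped after $kill_type kill"
            return 0
        fi
    done
    return 1   # still listed after force: see Give Up 1
}

# Usage on the host (StuckVM is the example name from step a):
#   wid=$(esxcli vm process list | world_id_for StuckVM)
#   [ -n "$wid" ] && kill_escalating "$wid"
```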


Workaround 4:
a. On the host console or via remote SSH (if enabled), log in and use the ps process command to retrieve the VM's processes and their IDs.
[root@vmhost33:~] ps | grep vmx
...
1234561 1234567 vmx-mks:StuckVM
1234560 1234567 vmx-svga:StuckVM
...

Note: With the grep, the column labels are missing. The first column is the WID (World ID), the second column is the CID (Cartel ID), and the third column is the WorldName. We want to kill the "parent" process, which is the ID that is the same on every row for the VM: the second column.

b. If the VM is listed, issue the kill command using the parent process ID from the second column:
[root@vmhost33:~] kill 1234567
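
Picking the parent ID out of the unlabeled ps output can be sketched as follows. vmx_cartel_id is a hypothetical helper name, and it assumes the three-column layout described in the note above, with world names of the form "vmx-something:VMNAME".

```shell
# vmx_cartel_id NAME
# Reads "ps | grep vmx" output on stdin (columns: WID, CID, WorldName) and
# prints the distinct CID values of worlds whose name ends in ":NAME".
vmx_cartel_id() {
    awk -v name="$1" '{
        n = index($3, ":")
        if (n > 0 && substr($3, n + 1) == name) print $2
    }' | sort -u
}

# Usage on the host (StuckVM is the example name from step a):
#   ps | grep vmx | vmx_cartel_id StuckVM
```

A single printed value is the parent ID to kill. More than one value suggests the name matched several VMs, in which case check the listing by hand first.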


Give Up 1:
If you find that multiple VMs have the same VM ID, or you still cannot stop the VM:
a. Put the server in maintenance mode. The request will fail, but this way the host will be in maintenance mode when it is rebooted.

b. Manually migrate (vMotion) any VMs that will move.

c. Connect to the ESXi host through its Lights-Out remote console and choose the <F12> option. Enter the ID and password for the host. In the dialog, select the checkbox to force VMs to shut down, then press <F11> to reboot.



---


Obsolete Old ESXi (Historical Reference Only):
Having upgraded our ESXi servers from 3.5 Update x to vSphere 4.0, we are seeing some weird "bugs". The worst are VMs that hang when shutting down. They seem to do it with ESX, too.
They are shut down, but stuck at a new background in the console that says the console is trying to connect to the VM.

Workarounds:

If you have the CLI installed on a workstation or have downloaded the VM appliance . . . (We've done both.)
You can try the vmware-cmd command:
(run from the bin folder)

Get the running state:
# vmware-cmd.pl -H myserver.mindwatering.local /vmfs/volumes/datastore1/vmfolder/vmname.vmx getstate

Shut it down hard:
# vmware-cmd.pl -H myserver.mindwatering.local /vmfs/volumes/datastore1/vmfolder/vmname.vmx stop hard

Confirm it worked:
# vmware-cmd.pl -H myserver.mindwatering.local /vmfs/volumes/datastore1/vmfolder/vmname.vmx getstate

If you have access to the host console (on the rack), you can use vm-support.
(Use the Alt-F1 "back door" with the "unsupported" phrase to get into the server's console. Type exit and press Alt-F2 to get back to the regular yellow and black screen.)

Get list of VMs on host:
# vm-support -x

Look at each VM's unique VMID (Virtual Machine ID). Using the ID, now kill it:
# vm-support -X <VMID> (replace with your VM's id)

Confirm you killed it by getting the list again:
# vm-support -x

Note: This method also logs debug information (a .tgz file) to the /var/tmp directory, which points to the swap partition. Use it or delete the file.


