vSphere/vSAN Encryption: Virtual Machine Locked Alarm

Previously I deployed a KMS solution within my VMware Home Lab. Everything was working great until I accidentally powered off my entire Home Lab. Afterwards, several of my VMs were locked, even though I had powered everything back up and my KMS was up and running. Resolving the issue proved to be a very good learning experience.

You may ask, why did accidentally powering off your Home Lab cause a Virtual Machine to display a Locked Alarm? Simple: a KMS was deployed, and it was doing its job.

To give a little background, my Home Lab consists of a SuperMicro server and an Intel NUC. When deploying a KMS solution, you always want the KMS set up for HA, with redundancy that keeps the KMS VM on a separate cluster from the ESXi host doing the encryption. That way, if a power outage or any other issue takes your ESXi host offline, the KMS VM can still boot and supply the host with the keys it needs. Otherwise, the ESXi host cannot get the keys it needs to unlock VMs, and a KMS VM sitting encrypted on that same host could never get the keys to unlock itself. It’s important to note that the ESXi host doing the encryption only requests keys during certain actions, such as a host reboot or a VM reboot.

This, as you can imagine, would be very bad and would lock you out of your own VMs. The purpose of a KMS is to protect specified VMs using encryption, and it will do its job, even if that means you aren’t able to power up the VM.

Simply put, the KMS VM needs to be online before the ESXi host doing the encryption asks for its keys. This is extremely important to keep in mind when designing a solution using VMware Encryption. Always use HA for the KMS, always use separate hardware if possible, and always follow best practices for redundancy. The important takeaway is to always ensure your KMS is up. If a power outage takes everything offline, ensure the KMS comes up first, before booting the ESXi hosts. Finally, ensure you have a good backup of the KMS DB, including an offsite copy!
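
The "KMS first, ESXi second" ordering can be sketched as a small pre-flight check. This is only a minimal illustration, not part of my actual setup: the host name is hypothetical, and port 5696 is the standard KMIP port, which I am assuming the KMS listens on.

```python
import socket
import time

KMIP_PORT = 5696  # standard KMIP port; assumed, check your own KMS config


def kms_reachable(host, port=KMIP_PORT, timeout=3.0,
                  connect=socket.create_connection):
    """Return True if a TCP connection to the KMS can be opened."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out, unresolvable host, etc.
        return False
    conn.close()
    return True


def wait_for_kms(host, port=KMIP_PORT, attempts=20, delay=15.0,
                 check=kms_reachable, sleep=time.sleep):
    """Poll the KMS until it answers, or give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        if check(host, port):
            return True
        if attempt < attempts:
            sleep(delay)
    return False


if __name__ == "__main__":
    # "kms.lab.local" is a hypothetical name used for illustration only.
    if wait_for_kms("kms.lab.local"):
        print("KMS is answering; safe to bring up the ESXi hosts.")
    else:
        print("KMS never answered; do not boot the encrypting hosts yet.")
```

Keep in mind that an open port only shows the KMS is listening. It does not prove the KMS will actually hand out keys, so treat this as a first check rather than a guarantee.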

You will be happy to know my KMS VM was on a separate cluster from my production VMs on the SuperMicro. After some pondering, I wondered whether my ESXi host had come up before the KMS VM. To be sure, I rebooted the ESXi host doing the actual encryption. However, I was still locked out. What could be the issue? I even tried re-enabling host encryption on my ESXi hosts, but that didn’t work either.

Thankfully the solution was an easy one. Upon reviewing the logs of my KMS VM, I was quickly able to determine the issue. The issue was that my KMS trial key had expired, which caused the KMS to not give out any keys.

Attempting to establish trust between vCenter and the KMS failed with the error “KeyProviderID required property id not set”.

After logging in to vCenter, I was greeted with an alarm that one of my VMs was locked.

Additionally, all of my ESXi hosts that were set to perform encryption were raising the “Host Requires Encryption Mode Enabled Alarm”.

Upon logging in to my KMS and reviewing the logs, I determined that my trial license key had expired.

Once logged in, I was able to stop the AKM service.

With the service stopped, I moved on to replacing the license key.

To upload the correct key, I went to the /var/lib/townsend/akm directory on the KMS VM.

I then deleted the expired key.

Next, I uploaded the correct License.txt key.

Once uploaded, I started the AKM Service back up again.
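
The stop, swap, and start sequence above could be scripted roughly as follows. This is a hedged sketch rather than what I actually ran: it assumes the expired license file is also named License.txt, it renames that file instead of deleting it so there is something to roll back to, and the `stop` and `start` callables are hypothetical hooks for whatever mechanism your KMS appliance uses to control the AKM service.

```python
import shutil
from pathlib import Path

AKM_DIR = Path("/var/lib/townsend/akm")  # directory from the steps above
LICENSE_NAME = "License.txt"             # assumed name of the license file


def replace_license(new_license, akm_dir=AKM_DIR, stop=None, start=None):
    """Swap in a new license file, keeping a copy of the old one.

    `stop` and `start` are optional, hypothetical callables that stop and
    start the AKM service; they are placeholders, not a real AKM API.
    """
    akm_dir = Path(akm_dir)
    target = akm_dir / LICENSE_NAME
    if stop:
        stop()
    try:
        if target.exists():
            # Rename rather than delete, so the old file can be restored.
            backup = target.with_name(target.name + ".expired")
            shutil.move(str(target), str(backup))
        shutil.copy2(str(new_license), str(target))
    finally:
        # Always try to bring the service back, even if the copy failed.
        if start:
            start()
    return target
```

Calling `replace_license("/tmp/License.txt")` on the KMS VM would then perform the file swap in one step, with the service control left to the two hooks.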

Next, I was able to enable host encryption again on each of my ESXi hosts.

Once it was enabled, all ESXi hosts retrieved the keys they needed successfully.

All ESXi hosts were back in business, and all of the Locked Alarms were gone.