Guest Operations and kitchen-vcenter


In this part of the blog series, we will look at how to speed up IP discovery of new machines with a little-known feature of the VMware Tools.

Requirements

The functionality demonstrated in this blog post relies on the [kitchen-vcenter] driver for Test Kitchen, version 2.4.0 or later. VMware vSphere has supported the API used by this feature for a long time (pre-4.0), but I have only tested the functionality on versions 6.5 and 6.7.

For invoking commands via Guest Operations, the vCenter user of Test Kitchen will need the VirtualMachine.GuestOperations.Execute and VirtualMachine.GuestOperations.Query permissions.

IP Discovery

As I showed in the previous [blog post on Linked Clones](), discovering the IP of a new VM takes about 30 seconds. While that might not sound like a long time to you, imagine doing 40 cycles of testing on a workday, with 9 more colleagues in the same situation. That adds up to more than three hours of waiting per day (40 cycles × 30 seconds × 10 people = 200 minutes)…

To understand the problem, we need to know that IP discovery works via the VMware Tools installed on the VM. They are started at boot and communicate back to the vSphere hypervisor. In a first phase, they report information about the operating system and the state of the Tools, including their version. Only later is additional information like the IP polled and transmitted back. While requesting a new IP via DHCP admittedly plays a role as well, my benchmarks came out at almost exactly 30 seconds under all circumstances.

As the source code of the open-vm-tools is vast and I am no C expert, I asked the nice guys at VMware about this, and they confirmed that there is a 30-second poll interval hard-coded within the tools. So apparently getting the needed information quickly is not possible.

Or is it?

Guest Operations API

Known mostly to VMware administrators, there has been a way to invoke commands remotely via VMware Tools for years. It is widely used for automated provisioning via PowerShell, for example with the Invoke-VMScript cmdlet.

In fact, the Guest Operations API documentation shows a lot of functionality, including remote file management, process monitoring/control and even Windows Registry access. All of that, of course, only with valid Guest OS credentials.

With this info at hand, it is possible to remotely issue an OS-specific command that returns the assigned IP address, and to retrieve its output to speed up our discovery process.

Configuration

I dubbed this feature “Active IP Discovery” (my initial name “Aggressive Mode” sounded too negative). While the code is a bit less straightforward due to the asynchronous nature of remote calls, using it is quite simple:

driver:
  # ... usual configuration goes here ...
  active_discovery: true

platforms:
 - name: win2016
   driver:
     vm_os: windows
     vm_username: "<%= ENV['GUESTOS_USERNAME'] %>"
     vm_password: "<%= ENV['GUESTOS_PASSWORD'] %>"

Please remember never to hardcode credentials in your files, so that they cannot end up in a Git repository.

If you do not specify the vm_os parameter, the value reported in the first phase of VMware Tools is used. This does not have any performance impact, so you can either rely on it or just make it explicit.

Two discovery commands are currently hardcoded which should cover a wide range of systems:

  • Linux default: ip address show scope global | grep global | cut -b10- | cut -d/ -f1
  • Windows default: sleep 5 & ipconfig

If you want your own command to be executed instead, you can use the active_discovery_command property. The first IP in the output will be passed back as the IP of the virtual machine. Windows will always use CMD instead of PowerShell to reduce loading times. I tried using PowerShell before, but loading it is awfully slow, as remote commands are headless and automatically run at a very low priority in Windows.
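To illustrate the "first IP in the output wins" rule, here is a minimal Python sketch. It is not the driver's actual code: the names DEFAULT_COMMANDS, discovery_command and first_ip are made up for this example, and restricting the match to IPv4 is my own assumption. Only the two default commands are taken from the list above.

```python
import re

# Default commands as quoted in this post. The dictionary itself and the
# function names below are hypothetical, not taken from kitchen-vcenter.
DEFAULT_COMMANDS = {
    "linux": "ip address show scope global | grep global | cut -b10- | cut -d/ -f1",
    "windows": "sleep 5 & ipconfig",
}

# Assumption: only IPv4 addresses are of interest.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def discovery_command(vm_os, custom_command=None):
    """Return the custom command if one is set, else the default for the OS."""
    if custom_command:
        return custom_command
    return DEFAULT_COMMANDS[vm_os.lower()]


def first_ip(output):
    """Return the first valid IPv4 address in the command output, or None."""
    for candidate in IPV4_PATTERN.findall(output):
        if all(int(octet) <= 255 for octet in candidate.split(".")):
            return candidate
    return None


# Shortened, made-up ipconfig output: the address comes before the subnet mask.
sample = (
    "Windows IP Configuration\n\n"
    "   IPv4 Address. . . . . . . . . . . : 192.168.1.10\n"
    "   Subnet Mask . . . . . . . . . . . : 255.255.255.0\n"
)
print(discovery_command("windows"))
print(first_ip(sample))
```

With this rule, a custom command does not need to print a bare address, as long as the address you care about is the first one to appear in its output.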

Execution on guest VMs might fail due to race conditions or errors in the command itself. In that case, kitchen-vcenter falls back to the standard discovery method via VMware Tools.
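The fallback behaviour can be pictured roughly like this. Again, this is only a sketch in Python under my own assumptions: discover_ip and the two lookup callables are hypothetical stand-ins, not functions of the driver.

```python
def discover_ip(active_lookup, standard_lookup):
    """Try the fast lookup first; use the slow one if it fails or finds nothing."""
    try:
        ip = active_lookup()
        if ip:
            return ip
    except Exception as error:
        # A failed guest command must not abort the run, it only costs time.
        print(f"Active discovery failed ({error}), falling back")
    return standard_lookup()


def failing_lookup():
    """Hypothetical stand-in for a guest command that hits a race condition."""
    raise RuntimeError("guest operations not ready")


def slow_lookup():
    """Hypothetical stand-in for the standard discovery via VMware Tools."""
    return "10.0.0.5"


print(discover_ip(failing_lookup, slow_lookup))
```

The point of the design is that the active path can only save time: if it produces no usable address, the result is the same as without the feature.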

Benchmarks

The following benchmarks (and the other ones in related posts) were part of my talk on kitchen-vcenter at ChefConf London 2019. I measured everything over 10 operations on my laptop running a nested vSphere infrastructure. All times are in seconds.

Linked Clone of a Windows 2016 machine with standard discovery:

| Phase      | Minimum | Maximum | Average |
|------------|---------|---------|---------|
| Cloning    | 1.0     | 1.5     | 1.1     |
| Booting    | 24.1    | 36.2    | 26.3    |
| Getting IP | 28.5    | 30.5    | 30.2    |


Using active IP discovery on the same machine:

| Phase      | Minimum | Maximum | Average |
|------------|---------|---------|---------|
| Cloning    | 0.8     | 1.3     | 1.0     |
| Booting    | 24.1    | 46.3    | 35.0    |
| Getting IP | 3.0     | 22.4    | 11.2    |


In some cases, I have seen minimum times below 2 seconds, but the variance is pretty high on Windows (presumably due to background processes). Linux delivers much more consistent results, with 1.5 seconds on average. On Windows, though, we are down from about 58 seconds to 47 until we can log in.
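The 58 and 47 second figures are simply the sums of the average columns in the two tables. A quick Python check, with the values copied from above and variable names of my own choosing:

```python
# Average times per phase in seconds, copied from the two benchmark tables.
standard = {"Cloning": 1.1, "Booting": 26.3, "Getting IP": 30.2}
active = {"Cloning": 1.0, "Booting": 35.0, "Getting IP": 11.2}

standard_total = round(sum(standard.values()), 1)
active_total = round(sum(active.values()), 1)
saved = round(standard_total - active_total, 1)

print(f"standard: {standard_total} s, active: {active_total} s, saved: {saved} s")
```

Note that the average boot time differs between the two runs, so the overall saving is smaller than the difference in the "Getting IP" row alone.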

Downsides

There are only a few downsides to this approach. Mainly, we risk querying the IP before it has been assigned, and response times vary more. This won’t make Active Discovery slower than the usual method, though.

In some environments, Guest Operations calls are not permitted for security reasons; it is effectively an administrative backdoor, after all. But that is more of a social/political problem than a technological one.

Non-Network Communication

During implementation, I was confused about how the VM communicates with the vSphere hypervisor, as there is no IP assigned when the VMware Tools start. Digging through the Linux tools source code cleared that up.

VMware has different channels to communicate with guest VMs. Examples include virtual I/O channels mapped into the guest and special CPU instructions executed inside it, which together establish a network-independent communication channel. An example of this can be seen in the backdoor.c file of the tools' source code.

There is (even) more

In the next blog post, we will see how we can avoid booting Windows before working on a new clone. Yes, that is actually possible! Check back next week to see how.