Instant Clones with kitchen-vcenter
Over the last few posts we optimized our kitchen-vcenter setups, but we are still stuck with the usual long boot times of Windows systems. Surprisingly, VMware introduced a feature which can help us get rid of those. For good.
This solution is pretty high-end, so we have a lot of requirements to fulfil:
Stable support of this feature landed in kitchen-vcenter v2.4.0 and was sponsored by Siemens Gamesa.
The hardest requirement to meet is that both the vSphere Hypervisor and the vCenter Server need to be at least version 6.7.0. While Instant Clones were available before and were used by some VMware Horizon features, the API was neither publicly documented nor stable at that point in time.
We will be running some guest OS commands, so we need the VirtualMachine.GuestOperations.Query privilege. In addition, VirtualMachine.Inventory.CreateFromExisting will be necessary.
Also, there is a manual step involved: Starting the template VM and setting it into a so-called “frozen” state. More on that in the next section.
Put in simple terms, Instant Clones are a logical extension of the Linked Clones feature: they not only include a delta disk, but apply the same principle to memory as well. New contents are persisted to the clone's delta disk or its unique memory area, while contents from the template are shared between clones.
This principle immediately raises the question of what a stable template VM would look like. For this purpose, VMware included the possibility to freeze VMs in time - effectively stopping their (virtual) processor. The VM will be displayed as running, but cannot be managed in any way except for a hard power-off.
When an Instant Clone is started, this new VM will resume operations at exactly the same point in time. In the end, it doesn’t matter whether this is a cleanly booted VM or even one with a running Tomcat server. It’s all the same to VMware at this point.
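The sharing principle described above can be pictured with a toy copy-on-write model. This is only an illustrative sketch, not how vSphere implements it: the class names `FrozenTemplate` and `InstantClone` and the page contents are made up for this example.

```python
class FrozenTemplate:
    """Toy stand-in for a frozen VM: its memory pages never change."""

    def __init__(self, memory):
        self.memory = dict(memory)


class InstantClone:
    """Reads fall through to the template, writes land in a private delta."""

    def __init__(self, template):
        self.template = template
        self.memory_delta = {}  # pages this clone has modified itself

    def read_memory(self, page):
        # Prefer the clone's own copy, otherwise use the shared template page.
        if page in self.memory_delta:
            return self.memory_delta[page]
        return self.template.memory[page]

    def write_memory(self, page, value):
        # The template is never touched; only the clone's delta grows.
        self.memory_delta[page] = value


template = FrozenTemplate({0: "kernel", 1: "tomcat"})
clone_a = InstantClone(template)
clone_b = InstantClone(template)

clone_a.write_memory(1, "tomcat-restarted")

print(clone_a.read_memory(1))  # the clone's private copy
print(clone_b.read_memory(1))  # still the shared template page
```

The delta disk of a clone follows the same pattern, which is why a new clone has almost nothing to copy at creation time.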
To freeze a VM, you log into it (or cleverly use Invoke-VMScript via PowerShell) and execute the command vmtoolsd --cmd "instantclone.freeze".
In reality, there are two workflows for Instant Clones. While the one mentioned above is the preferred one (“Frozen VM Workflow”), there is also another one with an operational VM (“Running Source VM Workflow”). In the latter case, every Instant Clone launched will create additional delta layers on the source VM, which makes things more dynamic but also less performance-optimized.
If you want more details about the feature and the base architecture, I can recommend reading William Lam’s two-part series on Instant Clones.
While support for Instant Clones was announced for earlier versions of kitchen-vcenter, it was not completely stable. As initially mentioned, remote scripts need to be executed via Guest Operations. This is mainly to signal to the OS that its network card has changed and to request a new IP from DHCP.
This is necessary because the OS does not notice the different MAC address of its network card on its own. To avoid packets being sent with the wrong MAC and discarded by the hypervisor, some commands need to be run on Windows to account for this. If you want to know the details, look at the kitchen-vcenter implementation.
As this is a mix of a new clone type and interaction using the Guest Operations API, we need settings (and privileges) for both:
driver:
  # ... usual settings go here ...
  clone_type: "instant"

platforms:
  - name: win2016
    template: windows2016-frozenvm
    driver:
      vm_os: windows
      vm_username: "<%= ENV['GUESTOS_USERNAME'] %>"
      vm_password: "<%= ENV['GUESTOS_PASSWORD'] %>"
Please remember that we need to have a running VM for this to work and it has to be in a frozen state. If you did not set that up correctly, kitchen-vcenter will gently remind you.
The following benchmarks (and the other ones in related posts) were part of my talk on kitchen-vcenter at ChefConf London 2019. I measured everything over 10 operations on my laptop running a nested vSphere infrastructure.
Linked Clone of a Windows 2016 machine with active IP discovery:
Using instant clones instead:
And there we have it: a fully running, new Windows VM within 18 seconds. Compared to the initial 58 seconds that is a lot faster, isn’t it?
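To put the two figures from the text in relation, here is a quick back-of-the-envelope calculation. It only uses the 58 and 18 seconds quoted above; the variable names are mine.

```python
baseline_seconds = 58       # the initial figure quoted above
instant_clone_seconds = 18  # the Instant Clone figure quoted above

speedup = baseline_seconds / instant_clone_seconds
saved_seconds = baseline_seconds - instant_clone_seconds
reduction_percent = 100 * saved_seconds / baseline_seconds

print(f"{speedup:.1f}x faster, {saved_seconds} s saved "
      f"({reduction_percent:.0f} % less waiting)")
# prints: 3.2x faster, 40 s saved (69 % less waiting)
```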
With memory sharing, new Test Kitchen instances need to be on the same vSphere host as the source VM, which might cause some congestion if you use the feature excessively. Also, as we are using the Guest Operations API to trigger the OS, this might fail from time to time. If you use the feature in your CI/CD pipelines, I would recommend configuring a low number of retries.
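If your CI system has no built-in retry option, a small wrapper around the Kitchen call can do the job. The following is a minimal sketch under my own assumptions: the helper name `run_with_retries`, the default of three attempts and the five-second pause are illustrative choices, not something kitchen-vcenter provides.

```python
import subprocess
import time


def run_with_retries(command, attempts=3, delay_seconds=5, runner=subprocess.run):
    """Run `command` until it exits with 0 or `attempts` is used up.

    Returns the number of attempts that were needed and raises
    RuntimeError if every attempt failed. `runner` is injectable so the
    logic can be tried out without a real Test Kitchen setup.
    """
    for attempt in range(1, attempts + 1):
        result = runner(command)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Give the guest a moment before asking again.
            time.sleep(delay_seconds)
    raise RuntimeError(f"{command!r} failed after {attempts} attempts")
```

In a pipeline step this could be called as `run_with_retries(["kitchen", "create", "win2016"])`. Keeping the number of attempts low means a genuinely broken template still fails fast.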
Currently, the Kitchen driver only implements refreshing the IP on Linux via dhclient and only with VMXNET3 interfaces.
While this marks the end of our journey for more speed, there are still many options to customize your Kitchen instances. More CPU, more memory, more disks or even switching over to other networks will be covered in a later post of this series.