Linked Clones with kitchen-vcenter

Starting new Test Kitchen machines quickly is key to achieving short feedback cycles in cookbook development. While machines are created as full clones by default, the kitchen-vcenter driver offers a faster alternative.

Requirements

The functionality demonstrated in this blog post requires version 1.3.1 or later of the kitchen-vcenter driver for Test Kitchen.

Additionally, you need a (stopped) virtual machine that serves as the source of your test instances, and this machine must already have a VMware snapshot. As with any other template, it should also have VMware Tools installed.
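To make the requirements above concrete, here is a minimal pre-flight sketch. It assumes you have already collected the source VM's state yourself; the `SourceVM` fields and the `linked_clone_problems` helper are hypothetical names for illustration, not part of kitchen-vcenter, and the version parser only handles plain numeric versions such as `1.3.1`.

```python
from dataclasses import dataclass

# Hypothetical description of the source VM; these field names are
# illustrative only and are NOT part of the kitchen-vcenter API.
@dataclass
class SourceVM:
    name: str
    powered_on: bool
    snapshots: list[str]
    tools_installed: bool

MIN_DRIVER_VERSION = (1, 3, 1)

def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.3.1' into (1, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def linked_clone_problems(driver_version: str, vm: SourceVM) -> list[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems = []
    if parse_version(driver_version) < MIN_DRIVER_VERSION:
        problems.append(f"kitchen-vcenter {driver_version} is older than 1.3.1")
    if vm.powered_on:
        problems.append(f"{vm.name} must be stopped")
    if not vm.snapshots:
        problems.append(f"{vm.name} has no snapshot to link against")
    if not vm.tools_installed:
        problems.append(f"{vm.name} is missing VMware Tools")
    return problems

template = SourceVM("win2016-template", powered_on=False,
                    snapshots=["base"], tools_installed=True)
print(linked_clone_problems("1.3.1", template))  # []
print(linked_clone_problems("1.2.0", template))
```

Comparing tuples instead of strings keeps `1.10.0` correctly ordered after `1.3.1`.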

Linked Clones

Linked Clones have been available in VMware vCenter infrastructures for a long time. The idea originated from the need to deploy VDI machines for office workers quickly, without the excessive IO of copying a complete drive first.

Instead of copying the whole template as a Full Clone does, a Linked Clone only references a disk snapshot of the template VM and adds a delta disk, which receives all write operations of the new VM.

This way, there is no extensive copying activity in the background (less IO) and the operation completes much faster.
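The copy-on-write idea behind the delta disk can be illustrated with a toy model. This is a simplified sketch at block granularity, assuming a dictionary of blocks as the "disk"; it does not reflect VMware's actual on-disk format, and the class names are hypothetical.

```python
class BaseDisk:
    """Read-only stand-in for the template's snapshotted disk."""

    def __init__(self, blocks: dict[int, bytes]):
        self._blocks = dict(blocks)

    def read(self, index: int) -> bytes:
        # Unwritten blocks read as zeros.
        return self._blocks.get(index, b"\x00")


class LinkedCloneDisk:
    """Toy linked clone: writes go to a private delta, reads fall through."""

    def __init__(self, base: BaseDisk):
        # Creating a clone copies nothing; it only references the base.
        self.base = base
        self.delta: dict[int, bytes] = {}  # blocks this VM has written

    def read(self, index: int) -> bytes:
        # Prefer the clone's own data, otherwise read the shared snapshot.
        if index in self.delta:
            return self.delta[index]
        return self.base.read(index)

    def write(self, index: int, data: bytes) -> None:
        # Never modify the base disk; record the change in the delta only.
        self.delta[index] = data


base = BaseDisk({0: b"boot", 1: b"os"})
vm_a = LinkedCloneDisk(base)
vm_b = LinkedCloneDisk(base)
vm_a.write(1, b"patched")
print(vm_a.read(1), vm_b.read(1))  # b'patched' b'os'
print(len(vm_a.delta))             # 1
```

Both clones share the same base, which is also why many clones reading from one snapshot can concentrate IO on it.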

Configuration

Switching your Test Kitchen setup to Linked Clones is easy. Once the requirements above are met, you only have to set the clone type explicitly:

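```yaml
driver:
  name: vcenter
  # Skips certificate validation; only appropriate for lab setups
  vcenter_disable_ssl_verify: true
  vm_rollback: false

  vcenter_host: vcsa.lab.local
  # Credentials are read from environment variables via ERB
  vcenter_username: "<%= ENV['VCENTER_USERNAME'] %>"
  vcenter_password: "<%= ENV['VCENTER_PASSWORD'] %>"

  datacenter: "Datacenter"
  clone_type: "linked"   # default is "full"
```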

As you can see, we just switched clone_type from the default “full” to “linked”. That’s all. From now on, your setup will skip the expensive copy operations and use the built-in Linked Clone functionality. If any of the requirements are not met, you will get an error message stating the problem.

Benchmarks

The following benchmarks (and the ones in the following posts) were part of my talk on kitchen-vcenter at ChefConf London 2019. I measured everything at least 10 times on my laptop (Core i7-8565U, 40 GB RAM, Toshiba SSD, Ubuntu 18.04) running a nested vSphere infrastructure (ESXi 6.7, vCenter 6.7).

Full Clone of a Windows 2016 machine:

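| Phase      | Minimum (s) | Maximum (s) | Average (s) |
|------------|-------------|-------------|-------------|
| Cloning    | 14.4        | 16.9        | 15.4        |
| Booting    | 22.1        | 40.2        | 31.0        |
| Getting IP | 28.3        | 30.4        | 29.9        |
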
Linked Clone of the same template:

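| Phase      | Minimum (s) | Maximum (s) | Average (s) |
|------------|-------------|-------------|-------------|
| Cloning    | 0.9         | 1.1         | 0.9         |
| Booting    | 24.1        | 36.2        | 26.3        |
| Getting IP | 28.5        | 30.5        | 30.2        |
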
Going by these averages, cloning and booting together shrank from 46.4 to 27.2 seconds, a saving of roughly 19 seconds. Including the IP phase, a Windows machine is now ready after about 57 instead of 76 seconds.
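The totals can be recomputed directly from the "Average" columns of the two tables above. This small sketch just sums the per-phase averages (summing averages is valid here because the average of a sum equals the sum of the averages); rounding to one decimal matches the precision of the tables.

```python
# Average durations in seconds, copied from the two benchmark tables.
full = {"Cloning": 15.4, "Booting": 31.0, "Getting IP": 29.9}
linked = {"Cloning": 0.9, "Booting": 26.3, "Getting IP": 30.2}

def total(phases: dict[str, float], names: list[str]) -> float:
    """Sum the selected phases, rounded to the tables' precision."""
    return round(sum(phases[name] for name in names), 1)

clone_and_boot = ["Cloning", "Booting"]
all_phases = list(full)

print(total(full, clone_and_boot), total(linked, clone_and_boot))  # 46.4 27.2
print(total(full, all_phases), total(linked, all_phases))          # 76.3 57.4
```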

Downsides

Using a single snapshot as the source can create IO hotspots when many VMs read from it at the same time. Test Kitchen machines usually live for only a few minutes each, though, so these hotspots should be rare in regular use.

In practice, I observed vCenter removing VMs uncleanly, which once left about 200 orphaned delta disks on the storage systems. The problem could not be reproduced later, so it may have been caused by race conditions within vCenter under high load or even by external factors. Until the root cause is found, I advise monitoring for this condition.
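One way to monitor for this is a set difference between the delta disks found on a datastore and the disks still attached to registered VMs. The sketch below assumes both lists have already been collected (for example through the vSphere API, which is outside its scope) and assumes delta disk descriptors follow the common `<name>-00000N.vmdk` naming; the function name and sample paths are hypothetical.

```python
import re

# Assumption: delta disk descriptors are named "<name>-00000N.vmdk".
# Adjust the pattern to what your datastores actually contain.
DELTA_PATTERN = re.compile(r"-\d{6}\.vmdk$")

def orphaned_delta_disks(datastore_files, attached_disks):
    """Return delta disks on the datastore that no registered VM references.

    datastore_files: iterable of file paths found on the datastore.
    attached_disks:  iterable of disk paths currently attached to VMs.
    """
    attached = set(attached_disks)
    return sorted(
        path for path in datastore_files
        if DELTA_PATTERN.search(path) and path not in attached
    )

files = [
    "[ds1] kitchen-a/kitchen-a-000001.vmdk",
    "[ds1] kitchen-b/kitchen-b-000001.vmdk",
    "[ds1] win2016-template/win2016-template.vmdk",
]
in_use = ["[ds1] kitchen-a/kitchen-a-000001.vmdk"]
print(orphaned_delta_disks(files, in_use))
# ['[ds1] kitchen-b/kitchen-b-000001.vmdk']
```

Running such a check periodically and alerting on a growing result would catch a buildup like the 200 orphaned disks before the storage fills up.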

There is more

In the next blog post, we will check why it takes so long to get the IP address reported back and how we can get rid of these awfully long 30 seconds.