My Magical Adventure With cloud-init

Published on , 2943 words, 11 minutes to read

"If I had a world of my own, everything would be nonsense. Nothing would be what it is, because everything would be what it isn't. And contrary wise, what is, it wouldn't be. And what it wouldn't be, it would. You see?"

The modern cloud is a magical experience. You take a template, give it some SSH keys and maybe some user-data and then you have a server running somewhere. This is all powered by a tool called cloud-init. cloud-init is most useful in actual datacenters with proper metadata services, but what if you aren't in a datacenter with a metadata service?

Recently I wanted to test a script a coworker wrote that allows users to automatically install Tailscale on every distro and version Tailscale supports. I wanted to try and avoid having to install each version of every distribution manually, so I started looking for options.

Mara is hacker
<Mara>

This may seem like overkill (and at some level it probably is). However, as a side effect of going through this song and dance, you can spin up a bunch of VMs pretty easily.

cloud-init has a feature called the NoCloud data source. To use it, you need to write two yaml files, put them into a specially named ISO file and then mount it to the virtual machine. cloud-init will then pick up your configuration data and apply it.

Mara is hmm
<Mara>

Wait...really? What.

Cadey is coffee
<Cadey>

Yes, really.

Let's make an Amazon Linux 2 virtual machine as an example. Amazon offers their Linux distribution for download so you can run it on-premises (I don't really know why you'd want to do this outside of testing stuff on Amazon Linux). In this blog we use KVM, so keep that in mind when you set things up yourself.

First you need to make a meta-data file. This will contain the VM's hostname and the "instance ID" (this makes sense in cloud contexts, but here you can use whatever you want):

local-hostname: mayhem
instance-id: 31337

Mara is hacker
<Mara>

You can configure networking settings here, but our VM is going to get an address over DHCP so you don't really need to care about that in this case.

Next you need to make a user-data file. This is what will actually configure your VM:

#cloud-config
#vim:syntax=yaml

cloud_config_modules:
  - runcmd

cloud_final_modules:
  - [users-groups, always]
  - [scripts-user, once-per-instance]

users:
  - name: xe
    groups: [wheel]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
    ssh-authorized-keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPYr9hiLtDHgd6lZDgQMkJzvYeAXmePOrgFaWHAjJvNU cadey@ontos

write_files:
  - path: /etc/cloud/cloud.cfg.d/80_disable_network_after_firstboot.cfg
    content: |
      # Disable network configuration after first boot
      network:
        config: disabled

Please make sure to change the username and swap out the SSH key as needed, unless you want to get locked out of your VM. For more information about what you can do from cloud-init, see the list of modules here.

Now that you have the two yaml files you can make the seed image with this command (Linux):

$ genisoimage -output seed.iso \
    -volid cidata \
    -joliet \
    -rock \
    user-data meta-data

In NixOS you may need to run it inside nix-shell: nix-shell -p cdrkit. If you are using macOS, you need to use this command:

$ hdiutil makehybrid \
    -o seed.iso \
    -hfs \
    -joliet \
    -iso \
    -default-volume-name cidata \
    user-data meta-data
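If you end up making seed images for a pile of VMs, the meta-data file and the genisoimage arguments are easy to generate. Here is a minimal Python sketch of that idea. The function names `write_meta_data` and `genisoimage_argv` and the `iid-` prefix are made up for illustration; the genisoimage flags are the same ones used above. Values are written as quoted strings so that a hostname like `no` does not get read as a YAML boolean.

```python
import json
import re
import uuid
from pathlib import Path

# Conservative hostname check (an assumption of this sketch, not a cloud-init rule).
HOSTNAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def write_meta_data(seed_dir, hostname, instance_id=None):
    """Write a NoCloud meta-data file into seed_dir and return its path."""
    if not HOSTNAME_RE.match(hostname):
        raise ValueError(f"refusing suspicious hostname: {hostname!r}")
    if instance_id is None:
        # Hypothetical naming scheme; any unique string will do.
        instance_id = f"iid-{uuid.uuid4()}"
    seed_dir = Path(seed_dir)
    seed_dir.mkdir(parents=True, exist_ok=True)
    path = seed_dir / "meta-data"
    # json.dumps yields double-quoted strings, which YAML reads as plain strings.
    path.write_text(
        f"local-hostname: {json.dumps(hostname)}\n"
        f"instance-id: {json.dumps(str(instance_id))}\n"
    )
    return path


def genisoimage_argv(seed_dir, output="seed.iso"):
    """Build the genisoimage command line from the article as an argv list."""
    seed_dir = Path(seed_dir)
    return [
        "genisoimage", "-output", str(output),
        "-volid", "cidata", "-joliet", "-rock",
        str(seed_dir / "user-data"), str(seed_dir / "meta-data"),
    ]
```

You would still write user-data yourself and then hand the list from `genisoimage_argv` to `subprocess.run(..., check=True)`.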

Now you can download the KVM image from that Amazon Linux User Guide page from earlier and then put it somewhere safe. This image will be written into a ZFS zvol. To find out how big the zvol needs to be, you can use qemu-img info:

$ qemu-img info amzn2-kvm-2.0.20210427.0-x86_64.xfs.gpt.qcow2
image: amzn2-kvm-2.0.20210427.0-x86_64.xfs.gpt.qcow2
file format: qcow2
virtual size: 25 GiB (26843545600 bytes)
disk size: 410 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

The virtual disk image is 25 gigabytes, so you can create it with a command like this:

$ sudo zfs create -V 25G rpool/safe/vms/mayhem
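If you want to script the sizing step instead of reading the number off the screen, `qemu-img info --output=json` prints the same information as JSON, with the virtual size in bytes under the `virtual-size` key. The sketch below assumes that output; the helper names (`qemu_img_info_json`, `virtual_size_bytes`, `zvol_size_arg`) are made up for illustration, and rounding up to a whole GiB is just a convenient choice.

```python
import json
import subprocess

GIB = 1024 ** 3


def qemu_img_info_json(image_path):
    """Run qemu-img info and return its JSON output as a string."""
    result = subprocess.run(
        ["qemu-img", "info", "--output=json", str(image_path)],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def virtual_size_bytes(info_json):
    """Pull the virtual disk size (in bytes) out of qemu-img's JSON output."""
    return int(json.loads(info_json)["virtual-size"])


def zvol_size_arg(size_bytes):
    """Round up to a whole number of GiB and format it for `zfs create -V`."""
    return f"{-(-size_bytes // GIB)}G"
```

For the image above, `zvol_size_arg(26843545600)` gives `25G`, which matches the value passed to `zfs create -V`.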

Then you use qemu-img convert to copy the image into the zvol:

$ sudo qemu-img convert \
    -O raw \
    amzn2-kvm-2.0.20210427.0-x86_64.xfs.gpt.qcow2 \
    /dev/zvol/rpool/safe/vms/mayhem

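If you don't use ZFS you can make a layered disk using qemu-img create (newer versions of qemu-img refuse to create a layered disk unless you tell them the format of the backing file, which is what backing_fmt does):

$ qemu-img create \
    -f qcow2 \
    -o backing_file=amzn2-kvm-2.0.20210427.0-x86_64.xfs.gpt.qcow2,backing_fmt=qcow2 \
    mayhem.qcow2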

Open up virt-manager and then create a new virtual machine. Make sure you select "Manual install".

The first step of the "create a new virtual machine" wizard in virt-manager with "manual install" selected

virt-manager will then ask you what OS the virtual machine is running so it can load some known working defaults. It doesn't have an option for Amazon Linux, but it's kinda sorta like CentOS 7, so enter CentOS 7 here.

The second step of the "create a new virtual machine" wizard in virt-manager with "CentOS 7" selected as the OS the virtual machine will be running

The default amounts of RAM and CPU are fine, but you can choose other values if your hardware is more constrained.

The third step of the "create a new virtual machine" wizard in virt-manager with 1024 MB of ram and 2 virtual CPU cores selected

Now you need to select the storage path for the VM. virt-manager will helpfully offer to create a new virtual disk for you. You already made the disk with the above steps, so enter in /dev/zvol/rpool/safe/vms/mayhem (or the path to your custom layered qcow2 from the above qemu-img create command) as the disk location.

The fourth step of the "create a new virtual machine" wizard in virt-manager with /dev/zvol/rpool/safe/vms/mayhem selected as the path to the disk

Finally, name the VM and then choose "Customize configuration before install" so you can mount the seed data.

The last step of the "create a new virtual machine" wizard in virt-manager, setting the virtual machine name to "mayhem" and indicating that you want to customize configuration before installation

Click on the "Add Hardware" button in the lower left corner of the configuration window.

Make a new CDROM storage device that points to your seed image:

And then click "Begin Installation". The virtual machine will be created and its graphical console will open. Click on the info tab and then the NIC device. The VM's IP address will be listed:

Now SSH into the VM:

$ ssh xe@192.168.122.122
The authenticity of host '192.168.122.122 (192.168.122.122)' can't be established.
ED25519 key fingerprint is SHA256:TP7dWLkHOixx5tr78qn0yvDQKttH0yWz6IBvbadEqcs.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.122.122' (ED25519) to the list of known hosts.

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
8 package(s) needed for security, out of 17 available
Run "sudo yum update" to apply all updates.
[xe@mayhem ~]$

And voila! A new virtual machine that you can do whatever you want with, just like you would any other server.

Mara is hmm
<Mara>

Do you really need to make an ISO file for this? Can't I just use HTTP like the AWS metadata service?

Cadey is enby
<Cadey>

Yes and no. You can have the configuration loaded over HTTP/S, but getting http://169.254.169.254 to work like the AWS metadata service takes special network configuration and a fair bit of effort. Either way, you are going to have to edit the virtual machine's XML.

Mara is wat
<Mara>

XML? Why is XML involved?

Cadey is enby
<Cadey>

virt-manager is a frontend to libvirt. libvirt uses XML to describe virtual machines. Here is the XML used to describe the VM you made earlier. This looks like a lot (because frankly it is a lot, computers are complicated), but it is a lot more manageable than the equivalent qemu flags.

Mara is hmm
<Mara>

What do the qemu flags look like?

Cadey is enby
<Cadey>

Like this. It is kind of a mess that I would rather have something made by people smarter than me take care of.

To enable cloud-init to load over HTTP, you are going to have to add the qemu XML namespace to mayhem's configuration. At the top you should see a line that looks like this:

<domain type="kvm">

Replace it with one that looks like this:

<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">

This will allow you to set the cloud-init seed location information using a SMBIOS value. To enable this, add the following to the bottom of your XML file, just before the closing </domain>:

<qemu:commandline>
  <qemu:arg value="-smbios"/>
  <qemu:arg value="type=1,serial=ds=nocloud-net;h=mayhem;s=http://10.77.2.22:8000/mayhem/"/>
</qemu:commandline>
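If you would rather not hand-edit the XML for every VM, the same change can be made with Python's standard library. This is only a sketch: `add_nocloud_seed` is a made-up name, and it assumes you feed it the domain XML as a string (for example from `virsh dumpxml`). It forces a trailing slash on the seed URL on the assumption that cloud-init appends `meta-data` and `user-data` directly to it, and it refuses commas because qemu treats them as option separators.

```python
import xml.etree.ElementTree as ET

QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"
# Make ElementTree emit the namespace with the "qemu" prefix libvirt expects.
ET.register_namespace("qemu", QEMU_NS)


def add_nocloud_seed(domain_xml, hostname, seed_url):
    """Return domain_xml with a qemu:commandline block pointing at seed_url."""
    if "," in hostname or "," in seed_url:
        # qemu needs commas escaped inside option values; not handled here.
        raise ValueError("commas are not supported by this sketch")
    if not seed_url.endswith("/"):
        seed_url += "/"
    root = ET.fromstring(domain_xml)
    cmdline = ET.SubElement(root, f"{{{QEMU_NS}}}commandline")
    ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", {"value": "-smbios"})
    serial = f"ds=nocloud-net;h={hostname};s={seed_url}"
    ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", {"value": f"type=1,serial={serial}"})
    # ElementTree declares xmlns:qemu on the root element when serializing.
    return ET.tostring(root, encoding="unicode")
```

Something like `add_nocloud_seed(xml_text, "mayhem", "http://10.77.2.22:8000/mayhem")` covers both the namespace declaration and the `<qemu:commandline>` block in one pass; you would then load the result back with `virsh define`.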

Make sure the data is actually being served on that address. Here's a nix-shell python one-liner HTTP server:

$ nix-shell -p python3 --run 'python -m http.server 8000'
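Before rebooting anything, it can save a round trip to check that the directory you are serving actually has the layout the seed URL implies: a folder named after the VM with `meta-data` and `user-data` inside it. Here is a small sketch of that check. The function names are hypothetical, and the assumption is that you run the HTTP server from `web_root`.

```python
from pathlib import Path

# The two files the VM is expected to request from the seed URL.
REQUIRED_FILES = ("meta-data", "user-data")


def missing_seed_files(web_root, hostname):
    """Return the names of seed files that are absent for this VM."""
    seed_dir = Path(web_root) / hostname
    return [name for name in REQUIRED_FILES if not (seed_dir / name).is_file()]


def user_data_has_header(web_root, hostname):
    """Check that user-data starts with the #cloud-config marker line."""
    path = Path(web_root) / hostname / "user-data"
    if not path.is_file():
        return False
    with path.open() as f:
        return f.readline().strip() == "#cloud-config"
```

If `missing_seed_files(".", "mayhem")` returns an empty list and `user_data_has_header(".", "mayhem")` is true, the requests for `/mayhem/meta-data` and `/mayhem/user-data` should at least find something to serve.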

Then you will need to either load the base image back into the zvol or recreate the qcow2 file to reset the VM back to its default state.

Reboot the VM and wait for it to connect to your "metadata server":

192.168.122.122 - - [04/Jun/2021 11:41:10] "GET /mayhem/meta-data HTTP/1.1" 200 -
192.168.122.122 - - [04/Jun/2021 11:41:10] "GET /mayhem/user-data HTTP/1.1" 200 -

Then you can SSH into it like normal:

$ ssh xe@192.168.122.122
The authenticity of host '192.168.122.122 (192.168.122.122)' can't be established.
ED25519 key fingerprint is SHA256:eJRjDsvnVrXfntVtNVN6N+JdakaA+dvGKWWQP5OFkeA.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.122.122' (ED25519) to the list of known hosts.

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
8 package(s) needed for security, out of 17 available
Run "sudo yum update" to apply all updates.
[xe@mayhem ~]$

Mara is hmm
<Mara>

Can I choose other distros for this?

Cadey is enby
<Cadey>

Yep! Most distributions offer cloud-init enabled images. They may be hard to find, but they do exist. Here are some links that will help you with common distros:

In general, look for images that are compatible with OpenStack. OpenStack uses cloud-init to configure virtual machines and the NoCloud data source you're using ships by default. It usually works out, except for cases like openSUSE Leap 15.1. With Leap 15.1 you have to pretend to be OpenStack a bit more for some reason.

Mara is hmm
<Mara>

What if I need to template the userdata file?

Cadey is facepalm
<Cadey>

You really should avoid doing this if possible. Templating yaml is a delicate process fraught with danger. The error conditions in things like Kubernetes are that it does the wrong thing and you need to replace the service. The error condition with this is that you lose access to your server.

Mara is hacker
<Mara>

I'm going to do it anyway. There are Facts and Circumstances™ that make me have to template it.

Cadey is percussive-maintenance
<Cadey>

When you are templating yaml, you have to be really careful. It is very easy to incur the wrath of Norway and Ontario by accident with yaml. Here are some rules of thumb (unfortunately gained from experience) to keep in mind:

Something very important is to test the templating on a virtual machine image that you have a back door into. Otherwise you will be locked out. You can generally hack around it by adding init=/bin/sh in your kernel command line and changing your password from there.
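One way to dodge most quoting accidents is to not template yaml text at all: build the configuration as a data structure and serialize it as JSON, which a yaml parser should read as yaml. The sketch below assumes cloud-init is happy with a JSON body after the `#cloud-config` line, which you should confirm on a throwaway VM first. `render_user_data` is a made-up name, and the user settings mirror the user-data file from earlier.

```python
import json


def render_user_data(username, ssh_keys):
    """Render a #cloud-config document with every value safely quoted."""
    config = {
        "users": [
            {
                "name": username,
                "groups": ["wheel"],
                "sudo": ["ALL=(ALL) NOPASSWD:ALL"],
                "shell": "/bin/bash",
                "ssh-authorized-keys": list(ssh_keys),
            }
        ]
    }
    # json.dumps quotes every string, so a value like "no" stays a string
    # instead of turning into a boolean.
    body = json.dumps(config, indent=2, ensure_ascii=False)
    return "#cloud-config\n" + body + "\n"
```

A user named `no` (hello, Norway) comes out as `"name": "no"`, which is much harder to misread than a bare `no` pasted into a string template.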

When you mess it up you will need to get into the VM somehow and do one of a few things:

  1. Run cloud-init collect-logs to generate a log tarball that you can export to your host machine and dig into from there
  2. Look through the system journal for any errors
  3. Look in /var/log for files that begin with cloud-init and page through them

If all else fails, start googling. If you are running commands against a VM with the runcmd feature of cloud-init, I'd suggest going through the steps on a manually installed virtual machine image at least once so you can be sure the steps work. I have lost 4 hours of time to this. Also keep in mind that in the context that runcmd runs from, there is no standard input hooked up. You will need to pass -y everywhere.

If you want a simple Alpine Linux image to test with, look here for the Alpine Linux images I test with. You can download this image from here if you trust that I wouldn't put malware in that image and don't want to make your own.


In the future I plan to use cloud-init extensively within my new homelab cluster. I have plans to make a custom VM management service I'm calling waifud. I will write more on that once I have written the software. I currently have a minimum viable prototype of this tool called mkvm that I'm using today without any issues. I will also be writing up how I built the cluster and installed NixOS on all the systems in a future article.

cloud-init is an incredible achievement. It has its warts, but the fact that it is used in so many places makes configuring virtual machines so much easier. It even works on Windows! As much as I complain about it in this post, life would be so much worse without it. It allows me to use the magic of the cloud in my local virtual machines so I can get better use out of my hardware.


Facts and circumstances may have changed since publication. Please contact me before jumping to conclusions if something seems wrong or unclear.
