Creating custom VM Images for Azure DevOps Scale Set Agents
When working with Azure DevOps, you will probably want a larger variety of build agents for your pipelines than comes out of the box. There are two flavors to choose from: agents hosted by Microsoft and self-hosted agents. The advantage of Microsoft-hosted agents is that they are kept up to date and you don’t have to worry about them. However, they come at a price. As an alternative, you can host the agents yourself. That can turn out to be a lot cheaper, but now you have to worry about the agents. Until now…
TL;DR This blog shows how to create a VM image with custom-installed tools that you can later use to spin up build agents. This saves you from installing the same tools over and over again from within your running pipeline, and therefore reduces the time your pipeline runs.
Scale set agents
There is a resource in Azure called a Virtual Machine Scale Set (VMSS). This resource lets you define a virtual machine configuration and then scales the number of virtual machines automatically, depending on demand. Azure DevOps can hook into such a scale set and take over control of the scaling. But first, you need the scale set itself.
To create one, you need to provide a virtual machine image plus some information about the VM size, networking, and a few other settings. Luckily, Microsoft Azure has a bunch of those images available for you, so you should be good to go in no time.
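To make that a bit more concrete, here is a minimal sketch of creating a scale set with the Azure CLI. The function name, VM size and instance count are placeholders of mine, and the flag values follow Microsoft's guidance for scale set agents as I understand it (manual upgrade policy, no overprovisioning, no load balancer), so check them against the current documentation before relying on them.

```shell
# Sketch: create a VM scale set that Azure DevOps can manage as an agent pool.
# Arguments: resource group, scale set name, image (alias, URN or image resource ID).
create_agent_scale_set() {
  local resource_group="$1" vmss_name="$2" image="$3"

  # Azure DevOps handles scaling and upgrades itself, so overprovisioning is
  # turned off and the upgrade policy is set to manual. Build agents need no
  # load balancer, hence the empty value.
  az vmss create \
    --resource-group "$resource_group" \
    --name "$vmss_name" \
    --image "$image" \
    --vm-sku Standard_D2_v4 \
    --instance-count 2 \
    --authentication-type SSH \
    --generate-ssh-keys \
    --disable-overprovision \
    --upgrade-policy-mode manual \
    --single-placement-group false \
    --platform-fault-domain-count 1 \
    --load-balancer "" \
    --orchestration-mode Uniform
}
```

You would call it like `create_agent_scale_set my-resource-group my-vmss Ubuntu2204`. The third argument can just as well be the resource ID of a custom image.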
Creating an agent pool
Once the VMSS is created, you can go to the Azure DevOps settings page (either for the entire organization, or a certain project) and create a new Agent Pool. Select the option for a Scale Set Agent Pool, connect to your Azure account and point it to the VMSS you just created.
When all is fine, it can take up to 15 minutes or so for everything to complete. After that, you can assign jobs to the Agent Pool and agents will start spinning up on demand.
Scaling up and down
When you created the Agent Pool in Azure DevOps, you saw the options for scaling. You can configure how many agents should be ‘alive’ when all is idle, the maximum number of agents to scale up to, and how much time to wait before agents are scaled back in again. Because Azure DevOps now takes control of the scaling (and allows you to scale back to 0), the scale set can be very cost-effective. Keep in mind, though, that it takes some time for an agent to spin up. Scaling all the way back to 0 means the first job queued for the agent pool will experience some delay, because the pool first needs to spin up fresh VMs for you.
So what is your problem?
Well, basically… When you pick one of the VM images that come with Microsoft Azure, you will almost always spin up a fresh OS with nothing installed. Most of my projects are .NET projects and for them to build, you need the dotnet SDK. I use the Azure CLI a lot in my pipelines. You may need Node.js, Docker, or Python, for example. This doesn’t necessarily have to be a problem, because you can install all that on the agent from within your pipeline. However, some of these frameworks and tools take quite some time to install and will therefore delay your pipeline (dramatically). Also, when you need to pay for compute (in the case of hosted agents), such install jobs cost money. To solve this problem, you may want to change the image that the VMSS uses to spin up VMs, so that the tools you desire are already installed.
Building a custom image
I am not very experienced with VMs in the cloud, because cloud-native services are very often a better fit for me. So I was somewhat challenged to create a new VM image, but it was easier than I thought. To create such an image, you simply provision a VM, make the desired changes to that VM, remove all user traces (de-provision) so it can be used for fresh installs, and then create an image out of it.
Let’s get to work
As said, most of my projects are dotnet, and I use the Azure CLI a lot. So let’s create an image that contains the dotnet SDKs (6 for LTS and 7 because it is the latest stable version at the time this blog is written) and the latest version of the Azure CLI. Because Linux seems to be a bit faster when building, the source image will be a Linux image (Ubuntu LTS).
To follow along, you need to have a valid Azure Subscription and enough permissions to create, alter and remove resources. Also, you need to have the Azure CLI installed.
Open a new terminal and log in to your Azure account:
az login
If you have access to more than one subscription, select the subscription you want to work with:
az account set --subscription {subscription-id}
Then create a new resource group:
az group create --name {resource-group-name} --location {azure-region}
Once this is in place, enter the following set of commands:
az vm create --resource-group <myResourceGroup> --name <MyVM> --image UbuntuLTS --os-disk-size-gb 1024 --use-unmanaged-disk --admin-username <myUserName> --admin-password <myPassword> --storage-account <myVhdStorageAccount>
az vm stop --resource-group <myResourceGroup> --name <MyVM>
az vm deallocate --resource-group <myResourceGroup> --name <MyVM>
az vm convert --resource-group <myResourceGroup> --name <MyVM>
az vm start --resource-group <myResourceGroup> --name <MyVM>
These commands first create a new VM with an unmanaged disk, which means its virtual hard disk is stored in the storage account. Once the VM is in place and ready for use, it is stopped, deallocated, and then converted to use a managed disk. If you now review the resource group in the Azure Portal, you will see a disk resource. Once everything is done, the VM is started again.
Install some tools
The next thing is to install some tools in the VM. As said, I’m going to install the dotnet 6 and 7 SDKs and the Azure CLI. SSH into your machine with the username and password you entered when creating the VM, and type the following command to install the Azure CLI:
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
The dotnet SDK is a little bit more complicated. At the time I wrote this blog, dotnet 7 was not in the official package feed of Ubuntu, so you need to add Microsoft’s package feed:
wget https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
rm packages-microsoft-prod.deb
Notice that I used the package feed for Ubuntu 18.04. The feed has to match the Ubuntu release running on the VM, which was 18.04 for the image I used at the time I wrote this article. When your VM runs a newer LTS version, change the version number in the package feed URL accordingly.
Now that the new feed is ready to rumble, let’s go and install the dotnet SDKs:
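If you would rather not hard-code the Ubuntu release, the feed URL can be derived from the machine itself. This is a minimal sketch and the two function names are my own. It assumes the feed URL keeps the same pattern as the one used above, with only the version number changing.

```shell
# Build the Microsoft package feed URL for a given Ubuntu release, e.g. "18.04".
ms_feed_url() {
  local version_id="$1"
  echo "https://packages.microsoft.com/config/ubuntu/${version_id}/packages-microsoft-prod.deb"
}

# Read the release of the current machine from /etc/os-release.
# Runs in a subshell so the sourced variables do not leak into the caller.
current_ubuntu_version() {
  ( . /etc/os-release && echo "$VERSION_ID" )
}
```

On the VM you could then download the package with `wget "$(ms_feed_url "$(current_ubuntu_version)")" -O packages-microsoft-prod.deb`.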
sudo apt-get update
sudo apt-get install -y dotnet-sdk-6.0
sudo apt-get install -y dotnet-sdk-7.0
There you go! Everything we need is installed. To see the version of the Azure CLI that was installed, type:
az --version
And to confirm that both the dotnet 6 and dotnet 7 SDKs are installed:
dotnet --list-sdks
Let’s reboot the system just to make sure. Run these from your local terminal, not from the SSH session:
az vm stop --resource-group <myResourceGroup> --name <MyVM>
az vm start --resource-group <myResourceGroup> --name <MyVM>
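Once the VM is back up, a small scripted check inside the VM can confirm that the tools survived the reboot, which is handy if you automate this later. This is a sketch with function names of my own. It assumes `dotnet --list-sdks` prints one SDK per line, starting with the version number.

```shell
# Succeeds when the given `dotnet --list-sdks` output contains an SDK
# whose version starts with the given major number.
has_sdk_major() {
  local major="$1" sdk_list="$2"
  printf '%s\n' "$sdk_list" | grep -Eq "^${major}\.[0-9]+\.[0-9]+"
}

# Check that both SDKs and the Azure CLI are available on this machine.
verify_tools() {
  local sdks
  sdks="$(dotnet --list-sdks)" || return 1
  has_sdk_major 6 "$sdks" || { echo "dotnet 6 SDK missing" >&2; return 1; }
  has_sdk_major 7 "$sdks" || { echo "dotnet 7 SDK missing" >&2; return 1; }
  az --version >/dev/null || { echo "Azure CLI missing" >&2; return 1; }
  echo "all tools present"
}
```

Running `verify_tools` over SSH returns a non-zero exit code when something is missing, so a pipeline step can fail early instead of producing a broken image.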
Generalizing
When generalizing (de-provisioning), you basically remove all user-related information, making the VM appear as a fresh install. This step is essential before you create the image, because every new VM created from the image will then start out as a fresh install. Again, log in to the VM with SSH and execute:
sudo waagent -deprovision+user -force
It is important to wait for the process to complete successfully. It usually takes a while, and according to the documentation, the recommendation is to wait 1 hour for the command to complete.
Creating a fresh image
Now it is time to actually create the image. The following set of commands creates a new VM image:
az vm deallocate --resource-group <myResourceGroup> --name <MyVM>
az vm generalize --resource-group <myResourceGroup> --name <MyVM>
az image create --resource-group <myResourceGroup> --name <MyImage> --source <MyVM>
And now you’re done. If you go to the portal and review your resource group, you will see a resource of type Microsoft.Compute/images. This is the image you just created. You can remove all the other resources and use the freshly created image to create a Virtual Machine Scale Set.
Taking it to the next step
So now you know how to create a custom VM image and how to use that image to spin up VMs through Azure DevOps in order to scale agents depending on the demand of pipeline jobs. The next step is to automate all this. I created an image using the procedure above and named it pipeline-agent-latest. I moved that image to a different resource group, where I automatically deploy a Virtual Network with a VMSS instance that integrates into that VNet. The VMSS uses the pipeline-agent-latest image as the source image to provision VMs from.
Adding a pipeline
Then I created a new pipeline that does the entire procedure described in this blog to create a new image, but instead of creating one image, I create two, all in a new resource group. So this new resource group contains a VM and the two VM images pipeline-agent-latest and pipeline-agent-{version-number}. Then I move both images to the resource group that contains my VMSS and finally remove the resource group with the VM again. So this resource group only exists for the time it takes to create the images and is then removed again.
The nifty trick here is to remove all VM images from the resource group that contains your VMSS just prior to moving the new images over. In the end, the VMSS resource group will contain two images: pipeline-agent-latest, which is now an updated image, and pipeline-agent-{version-number}. The pipeline-agent-latest image is mandatory because I used Azure Bicep to provision the Virtual Network and VMSS, and this code requires an image to be available. By also updating the pipeline-agent-latest image to the latest version, I make sure that even when the VMSS pipeline runs and deploys a new version of my infrastructure, it takes the latest image.
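The image rotation described above could look roughly like the sketch below. It is not my actual pipeline. The function names are made up for illustration and the resource group and VM names are whatever you pass in. It relies only on the az image, az resource move and az group delete commands.

```shell
# Names of the two images built per run: a moving "latest" and a pinned version.
image_names() {
  local version="$1"
  echo "pipeline-agent-latest"
  echo "pipeline-agent-${version}"
}

# Create both images from the generalized VM in the temporary build group.
create_images() {
  local build_group="$1" vm_name="$2" version="$3" name
  for name in $(image_names "$version"); do
    az image create --resource-group "$build_group" --name "$name" --source "$vm_name"
  done
}

# Delete every image currently in the VMSS resource group.
remove_old_images() {
  local vmss_group="$1" ids
  ids="$(az image list --resource-group "$vmss_group" --query "[].id" --output tsv)"
  if [ -z "$ids" ]; then
    return 0
  fi
  # Unquoted on purpose: the IDs arrive one per line and must become separate arguments.
  az image delete --ids $ids
}

# Move the new images to the VMSS resource group, then drop the build group.
promote_images() {
  local build_group="$1" vmss_group="$2" ids
  ids="$(az image list --resource-group "$build_group" --query "[].id" --output tsv)"
  az resource move --destination-group "$vmss_group" --ids $ids
  az group delete --name "$build_group" --yes --no-wait
}
```

In a pipeline, the order would be `create_images`, then `remove_old_images`, then `promote_images`, so the VMSS resource group is only without images for the short time the move takes.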
I do think that it is a good idea to use images with a version number, just so you can see what’s going on and can identify in the VMSS instance which image is actually used. So the final step of the image builder pipeline (let’s call it that) is to update the VMSS instance and make the newly created versioned image the new source for it to spin up VMs.
az vmss update --resource-group VMSS-Resource-Group --name VMSS-Instance-Name --set virtualMachineProfile.storageProfile.imageReference.id=/subscriptions/{subscription-id}/resourceGroups/VMSS-Resource-Group/providers/Microsoft.Compute/images/pipeline-agent-{version-number}
Now that’s quite a command, but mainly because it contains the complete Resource ID of the VM image. I scheduled this pipeline to run weekly, so that once a week the images of my build agents are updated to the latest version of the dotnet 6 & 7 SDKs and the latest version of the Azure CLI. Again, you can extend this pipeline to add more pre-installed tools to your image, but I believe you get the point.
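To keep that long Resource ID manageable in a script, you could assemble it in a small helper. This is a sketch with hypothetical function names. The ID format is the one from the command above. As an alternative, `az image show --resource-group ... --name ... --query id --output tsv` returns the same ID without any string building.

```shell
# Assemble the full resource ID of a VM image from its parts.
image_resource_id() {
  local subscription_id="$1" resource_group="$2" image_name="$3"
  echo "/subscriptions/${subscription_id}/resourceGroups/${resource_group}/providers/Microsoft.Compute/images/${image_name}"
}

# Make the given image the source for new VMs in the scale set.
point_vmss_at_image() {
  local resource_group="$1" vmss_name="$2" image_id="$3"
  az vmss update \
    --resource-group "$resource_group" \
    --name "$vmss_name" \
    --set "virtualMachineProfile.storageProfile.imageReference.id=${image_id}"
}
```

A call would then read `point_vmss_at_image VMSS-Resource-Group VMSS-Instance-Name "$(image_resource_id "$SUBSCRIPTION_ID" VMSS-Resource-Group "pipeline-agent-$VERSION")"`, where `SUBSCRIPTION_ID` and `VERSION` are variables you set yourself.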