Deploying the Veeam Software Appliance on KubeVirt

Virtual machines on Kubernetes are a thing, so I thought it would be a good idea to run through how to get the Veeam Software Appliance up and running on KubeVirt. This will also translate to the enterprise variants that use KubeVirt to enable virtualization on top of Kubernetes, such as Red Hat OpenShift Virtualization and SUSE Harvester.

I wrote about the Veeam Software Appliance and ran through the steps to get the system up and running as the brains of your backup environment along with some of the benefits it brings.

For those familiar with the process I took in the above link in vSphere, be warned defining virtual machines in Kubernetes is all about the YAML.

What you will need

A Kubernetes cluster. I will be using Talos Linux as my Kubernetes distribution in the home lab, but any bare-metal Kubernetes cluster should work, provided it meets the system requirements.

The nodes in your cluster should have enough resources to meet the requirements of the Veeam Software Appliance: 8 vCPU, 16+ GB RAM, and two disks (≈240 GB+ each).

You should have a StorageClass with enough capacity to store the above disk requirements.

KubeVirt + Containerized Data Importer (CDI) installed and working. CDI is used to create DataVolumes and import/upload images. (I am running version 1.6.0 of kubevirt)

Client side you will need kubectl and virtctl: kubectl is how we will interact with the cluster, while virtctl helps us upload images, start virtual machines and open a VNC console to the machine.

High Level Steps

  • Upload the Veeam Software Appliance ISO into the cluster (DataVolume)
  • Create blank DataVolumes (PVCs) that will become the appliance target disks (these need to be 2 x 250 GB, as the appliance needs 248 GB)
  • Create a Virtual Machine that attaches the ISO and the blank PVCs as disks (ensuring the correct boot order)
  • A service so that the system can be accessed outside of the cluster.
  • Start the VM and work through the initial configuration wizard of the appliance

Upload the ISO into the cluster

We now need to get the ISO into our cluster as a DataVolume to be used with the virtual machine creation in the next step. First we need to create a namespace where we will create the virtual machine and everything associated with it.


kubectl create ns veeam
image 20

You could use port-forward but for a large ISO like this it might take a considerable amount of time. I opted to use a local web server on my system to share the ISO. I used a simple python web server with the following command.


python -m http.server 8080

We will then create a DataVolume for our ISO and we will specify the URL to get our ISO. Be sure to change your path and storageclass.


# veeam-iso-dv.yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: veeam-iso-dv
  namespace: veeam
spec:
  pvc:
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 13Gi
    storageClassName: ceph-block
  source:
    http:
      url: http://192.168.169.5:8080/VeeamSoftwareAppliance_13.0.0.4967_20250822.iso

We will then apply this with


kubectl apply -f veeam-iso-dv.yaml

The import process will then start and you can use the following command to see the import progress.


kubectl -n veeam get dv veeam-iso-dv -w
image 21

You can keep an eye on this; don't panic if it gets to 99.99% and sits there for what feels like as long again with no updates. You can come out of the watch command and check the pod logs; you will have an importer pod in your veeam namespace that you can run this command against.


kubectl logs importer-prime-a1ec931f-5335-4f59-aa97-6f165f1c38eb -n veeam

When complete you can run the following command and hopefully you will see something like the screenshot below


kubectl -n veeam get dv veeam-iso-dv
image 22

Note: when I first tried to use port-forward with this ISO it was suggesting 7+ hours, so this way is much more efficient.
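As an aside, CDI also supports pushing the image straight from your workstation with virtctl image-upload; a hedged example below (it needs the cdi-uploadproxy service to be reachable from your client, and flags can vary slightly between CDI versions):


virtctl image-upload dv veeam-iso-dv \
  --namespace veeam \
  --size 13Gi \
  --storage-class ceph-block \
  --image-path ./VeeamSoftwareAppliance_13.0.0.4967_20250822.iso \
  --insecure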

Create blank target disks (DataVolumes)

You might at this stage be realising the difference between KubeVirt and a hypervisor like vSphere…
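The blank target disks themselves are defined in the GitHub repository linked further down; as a hedged sketch (the name is illustrative, and you would create two of these), a blank DataVolume looks something like this:


# blank target disk - create two of these (for example veeam-disk0 and veeam-disk1)
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: veeam-disk0
  namespace: veeam
spec:
  source:
    blank: {}
  pvc:
    accessModes: ["ReadWriteOnce"]
    storageClassName: ceph-block
    resources:
      requests:
        storage: 250Gi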

We have the ISO uploaded to the cluster and we are now ready to create our Virtual Machine.

There is an option in the following YAML to use cloudInitNoCloud to inject an SSH user for access after installation. I have not added this because, although you can SSH into the Veeam Software Appliance, from a security standpoint it is not enabled by default and requires security officer approval; maybe more on this in another post.

We are now going to define our Virtual Machine in code. Again if you are copying and pasting then please amend resources, storageclass and namespace if different to what I have used above.

The appliance requires UEFI rather than BIOS, so we have made that change in the VM configuration below.

You can find the VM definition (veeam-vm.yaml) in this GitHub repository:

https://github.com/MichaelCade/vsa_kubevirt
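For illustration only (the repository above holds the authoritative manifest, and the disk/DataVolume names here are assumptions), the shape of the VirtualMachine is roughly this: UEFI firmware, the ISO attached as a CD-ROM with boot order 1, and the two blank DataVolumes attached as the target disks.


# hedged sketch of veeam-vm.yaml - see the repository for the real definition
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: veeam-appliance
  namespace: veeam
spec:
  runStrategy: Halted              # created powered off, started later with virtctl
  template:
    metadata:
      labels:
        kubevirt.io/domain: veeam-appliance
    spec:
      domain:
        firmware:
          bootloader:
            efi:
              secureBoot: false    # appliance requires UEFI
        cpu:
          cores: 8
        memory:
          guest: 16Gi
        devices:
          disks:
            - name: iso
              bootOrder: 1         # boot the installer ISO first
              cdrom:
                bus: sata
            - name: disk0
              bootOrder: 2
              disk:
                bus: virtio
            - name: disk1
              disk:
                bus: virtio
      volumes:
        - name: iso
          dataVolume:
            name: veeam-iso-dv
        - name: disk0
          dataVolume:
            name: veeam-disk0
        - name: disk1
          dataVolume:
            name: veeam-disk1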

Now we can create our VM using the command below


kubectl apply -f veeam-vm.yaml
image 23

At this stage our virtual machine is powered off but to confirm the status we can check using


kubectl get virtualmachine -n veeam
image 24
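The high-level steps earlier also mentioned a Service so the appliance can be reached from outside the cluster; the real definition is in the same repository. As a hedged sketch (the label selector, service type and port list are assumptions), it might look like this, exposing the web UI on 443 and the host management console on 10443:


apiVersion: v1
kind: Service
metadata:
  name: veeam-appliance
  namespace: veeam
spec:
  type: LoadBalancer                       # or NodePort if you have no load balancer in the lab
  selector:
    kubevirt.io/domain: veeam-appliance    # label carried by the virt-launcher pod
  ports:
    - name: web-ui
      port: 443
      targetPort: 443
    - name: host-management
      port: 10443
      targetPort: 10443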

Start the VM

Four YAML files later, we are now ready to start the machine.

When ready we can start things using the following virtctl command


virtctl start veeam-appliance -n veeam
image 25

Let's then use that command again to check the status of our virtual machine.


kubectl get virtualmachine -n veeam
image 26

Now we need a console to complete the initial configuration wizard within the TUI; I have installed TigerVNC as my client.

you can then run a virtctl command to connect


virtctl vnc veeam-appliance -n veeam
image 27

With this command you will also see the following window pop up.

image 28

Ok, I think that is enough for this post. I have been storing the code snippets in a repository on GitHub that can be found here. – https://github.com/MichaelCade/vsa_kubevirt

Once you get to this page of the Initial configuration, you can then follow this post to complete the steps. – https://vzilla.co.uk/vzilla-blog/the-veeam-software-appliance-the-linux-experience

The Veeam Software Appliance – The Linux Experience

The first week in September 2025 saw the massive initial release of the Veeam Software Appliance. Since the inception of Veeam and its ability to protect virtual machines on VMware vSphere, Veeam has been a Windows Server based product, until now.

I have also skipped over how "virtualisation was just the start", and it was. The Veeam Data Platform now protects workloads and data across many different platforms: VMware vSphere, Microsoft Hyper-V, Proxmox, Oracle Linux Virtualisation and many more hypervisors, as well as public cloud workloads on AWS, Microsoft Azure and Google Cloud. Protection for Kubernetes came almost five years ago with the acquisition of Kasten, now known as Veeam Kasten for Kubernetes. We have M365 and Salesforce backup, Entra ID, agents for Windows, Linux, Solaris and AIX, and we also protect unstructured data on NAS devices and in object storage locations.

As of this week the management layer of Veeam is no longer Windows-only; Linux has entered the room.

Home Lab Setup Plan

My morning of the 3rd of September was spent waiting for the downloadable, public release of the Veeam Software Appliance (VSA). We at Veeam made a big splash about this at our event earlier in the year and showed off a lot of the features and functionality, so I am not going to get into too much of that here.

When the downloads became available it was time to start the engines, but not without a solid plan. I have documented my home lab setup in previous posts, and as much as this is a home lab I still want to treat things with some real-life accountability. However, what I am about to show you has my repository living on the same virtual environment I am going to be protecting. This is for now, and will be resolved down the line with the Veeam Data Cloud Vault option, where I can store those backups offsite in the Veeam first-party solution.

OK, back to the plan.

image

As you can see from the above image, we had some virtual machines to create: 10 if we include that DNS server down there. We also have some capabilities in the diagram that Veeam has documented as coming in the not too distant future.

The VSA is available in an OVA and ISO format. There is then a smaller ISO for the Veeam Components such as the Hardened Repository and Proxies. You can see we have 5 proxies planned and a Hardened Repository (again bad practice to store a backup repository on the infrastructure you are going to protect, don’t be like me)

For the VBR (Veeam Backup & Replication) server we will use the OVA and import this into our vSphere environment and create a new VM.

For the EM (Enterprise Manager) server we will use the ISO and create a VM and run through the configuration steps to get this setup.

The proxies and hardened repository will use the smaller ISO

Veeam ONE is a powerhouse with this release, still Windows but massively important. We will provision a Windows 2025 Server for this.

I will mention for this plan, I also provisioned a Windows 2025 Server Core for the DNS server and created a new forest called “vzilla.local”

Hopefully that makes everything in the picture a little clearer. The web client can be any OS and any mainstream browser (I have tried Chrome, Edge and Firefox). The thick client is used to access parts of Veeam ONE and the VBR server; I have installed the VBR thick client on the Veeam ONE Windows Server.

VBR Deployment Process

As mentioned, I am going to be using vSphere here to deploy my virtual machines for this plan, but clearly anything that supports the OVA can be used; the ISO exists for physical systems or platforms where an OVA is not supported.

image 1

Select the OVA

image 2

I am not going to go through it step by step here: basically select the OVA, give the machine a relevant name for your environment, choose the DC, cluster or host you wish to deploy to, select some shared storage and finish.

When that import is complete, you will have a powered off virtual machine ready to be powered on. (I created another import to capture the process here and it is named 2025-veeam-vbr1)

Powering on the VM

image 3

Next we will see "Veeam Backup & Replication" booting, followed by the initial configuration wizard, starting with the license agreements.

image 4

Accept this then we can give the box a name.

image 5

Next we can configure our network. I have used the static option to set a standard static IPv4 address, but IPv6 is also available here.

image 6

Then we need to set the time. I am based in the UK, so I selected change and searched for London to update the timezone, but left the available NTP servers as is.

image 7

Then we are onto setting a host administrator; this process is the same for any managed Veeam machine: proxies, repositories and Enterprise Manager.

image 8

The next screen configures multi-factor authentication (MFA). I am using Microsoft Authenticator for all of my MFA needs, so I hit show QR code, scanned it within the app and then typed the number provided by the app to proceed to the next step.

image 9

Following that step you are then asked if you would like to create a Security Officer.

At the Security Officer step of the Initial Configuration wizard, configure the default security officer account to perform specific operations in the Host Management console — veeamso. This account type provides an additional security layer to protect your infrastructure against malicious system administration.

The above was taken from the documentation pages – https://helpcenter.veeam.com/docs/vbr/em/deployment_linux_iso_install_security_officer.html?ver=13

image 10

If you choose to skip this step, you are prompted with a very red warning: it will not be possible to enable this role later on. For the purposes of this demo walkthrough I have said OK, but in my lab environment I have created that veeamso account on all hosts. When you set a password in this wizard, you will be required to change it when your security officer first logs into the management console to approve tasks.

image 11

The final page is a summary of what you have configured

image 12

Select finish, then wait for the configuration to be saved and for the services to come up. When they are up you will be able to access the appliance.

image 13

Host Management Console

The Host Management Console allows administrators to perform configuration and maintenance tasks, including managing network settings, server time, host users and roles, backup infrastructure, OS and Enterprise Manager updates, maintenance operations, and security settings.

This can be accessed via https://192.168.169.122:10443/

Obviously change this to the IP address you set; you will then be able to log in here with the veeamadmin account.

image 14

Web UI Preview

The Veeam Backup & Replication web UI is a browser-based interface that enables you to manage backup and recovery operations, monitor your backup infrastructure, and configure system settings from any supported device. The web UI provides a modern, streamlined experience designed to simplify daily administration and deliver at-a-glance visibility into your data protection environment.

Please note that this is a preview and not all capabilities within Veeam Data Platform are available here today in this release, you will be able to manage some of your environment from here but you will need the thick client access for all tasks.

This can be accessed via https://192.168.169.122/

Again, substitute your own IP address above.

image 15

Thick Client Access

For those of you familiar with the thick client approach, it has been enhanced and looks much better; it's faster and is designed to let you quickly find the commands you need and perform data protection and disaster recovery tasks.

As stated in my diagram above, you will need a Windows based machine to install and use the thick client. Point the client to your address above.

Next Steps

To complete the home lab setup, I created 6 further virtual machines using the smaller JeOS ISO. This is what was used to create the hardened Linux repository (bad practice to run this on a VM; the security score in Threat Center will also warn you against this, don't be like me) and the five proxies. The process is very much the same as the initial configuration wizard we went through above, and then you add them into your VBR thick client.

With the proxies, I created one proxy per vSphere host; when you add them in you have the choice of what type of proxy they are going to be.

image 16

I then repeated the steps for VMware and VMware CDP as I want these to be the data movers for all tasks.

image 17


I had also deployed a Windows Server, downloaded the new Veeam ONE ISO and got things up and running there; maybe another post on those steps to come. Adding Veeam ONE is important as it gives you great insight into the security posture of your backup environment. The Threat Center element pulls data from Veeam ONE to display within the thick client and the web UI.

image 18

Next, I wanted to get my Enterprise Manager VM up and running. To do this I used the Veeam Software Appliance ISO, which runs through a similar configuration wizard as shown above, but on first boot you get a choice of installing Enterprise Manager or Veeam Backup & Replication. The process was to create a VM with the required CPU, memory and disks and then run through that wizard.

image 19

The final step was to add some infrastructure to protect and then create some backup jobs. If you are familiar with Veeam this is the same process as before.

To caveat once again: this is a lab environment where I can show demos of Veeam software. I am using virtualization for components that in production should be on physical hardware, but for a home lab what I have built covers what I need for demonstrations. I will also be adding the cloud-based protection workloads and Veeam Kasten instances I have running later on down the line to extend the lab into different platforms.

Fixing a bounced vCenter server

A few times a year my home lab gets bounced for whatever reason, and because the vSphere vCenter lives on top of the 5 ESXi nodes, when the hosts come back up we are left without a fully functional vCenter. I probably need to consider a better approach, but here we are.

I am able to get into the vCenter Server Management console at https://192.168.169.181:5480 a port etched into my brain for some reason!

image

But when we head to services, many that should be running are not.

image 1

I check the access settings and ensure that SSH is enabled so we can dive deeper and try and get these services up and running.

image 2

I first try ssh root@192.168.169.181 but I am met with

image 3

So my alternate way is using

ssh -o IdentitiesOnly=yes -o PreferredAuthentications=password -o PubkeyAuthentication=no root@192.168.169.181

Which gets me in to

image 4

I then use the service-control --status command to see the same view we have in the management UI.

image 5

This is followed by service-control --start --all, which seems to sit here for a very long time; I am noting this down at this stage to make sure future me doesn't do anything silly.

image 6

While the above was going on I checked the time tab in the management UI and we were back in time, around 2023. I wondered if this could be why things were not quite right. We were also using host-based NTP, so I changed this setting and got things up to date.

image 7

After changing the time, we were able to get back in.

image 8

We do have to wait a while for things to finish initializing here. But maybe it was all down to time! I actually got impatient and hit the reboot button from the management UI.

Things do not look all that pretty and I may update if relevant here with what is going on!

image 9
The Hypervisor Hunger Games – Service Provider Edition

Many MSPs (managed service providers) have hedged their platform offering in and around the vSphere ecosystem, and now what?

I have written before about the cost conundrum here, and these are decisions that people in all worlds will have to consider. But in a service provider world it is maybe not a simple rip and replace with Nutanix AHV or another platform.

Service providers bring value with this stack: they not only bring a relationship with their customers, they can also automate, provide additional wrap-around services and join up the vast ecosystem we have around VMware.

It's also very much a price fight for MSPs: they win on value add and capabilities, so spending the winnings on software licensing probably doesn't add up. Maybe platform replacements like Nutanix AHV or even Red Hat OpenShift are not that different licensing-cost-wise compared to the Broadcom tax. (Maybe it's a valid tax, being the best hypervisor with the strongest ecosystem.)

What I do think we could see is a lot of service providers looking into KVM-based options. Although the ecosystem is maybe not as polished or as well supported, it might just be enough to ramp up on.

I am talking about options like Proxmox, XCP-ng and maybe even the new hypervisor option from HPE, though that will come at a premium as well. These options will also not really be free; they will be free like a puppy, with the cost coming from elsewhere.

The other option could be KubeVirt. KubeVirt is what underpins Red Hat OpenShift Virtualisation, but it is an open-source project that can be used across many Kubernetes distributions, albeit managed with a bit more effort than OpenShift. Could this be a real option for service providers to accelerate their own offerings into the cloud native ecosystem, an ecosystem that has been built over the last 10+ years?

I am going to share a fantastic resource for the vSphere admin here from my good friend Dean Lewis

I want to be clear that KubeVirt, even though established and around for a while, is still missing the polish we have within mainstream hypervisors and platforms like vSphere, Hyper-V and Nutanix AHV. But I remember when vSphere was like this, and we all flocked in that direction.

All I do know is that wherever service providers land, the requirement for data protection and management will be there regardless.

I wrote about protecting these VMs on Kubernetes here

Finally, one thing is for sure: virtual machines are not going anywhere! We might be in a world surrounded by AI, but the trusty virtualisation era isn't over and will continue to be a staple, be it in the data centre or in the public cloud. Could public cloud IaaS be an option instead of on-premises for providers?

My Thoughts on Retrieval-Augmented Generation (RAG) and the Power of Vector Databases

Some of you may have heard of RAG, retrieval-augmented generation?

If you want to use an LLM to answer questions about data it wasn’t trained on, you can use the RAG pattern to supplement it with extra data.

image 17
Image Source = https://learnopencv.com/rag-with-llms/

But before we get into RAG, I wanted to touch on Vector Databases a little as they have become popular with the world of AI.

TLDR; A Vector Database is fantastic at cataloging how different pieces of data are related to each other.

What is a Vector?

Vectors are arrays of numbers, and when those arrays represent something we call them embeddings. The term vector really just refers to the mathematical concept, whereas an embedding is kind of like an applied vector, if you will. So what do these embeddings represent? Technically anything you want, but because vector databases are so commonly used for natural language processing and semantic search, they usually represent text.

Want to learn more about vector databases? Take on this book! I have not braved it, but it is mentioned a lot in the content I have been reading and watching.

image 18
Deep Learning: A Visual Approach by Andrew Glassner

Vector databases are just collections of embeddings, and these are organised into indexes. An index is kind of like a table: a collection of rows of embeddings, and we call those rows records.

RAG

Ok this then brings us back to one of the initial things we said:

If you want to use an LLM to answer questions about data it wasn’t trained on, you can use the RAG pattern to supplement it with extra data.

Let’s say you have a bunch of support docs.

These would get turned into embeddings and stored in a vector database. Then when the user types in a prompt, that prompt gets turned into an embedding, which is used to search the vector database for similar information.

What you're doing here is a similarity search. Basically, you're just looking for the nearest neighbours to the embedding that you give the database.

An example

Obviously, I wanted to get hands-on and start playing with some of this stuff in a world of AI, but as a data technologist I also wanted to see what was possible with some of my own data and how it would behave layered alongside a powerful LLM.

This then led me down a rabbit hole: how important do these vector databases become after your own data is embedded, and how much CPU and GPU time and effort would it cost to re-embed everything if something went wrong? Anyway, that might be another post shortly.

Above we mentioned

Let’s say you have a bunch of support docs.

Now instead of docs, let's pretend we have an amazing community repository called 90DaysOfDevOps, full of data and learning information. Kind of similar to support docs! We could probably ask an LLM about 90DaysOfDevOps and get some info back… but it's going to be broad and generic, and the LLM probably was not trained on this repository.

I am using Ollama with Mistral here… the other model will become clear later.

image 6

and if we then ask mistral a question about 90DaysOfDevOps what do we get?

image 7

For some, this might be the way we have been interacting with LLMs so far. But what if we were able to take personal data, or data we specifically want to embed, and use it alongside an LLM? Surely we would get a richer response overall?

I have my dataset in the 90DaysOfDevOps repository, git cloned locally to my machine. I then have the mxbai-embed-large model you saw above, and a trusty friend of mine in the form of a Postgres database instance running on a VM (it could be anywhere) with the pgvector extension enabled for knowledge storage. (Maybe another post; let's see how this one goes first.)

image 8
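For context, the storage side of this is quite small; a hedged sketch of the kind of table and nearest-neighbour query a pgvector setup like this relies on (table and column names are illustrative, and the vector size must match the embedding model, assumed here to be 1024 dimensions for mxbai-embed-large):


-- enable the extension and store one row per embedded chunk
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    source    text,            -- e.g. which markdown file the chunk came from
    content   text,            -- the chunk of text that was embedded
    embedding vector(1024)
);

-- similarity search: the five chunks closest to the prompt's embedding
SELECT source, content
FROM documents
ORDER BY embedding <-> $1    -- bind the prompt's embedding here
LIMIT 5;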

I wrote a little app to deal with the embedding process, and that same app also allows me to interact with the RAG + LLM setup via a chat/API interface.

https://github.com/MichaelCade/vector-demo

Again, maybe we need to go into more detail about this app another time, but for now: we have our knowledge from the 90DaysOfDevOps repository. Each of these markdown files basically contains a blog post about a topic related to the world of DevOps.

image 9

We have our Golang code to embed our data.

image 10

When the worlds align and we run our binary against our data, with access to our (likely hard-coded) Postgres database instance, we should start the embedding process into our vector database.

image 11

NOTE: if you made it this far and want to see how to spike your GPU… change the code to use Mistral for the embedding process, a model that has not been trained for embedding the way the dedicated embed model has. Another rabbit hole I found is that there are all sorts of models trained for different scenarios.

Here is what things look like within our super secure vector database (the one whose connection info and all sorts we leaked via GitHub).

image 12

Using the same Golang binary we ran we can now interact with that API and chat with the vector plus mistral model.

image 13

I wanted to be sure that we were indeed getting something from the vector store when we did this, so I added some additional code to tell me which chunks it was using to respond.

image 14

Now our whole app includes the embed part above, plus a backend API added to the same code base. In the GitHub repository linked above you will see vector-demo-ui; this is the React frontend… no shame in saying I used vibe coding for this… who likes frontend stuff anyway?

image 15

and to top things off if you don’t want to interact with your AI chat assistant via curl then the frontend almost looks pretty…

image 16

Before we wrap things up, we should ask it something specific to the vector embeddings we have provided. First if we ask mistral directly about Day 49 of 90DaysOfDevOps we get:

image 20

Then with our RAG + LLM we get:

image 19

If you made it this far, I am impressed! Blog views have declined over the last few years, I think, so when I jot something down it is mostly for future me looking for something I have done before. Hopefully this helps spur someone else on to unlock some of their data; if it is useful, let me know… Also, if you would like to see some content about protecting vector databases, or a deeper dive into the terrible coding I am doing with Golang, let me know.

Visualising Veeam: Kubernetes Monitoring with Grafana and ArgoCD

I have been concentrating a lot this year on my home lab. I have covered the setup in previous posts, but in short I have a 5-node Talos Kubernetes cluster with rook-ceph as my storage layer, and I needed some monitoring for it.

In a VM I am running Veeam Backup & Replication, and I wanted to get some hands-on time with Grafana. I have more plans, but this was project #1.

My good friend Jorge has spent years on Grafana dashboards for Veeam. You can find one of the dashboards here.

The Plan:

We are going to use our Kubernetes cluster to host our Grafana instance. Jorge has shared a script that we are going to repurpose into a CronJob that runs on a schedule, every 5 minutes. This will grab details via the Veeam Backup & Replication API, and we will get some data visualisation inside our Grafana dashboard.

image

Deployment: Grafana & InfluxDB

We obviously need Grafana to show our Grafana dashboard, and we will also need InfluxDB, which is where the CronJob will store the API data collected from Veeam Backup & Replication. There are many ways to deploy Grafana into your Kubernetes cluster; you could use Helm (the Kubernetes package manager) directly, but I am going to be using ArgoCD.

I am storing my ArgoCD application here in this GitHub Repository.

image 1
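The Application definition lives in that repository; as a hedged sketch of the shape (project, chart version and values are illustrative, not necessarily what I used), an Argo CD Application pointing at the upstream Grafana Helm chart looks something like this:


apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grafana
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://grafana.github.io/helm-charts
    chart: grafana
    targetRevision: 8.5.0          # illustrative chart version
    helm:
      values: |
        persistence:
          enabled: true            # keep dashboards and data on a PVC
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true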

This will get you up and running with Grafana. Next you need the IP to access your Grafana instance and the secret that goes with the default user 'admin'.
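A hedged way to grab both, assuming the chart's default admin secret name and key for a release called grafana in the monitoring namespace:


kubectl get svc -n monitoring grafana
kubectl get secret -n monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode; echo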

image 2

Head over to a browser and get logged in; from the first page here you can go and find out some more about Grafana.

image 3

Select Dashboards; you will notice that I currently have two configured. The one we are focused on is the "Grafana Dashboard for Veeam Backup & Replication". If you have not added this in your configuration, you can add it manually using the New button in the top right.

image 4

and if you have been able to run the cronjob you will have something resembling your Veeam environment

image 5

Step Back

Ok all the above is great but I have not really helped you get there yet.

We have used ArgoCD to (hopefully) deploy Grafana, and you will also see an application in there for InfluxDB, so let's hope we have those two up and running. But we need to put some more things in place.

First we will need an InfluxDB token, which we can get with the following command.


kubectl get secret -n monitoring influxdb-influxdb2-auth -o jsonpath="{.data.admin-password}" | base64 --decode; echo

Second we need a secret to enable our cronjob to hit our Veeam Backup & Replication server. Obviously add your details there.


kubectl create secret generic veeam-influxdb-sync-secret \
  --namespace monitoring \
  --from-literal=veeamUsername=administrator \
  --from-literal=veeamPassword= \
  --from-literal=veeamInfluxDBToken=

Then in the same GitHub repository you will find a file called 'veeam-influxdb-sync.yaml'; this is our CronJob configuration file, so we need to apply it into our cluster as well. Before we do that, we need to make sure we change some of the environment variables within the file, as your environment might be different to mine.


          - name: veeamInfluxDBURL
            value: "http://influxdb-influxdb2.monitoring.svc.cluster.local"
          - name: veeamInfluxDBPort
            value: "80"
          - name: veeamInfluxDBBucket
            value: "veeam"
          - name: veeamInfluxDBOrg
            value: "influxdata"
          - name: veeamBackupServer
            value: "192.168.169.185"
          - name: veeamBackupPort
            value: "9419"
          - name: veeamAPIVersion
            value: "1.2-rev0"

Then deploy that into the cluster


kubectl apply -f veeam-influxdb-sync.yaml

This cronjob will run every 5 minutes but if you wanted to trigger it straight away we can use this command


kubectl create job --from=cronjob/veeam-influxdb-sync veeam-influxdb-sync-manual -n monitoring

You can then check the progress of this process using the following command


POD_NAME=$(kubectl get pods -n monitoring | grep '^veeam-influxdb-sync-manual-' | awk '{print $1}')
kubectl logs -f $POD_NAME -n monitoring

A big thank you to Jorge on this one; if it wasn't for his hard work in this area, we would not have these dashboards! He has also created some amazing content around this, and it is not just Veeam dashboards, there is lots of great stuff.

Notes

In the final section of the CronJob script I have filtered to only show the VMware platform. If you want to change this back, remove the filter shown below from the URL:


?platformNameFilter=VMware"

veeamVBRURL="https://$veeamBackupServer:$veeamBackupPort/api/v1/backupObjects?platformNameFilter=VMware"
image 7

I am working on an update to see if this can be resolved and catch all objects without filtering.

Iteration

If you made it this far… you must be interested! I was not happy with the situation above, where I could only display one platform (VMware) when I have several in my environment. I have iterated, and you will now find an updated script that loops through the different platforms, providing the data to InfluxDB and in turn to Grafana.

Here is that script
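The idea of the change, as a hedged sketch (the platform names below are illustrative; the real list and the InfluxDB write logic are in the linked script):


for platform in VMware HyperV; do
  veeamVBRURL="https://$veeamBackupServer:$veeamBackupPort/api/v1/backupObjects?platformNameFilter=$platform"
  # call the API and write the results to InfluxDB exactly as the single-platform version does
done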

And from there you can see that my macOS backups, Hyper-V backups and Kasten backups are all now showing.

image 8
HomeLab: Trials, Tribulations and Packer

Over the last few weeks I have been lifting, shifting and reshaping some of the home lab, and within that process we needed some more templates for both Windows and Linux.

I found an amazing project GitHub Repo – vmware-samples/packer-examples-for-vsphere

And Documentation can be found here

This gives you the ability to quickly get some Linux and Windows templates up and running in your vSphere environment.

My advice from the start is do not use WSL (Windows Subsystem for Linux) but that could be my own user error.

I am using an Ubuntu server in my home lab to perform these tasks, and I hit a snag not with the configuration but with some of the dependencies you need to run it.

Fixing the “No Module Named ‘winrm'” Error in Packer + Ansible for Windows VM Provisioning

When using Packer with Ansible to provision Windows virtual machines on vSphere, I recently encountered the following error during the Ansible playbook execution:


fatal: [default]: FAILED! => {"msg": "winrm or requests is not installed: No module named 'winrm'"}

This stopped my automated build in its tracks. After some debugging, I found that Ansible’s execution environment was missing the pywinrm module, which is required for managing Windows hosts via WinRM. Here’s how I diagnosed and fixed the issue.

Understanding the Problem

Ansible relies on the pywinrm Python module to communicate with Windows hosts using the WinRM (Windows Remote Management) protocol. If this module isn’t installed in the correct environment, Ansible cannot establish a connection, resulting in the “No module named ‘winrm'” error.

Even though pywinrm might be installed in the system’s Python, Packer’s execution context (often running inside a virtual environment) might not have access to it.

Step-by-Step Solution

  1. Check if pywinrm is Installed in the Correct Python Environment
    Since Ansible was running inside a pipx-managed virtual environment, I first verified whether the winrm module was available:

/home/veeam/.local/pipx/venvs/ansible/bin/python -c "import winrm; print(winrm)"

This returned:


ModuleNotFoundError: No module named 'winrm'

That confirmed the issue—pywinrm was missing from the environment Ansible was using.

  2. Install pywinrm in the Correct Environment
    Since Ansible was installed via pipx, I needed to install pywinrm inside the same environment rather than globally.

Option 1: Using pipx to Inject pywinrm


pipx inject ansible pywinrm

This ensures that pywinrm is available within Ansible’s execution context.

Option 2: Installing Directly in the Virtual Environment
If you prefer, you can manually install pywinrm inside the virtual environment:


/home/veeam/.local/pipx/venvs/ansible/bin/pip install pywinrm

If pip is missing, install it first:


apt install python3-pip -y
  3. Verify the Fix
    To confirm that pywinrm is now correctly installed, run:

/home/veeam/.local/pipx/venvs/ansible/bin/python -c "import winrm; print(winrm)"

If no errors appear, the installation was successful!

  4. Re-run Packer and Ansible
    With pywinrm installed, I restarted the Packer build:

packer build -var-file=variables.pkrvars.hcl windows-server.pkr.hcl

This time, Ansible successfully connected to the Windows VM over WinRM, and provisioning completed without issues. 🎉

Final Thoughts

This issue highlighted an important lesson about managing dependencies within virtual environments. When working with Packer and Ansible, always ensure that required Python modules are installed inside the environment that Ansible is running in.

By using pipx inject, I was able to keep my environment clean while ensuring Ansible had access to the necessary modules. If you run into similar issues, check:

✅ Where Ansible is installed
✅ Which Python environment it’s using
✅ That required modules like pywinrm are installed in the same environment

Hope this helps anyone facing the same issue! 🚀

Building a Resilient Kubernetes Cluster with Talos, Ceph, and Veeam Kasten

More and more clusters have data on them in the Kubernetes world, either via a StatefulSet, an Operator, or at least closely tied to a managed database external to the cluster.

But in the cloud native world we have to consider the whole application which includes the data, be it inside or outside of the cluster.

Equally, depending on the importance of this data (probably important if you pay for the privilege of having it managed), it's going to need some care and attention when it comes to data management: protection against accidents, misconfigurations and the ever-popular world of cyber threats.

Overview

This post is going to focus on my home lab, but it's also relevant to any Kubernetes cluster far and wide.

We are going to explain the current configuration of how I am running Talos Linux, a distribution focused purely on Kubernetes and on security. It ships only the binaries it absolutely needs to run, making it a very secure and safe option compared to a full-blown Linux distribution.

We will also talk about the Ceph environment and where we store those “mission-critical” data services. Finally we will touch on the ease of protecting these data services with Veeam Kasten for Kubernetes.

Talos

Talos Linux is a lightweight, container-optimised Linux distribution designed specifically for Kubernetes deployments. It is immutable, meaning its filesystem is read-only, enhancing security and consistency. Talos eliminates SSH, relying instead on a secure API for management, making it highly automated and ideal for cloud-native environments. With minimal surface area, it’s designed for maximum reliability and focuses on running Kubernetes efficiently.

image 27

Ceph

Ceph is an open-source, highly scalable storage platform that provides object, block, and file storage under a unified system. Rook-Ceph integrates Ceph with Kubernetes, enabling dynamic, cloud-native storage orchestration. Rook-Ceph automates the deployment, management, scaling, and recovery of Ceph clusters within Kubernetes environments. It abstracts complex Ceph operations, making it easier to use as a Kubernetes storage backend for persistent volumes, supporting block storage (RBD), shared filesystems (CephFS), and S3-compatible object storage. This integration ensures resilient, self-managing storage optimized for containerized workloads.

image 28

Veeam Kasten

Veeam provides the ability to natively protect your Kubernetes applications and data. By becoming part of the Kubernetes API through aggregated APIs and Custom Resources, everything can be managed via the API or via a UI.

Application consistency and mobility are two other areas where Veeam shines above the de facto open-source offerings: the ability to orchestrate the protection of Postgres databases or MongoDB collections versus just grabbing the PVC and hoping for the best, and on the mobility side, the ability to migrate, restore and test your application and data in another Kubernetes cluster by transforming the application for the target cluster. The target cluster might be a completely different flavour of Kubernetes; you might be going from a cloud-managed version to Talos, for example.

Lots more to cover here, but let's get going. You will notice there is no Talos logo listed below, and there are many other distributions not included in the marketing slide… fundamentally, though, we focus on supporting upstream Kubernetes versions.

image 29

Installation

The deployment of Veeam Kasten can be done via various marketplace operator hubs, or via Helm. In this case, with Talos, we will use Helm to get our deployment of Veeam Kasten up and running.

As mentioned, a prerequisite for this is to have access to your cluster via kubectl and to have Helm installed on your machine. You will also require some persistent storage.


helm repo add kasten https://charts.kasten.io/

helm repo update

helm install k10 kasten/k10 --namespace=kasten-io --create-namespace

You can watch / follow the pod creation with the following command:


watch kubectl get pods -n kasten-io

We have used the basic installation; there are many options when it comes to Helm chart values, and you can find those in the documentation.


kubectl --namespace kasten-io port-forward service/gateway 8080:80

The Veeam Kasten dashboard will be available at http://127.0.0.1:8080/k10/#/.

image 30

We now need a token to authenticate with the default settings on the helm chart. We can obtain this with the following command:

The token's default expiration is 1 hour (3600 seconds) if no --duration flag is specified in the command.


kubectl --namespace kasten-io create token k10-k10
image 31

Copy that token and restart the port forward if need be. Paste the token into the web browser and you should see something that resembles the following, minus the amount of apps, policies, and data usage.

image 32

Protection

There can be a lot more to the initial deployment, with different chart values depending on the cluster you are deploying to, authentication options and so on.

But let’s start with protecting a simple stateful app deployed in our Talos cluster.
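The demo app itself isn't shown in this post; as a hedged stand-in, something like a Helm-installed PostgreSQL on the rook-ceph StorageClass is the kind of "mission-critical" stateful app this walkthrough assumes (chart and values are illustrative):


helm repo add bitnami https://charts.bitnami.com/bitnami
helm install mission-critical bitnami/postgresql \
  --namespace demo --create-namespace \
  --set primary.persistence.storageClass=ceph-block \
  --set primary.persistence.size=5Gi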

First we will need a backup location profile: somewhere off-cluster to send our backups to if something bad were to happen at the cluster and storage layer.

We can also leverage snapshots for a real fast recovery option but snapshots alone are not going to cut it for a backup. Repeat after me…

Snapshots are not backups.

image 33

Something I and many others have been preaching for a very long time in the infrastructure and operations space. Anyway back to it…

Let’s start with showing you some of this important data in our database.

Within the dashboard we should define a location profile

OK, so we now have somewhere to send our backups to; next we need to create a policy to protect our previously installed "mission-critical" application and data.

Create a policy against a pre deployed application.

Now, I can't expect you to trust me at this point that when something bad happens to the app and its data this is all just going to work. So let me show you.

Cause a problem in the database

Uh oh! Things don’t seem very well.

Let’s use our backup to restore our application and data back to before the bad thing happened.

Steps to restore

What’s next

There is so much more to cover when it comes to protecting data in and out of Kubernetes. The great thing about Kubernetes is that we can run it anywhere, which means we can do this with Talos as well. In my home lab I have a VMware vSphere cluster running my virtual machines, and I also have a second Talos cluster running as VMs that I would like to cover in the next post. That will highlight some of the additional storage integration we can achieve on this cluster, as well as moving workloads between environments with Veeam Kasten.

I also want to get into virtual machines on Kubernetes and how these should be protected; this will be another follow-up post.

We didn't get into the application-consistent side of protecting those databases either. With our mission-critical app everything was fine, but if that database had lots of transactions hitting it you can be sure something would get lost in the process.

Extending the Root Partition on RHEL VMs Provisioned via Terraform

When provisioning Red Hat Enterprise Linux (RHEL) servers with Terraform, managing disk space can be tricky, especially when VMs are deployed with additional root disk space. By default, the root partition often matches the size of the template disk, leaving any extra space unallocated. This post documents resolving this issue to ensure your servers fully utilise their allocated disk space.

I would also welcome suggestions if there is a way to achieve this through Terraform for ease.

The Problem

We recently deployed three RHEL 9.3 virtual machines (VMs) in a VMware vSphere environment using Terraform. Each VM was provisioned with a 350GB disk, yet the root partition only used 50GB, matching the size of the original template disk. The remaining space was unallocated and, therefore, unusable.

This mismatch occurred because Terraform doesn’t automatically adjust partitioning for larger disk sizes when provisioning from a smaller disk template. To resolve this, we had to manually extend the root partition to utilise the full disk size.

Keep me honest, does Terraform have the ability to do this?

The Solution

Let's take a look at how we can extend the root partition and filesystem on a live RHEL system without service disruption.

Step 1: Check the Disk and Partition Layout

First, examine the current disk layout and usage using these commands:


df -h
lsblk

Output:


Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root   44G  5.3G   39G  13% /

NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda             8:0    0  350G  0 disk
├─sda1          8:1    0  600M  0 part /boot/efi
├─sda2          8:2    0    1G  0 part /boot
├─sda3          8:3    0 48.4G  0 part
│ ├─rhel-root 253:0    0 43.4G  0 lvm  /
│ └─rhel-swap 253:1    0    5G  0 lvm  [SWAP]

Here, the sda disk has a total size of 350GB, but only ~50GB is allocated. The goal is to use the remaining space.

Step 2: Create a New Partition

Next, create a new partition to allocate the unused space:

  1. Run fdisk to modify the disk:

sudo fdisk /dev/sda

Follow these prompts:

  • Press n to create a new partition.
  • Accept the default values for the starting and ending sectors.
  • Press t to set the partition type, then enter 8e for LVM.
  • Press w to write the changes and exit.

Update the kernel’s partition table:


sudo partprobe /dev/sda

Step 3: Extend the Volume Group (VG)

Add the new partition to the existing rhel Volume Group:

Initialise the new partition for LVM:


sudo pvcreate /dev/sda4

Add the partition to the Volume Group:


sudo vgextend rhel /dev/sda4

Verify the updated Volume Group:


sudo vgdisplay

Look for the “Free PE / Size” field, which should now reflect the additional space.

Step 4: Extend the Logical Volume (LV) and Filesystem

Now that the Volume Group has more space, extend the Logical Volume and filesystem:

  1. Identify the root Logical Volume:

sudo lvdisplay

The path should look like /dev/mapper/rhel-root.

Extend the Logical Volume and filesystem in a single step:


sudo lvextend -r -l +100%FREE /dev/mapper/rhel-root
  • -r: Resizes the filesystem alongside the LV.
  • +100%FREE: Allocates all remaining space in the Volume Group.

Verify the updated filesystem:


df -h

The root filesystem (/) should now reflect the total disk size.


Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root  350G  5.5G  345G   2% /
/dev/sda1              600M  100M  500M  17% /boot/efi
/dev/sda2              1.0G  200M  800M  20% /boot

lsblk

NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda             8:0    0  350G  0 disk
├─sda1          8:1    0  600M  0 part /boot/efi
├─sda2          8:2    0    1G  0 part /boot
├─sda3          8:3    0   48G  0 part
│ ├─rhel-root 253:0    0  350G  0 lvm  /
│ └─rhel-swap 253:1    0    5G  0 lvm  [SWAP]
└─sda4          8:4    0  301G  0 part

Key Changes:

  1. Disk Size:
    • The sda disk remains 350GB in size, as this represents the physical disk.
  2. Partition sda4:
    • The new partition (sda4) is created and marked as type LVM. Its size is 301GB, which was the previously unallocated space on the disk.
    • This partition has been successfully added to the rhel Volume Group.
  3. Logical Volume rhel-root:
    • The Logical Volume rhel-root now spans the entire disk (350GB), which includes the space from sda3 and the newly added sda4.
  4. Other Partitions:
    • /boot and /boot/efi (sda1 and sda2) remain unchanged since they were not modified during the process.
    • The rhel-swap volume (on sda3) is also unchanged and remains at 5GB.

This lsblk output confirms that the unallocated space has been added to the rhel-root Logical Volume and is fully utilized by the root filesystem.

I recorded the process below:

My vZilla Homelab – 2024 Edition

I have had a few posts this year regarding the revival of the home lab, but we have progressed even further down this rabbit hole during 2024.

State of the union

As we approach the end of 2024 and I am under strict instructions not to spend any more money, the current state of play is:

  • 5 x Talos Kubernetes Nodes
  • 5 x VMware vSphere Nodes
  • 1 x Hyper-V Node
  • 1 x Proxmox Node
  • 2 x NAS devices
  • 1 x 24port switch
image 26

In this post I am probably not going to be able to get into the software layer above and beyond the hypervisor or Kubernetes layer. But I will state that the reason for the home lab is to learn and get hands-on with the technology I am talking about on a daily basis; as a Field CTO at Veeam Software, a lot of that hands-on time is with the Veeam portfolio of products.

Hardware

What started in January was a purchase of 3 of these Dell OptiPlex ultra small form factor mini PCs. They have the Intel i5-8500T, and we upgraded them to 32GB of RAM; this is the case for most of our units, as you will see as the story unfolds.

image 19

The intention for these 3 nodes was to build a Kubernetes cluster; we all need one of them at home, right? I had started to learn about Talos as a secure Kubernetes distribution, had already played around with it in a virtual vSphere environment, and was impressed.

So much so I covered this in the 2024 edition of #90DaysOfDevOps

We then picked up a smart-looking managed switch to add to our collection. We already had the existing 2 nodes covered in the opening posts running a vSphere environment, but they were limited by their CPU; they do feature later on. We got things up and running at this stage with a 3-node Talos Kubernetes cluster.

Clearly we knew at this stage that expansion was on the cards… The switch is a 24-port Dell X1026 1GbE managed switch. We are using it as a flat network switch today, but if I ever want to face my fear of networking then we may look into that down the line.

image 20

3 quickly became 5, and we had our bare metal cluster up and running.

image 21

This was alongside the two vSphere hosts that you can just about see on the bottom shelf to the left. We are also rocking some antique NAS devices from NETGEAR; I have had these for many a year and they have served me well, but they are the bottleneck to everything we do in the lab. If any storage vendor would like to sponsor a blog and supply me with a new unit, let me know. I have detailed their specs in the posts above.

image 22

I have mentioned that the older vSphere hosts were not really getting the job done; they had only a Celeron CPU and something was stopping certain workloads from running. The specific one I remember was MongoDB on top of a vSphere Tanzu Kubernetes virtual cluster, which just would not start on that architecture. Fast forward to August 2024 and we made a massive change to the lab.

I went searching on Facebook Marketplace, eBay and some other second-hand/refurb computer sites and picked these up to bolster the virtualisation game in the lab and replace those sub-par ESXi hosts. They are same-spec Dell OptiPlex 7060 units; 2 of the new units have 32GB RAM and the other 3 have 16GB, which might be an upgrade path early in 2025. All 5 of the new units are running VMware vSphere ESXi.

IMG 9011

The final note on hardware: the two sub-par units, both still with 16GB RAM, found a new role. With the hypervisor hunger games hotting up in the real world, I needed (or wanted) a way to have access to Microsoft Hyper-V and Proxmox (we had just released support, or at least announced it, at VeeamON 2024).

We also do not know at this point the lay of the land for vSphere licences in 2025 and beyond, so we might have to migrate to another hypervisor or purchase lab licences. More on this down the road. I have covered these additional mini PCs in another blog post. They do ruin the sleek look of the lab, but they are functional for now.

IMG 9063

Host Software

As mentioned, we are running Talos on 5 of these nodes to form a Kubernetes cluster. I actually wrote a post about this yesterday covering the upgrade steps, because I had been lazy and neglecting the upgrade process.

On the other 5 Dell nodes we are running VMware vSphere 8 Update 3, with the vCenter Server appliance deployed as a virtual machine within the cluster.

image 23

As for the Proxmox host, we are running 8.2.2. I have not had much chance to play with this yet; we have the Veeam proxy deployed but no running systems here. My plan is to use Terraform to deploy some systems to simulate workloads such as database servers and then protect them using Veeam.

image 24

Next up we have the Microsoft Hyper-V host, running Windows Server 2022; similar to Proxmox, I have not had a chance to tinker here. I did toy with the idea of running TrueNAS as a VM to leverage the faster storage on this node compared to the NETGEAR boxes; maybe another action item for 2025.

image 25

Finally we have the NAS devices; these, if you had not guessed yet, are the big black boxes above the nodes. The larger of the two is where we store our vSphere virtual machines. I also have a replica of my personal OneDrive syncing via the NETGEAR software, and we have some common SMB shares here too. The smaller unit is used as a backup target for Veeam; we back up our VMs there, along with those SMB shares and OneDrive. We then also offload important backups to object storage locations.

Wrap Up

Hopefully this might inspire someone to get into the home lab game, I had left it and was only using the cloud for a number of years, but there is something about having the tin next to you to play with.

Just writing this has given me a few action items for 2025.

  • Get hands on with Proxmox and Hyper-V
  • Look to upgrade Hyper-V to Server 2025
  • Consider the next steps for the vSphere cluster (Hypervisor Hunger Games)
  • NAS Storage options
  • Upgrade Memory in 3 vSphere nodes to 32GB

I think that probably covers it for 2024 and we will see where things go for 2025.
