vZilla https://vzilla.co.uk One Step into Kubernetes and Cloud Native at a time, not forgetting the world before Thu, 28 May 2026 13:46:17 +0000 en-GB hourly 1 https://wordpress.org/?v=6.9.4 https://vzilla.co.uk/wp-content/uploads/2018/01/cropped-profile_picture_symbol-32x32.png vZilla https://vzilla.co.uk 32 32 Learning The Data Engineering Foundations https://vzilla.co.uk/vzilla-blog/learning-the-data-engineering-foundations https://vzilla.co.uk/vzilla-blog/learning-the-data-engineering-foundations#respond Thu, 28 May 2026 13:46:13 +0000 https://vzilla.co.uk/?p=3625 Yesterday we wrote down a plan.

Today we started the plan.

We started the course linked above; it had a bit of everything I was looking for in the foundations. We are around 2 hours in, and I decided that might be a good time to look back over the notes and at the additional areas already mentioned that we need to get into. I am not going to reinvent the wheel here. If you are interested, then get into this first. I would say you can take the first hour and learn something.

The next hour was more about the technologies that would be needed, with some overview. In that hour you will also hear many tooling options, so I will start there. My plan here is to try and find some useful, focused content that I can watch through to learn more of the fundamentals of these tools.

PSA: The Apache Software Foundation (ASF)

Many of the tools listed below are fronted with “Apache”. I had obviously heard of many of these Apache-titled bits of software, but I was maybe a little naive about the background. The Apache Software Foundation (ASF) actually manages over 350 open-source projects spanning almost every corner of enterprise technology, including web servers, operating systems, development tools, and database management systems.

It All Started with a Web Server

I am pretty confident that anyone who has read this far has used the Apache Web Server before.

The Broader Apache Universe

To give you an idea of how diverse the community is, here are some major non-data engineering projects you might run into:

  • Web & Application Servers: Along with the original HTTP Server, they manage Apache Tomcat, which is used globally to run Java-based web applications.
  • Databases & Storage: They host highly popular NoSQL databases like Apache Cassandra (originally built by Facebook to power their inbox search) and Apache CouchDB.
  • Build & Dev Tools: Tools that developers use every day to build their code, like Apache Maven and Apache Ant, are standard parts of the software development lifecycle.
  • Cloud Infrastructure: They even house Apache CloudStack, which is open-source software designed to deploy and manage large networks of virtual machines as a cloud computing platform.

All of the above is out of scope for this learning journey, but it did remind me of the Cloud Native Computing Foundation (CNCF) in a way.

Let's get into the Apache Data Engineering Stack and surrounding technologies.

Ingestion/Streaming (Apache Kafka)

The highway that captures live data and moves it into the architecture.

image 4

Storage (Apache Iceberg, Delta Lake inside Cloud Storage (S3))

The smart organising layer that formats the data safely on disk.

image 6

Compute/Processing (Apache Spark, Apache Flink)

The heavy-duty engines that reach into storage to transform and clean the data.

image 7

Architectural Platforms (Snowflake, Databricks, Microsoft Fabric)

The overarching cloud environments where all of this compute and storage lives.

image 8

Orchestration (Apache Airflow)

The conductor that schedules when the ingestion, storage, and compute actions trigger.
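To picture what "the conductor" is actually doing, here is a minimal sketch in Python using only the standard library. It is not Airflow; the task names and the `run_order` helper are made up for illustration. It only shows the core idea an orchestrator relies on: tasks form a dependency graph, and nothing runs before the things it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "ingest": set(),
    "store": {"ingest"},
    "transform": {"store"},
    "serve": {"transform"},
}


def run_order(dag):
    """Return one valid execution order in which dependencies always come first."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(pipeline)
print(order)
```

A real orchestrator adds scheduling, retries and alerting on top of this ordering, which is the part I still need to learn.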

image 9

Languages/Interfaces (Python, SQL, Jupyter)

The tools you use to write the logic and talk to every single layer above.

image 10

Other Tools (Containers, Virtualisation, Cloud)

Docker was mentioned, but I have covered this in the #90DaysOfDevOps series, so I am going to drop that link below.

https://github.com/MichaelCade/90DaysOfDevOps/blob/main/2022/Days/day42.md

image 3

My goal now, instead of using Gemini to craft the somewhat interesting images above, is to find where people much smarter than me have already digested a lot of these tools and areas, so I can be smarter myself when I go back to the crash course.
Also, side note… that initial crash course is not the full course free on YouTube; it's an almost 4 hour teaser to a 17 hour paid-for bootcamp. Jury is out on whether that's the way to go.

2026 – Thoughts, Goals & Focus https://vzilla.co.uk/vzilla-blog/2026-thoughts-goals-focus https://vzilla.co.uk/vzilla-blog/2026-thoughts-goals-focus#respond Wed, 27 May 2026 14:52:58 +0000 https://vzilla.co.uk/?p=3608 Hey, it’s been a while… This is me writing though, not another AI-dumped bit of content. Equally, this is for me more than anyone else: it’s a line in the sand, a commitment to starting a new learning journey and putting it out there for everyone to see. (I did however use AI on the images.)

For those that know me, I have been a constant and consistent learner. I started my IT career with hardware—building servers for Cambridge University, getting server operating systems up and running, and handling systems administration and operations. That was before the virtualization era. From there, I moved to the cloud, transitioned my learning into DevOps and Kubernetes, and that brings me to the here and now.

We are surrounded by AI, but “AI” means so many different things to different people, teams, and companies.

Lately, I have been intrigued by a specific challenge: How can we ensure the validity of our AI experience when it interacts with our own data? How can we ensure the data we hover above an LLM is actually going to provide us with the correct information?

From that perspective, I have found myself drawn into the world of Data Engineering and DataOps.

image 1

So, what’s the plan?

The first area of focus is watching foundational content designed to establish core concepts before diving directly into specific platforms. I’m currently working my way through this crash course: Data Engineering Foundations Crash Course!

I am specifically looking at recent content that discusses data engineering core principles and how they land with AI and machine learning in mind. This video is proving to be a great starting point because it details how data flows through real-world systems—exploring how data is stored, processed, orchestrated, and served. It bridges the gap between raw data and consumption, illustrating how foundational steps like proper data collection and pipeline stability directly impact the success of downstream machine learning models and AI dashboards.

Before I step into heavy-hitting enterprise tooling like Databricks, Snowflake, and Microsoft Fabric, this stage is entirely about cementing those basic Data Engineering Foundations through a structured, multi-phase approach:

Phase 1: The Storage Revolution (From Blocks to Formats)

Typically, my learning has gone straight into the platform layer, but we need to focus on the data layer for this new world. In Data Engineering, the bucket alone is just the graveyard; the magic is in the open table formats sitting inside it. To master this, you must understand Delta Lake, Apache Iceberg, and Apache Hudi. These formats add ACID guarantees and a metadata layer on top of Parquet files, effectively making object storage act like a high-performance database.
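To get my head around what "a metadata layer on top of files" means, here is a toy sketch in Python. It is not how Delta Lake, Iceberg or Hudi are implemented, and `ToyTable` and the file names are invented for illustration. It only shows the general idea: data files are never modified, and a table version is just a list of the files that are visible at that point.

```python
class ToyTable:
    """Toy model of a table format: immutable data files plus a snapshot log."""

    def __init__(self):
        self.files = {}        # file name -> rows (stands in for Parquet files)
        self.snapshots = [[]]  # snapshot N -> file names visible at version N

    def commit(self, name, rows):
        """Write a new file, then publish it with a single snapshot append."""
        self.files[name] = list(rows)
        self.snapshots.append(self.snapshots[-1] + [name])
        return len(self.snapshots) - 1

    def read(self, version=None):
        """Read all rows visible at a snapshot (the latest one by default)."""
        if version is None:
            version = len(self.snapshots) - 1
        return [row for name in self.snapshots[version] for row in self.files[name]]


table = ToyTable()
v1 = table.commit("part-0001", [{"id": 1}])
v2 = table.commit("part-0002", [{"id": 2}])
print(table.read(v1), table.read())
```

Because a reader only ever follows one snapshot, it sees either all of a commit or none of it, and older versions stay readable.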

Phase 2: The Data Pipeline Lifecycle (CI/CD for the DevOps Crew)

Data Engineering has its own version of a DevOps pipeline, and you need to map ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) to concepts you already know. Instead of application deployment, the focus here shifts to data ingestion, code-driven data transformation (using tools like dbt), and modern workflow orchestration (like Airflow or Dagster).
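The difference between ETL and ELT is easier to see side by side. Below is a small sketch in Python using the built-in sqlite3 module as a stand-in for a warehouse. The table names and sample rows are made up. The only point is where the transform happens: in application code before loading (ETL), or in SQL after the raw data has landed (ELT).

```python
import sqlite3

# Made-up raw input: a name and a score that arrives as messy text.
raw = [("alice", " 42 "), ("bob", "n/a"), ("carol", "17")]


def etl(rows):
    """ETL: clean in application code first, then load only the clean rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    clean = [(n, int(s)) for n, s in rows if s.strip().isdigit()]
    db.executemany("INSERT INTO scores VALUES (?, ?)", clean)
    return db.execute("SELECT name, score FROM scores ORDER BY name").fetchall()


def elt(rows):
    """ELT: load the raw rows untouched, then transform with SQL in the database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE raw_scores (name TEXT, score TEXT)")
    db.executemany("INSERT INTO raw_scores VALUES (?, ?)", rows)
    # Keep only values that start with a digit once trimmed, then cast them.
    db.execute(
        "CREATE TABLE scores AS "
        "SELECT name, CAST(TRIM(score) AS INTEGER) AS score "
        "FROM raw_scores WHERE TRIM(score) GLOB '[0-9]*'"
    )
    return db.execute("SELECT name, score FROM scores ORDER BY name").fetchall()


print(etl(raw))
print(elt(raw))
```

In the ELT version the raw table is still there afterwards, so the transform can be rerun or changed later without ingesting again.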

Phase 3: Data Preparation for LLMs (The RAG & Semantic Layer)

Before an LLM can interact with your data safely and accurately, that data has to be translated into a mathematical language the model understands. This is where traditional data engineering pipelines pivot into AI-native architectures. The focus here shifts from flat relational tables to managing unstructured data (PDFs, documentation, customer tickets).

I need to understand the lifecycle of chunking strategies (how you slice up long text), generating Vector Embeddings via embedding models, and optimizing Approximate Nearest Neighbor (ANN) indexing algorithms like HNSW (Hierarchical Navigable Small World) for sub-millisecond similarity search.

I’ll be exploring dedicated vector systems like Milvus and Qdrant (written in Rust for low latency), seeing how standard databases adapt via extensions like pgvector for PostgreSQL, and learning how orchestration frameworks like LangChain and LlamaIndex stitch pipelines and LLMs together.
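The chunk, embed and search lifecycle described above can be sketched in a few lines of Python. This is a toy under loud assumptions: the "embedding" is just a word count rather than a real embedding model, and the search is an exact brute-force scan, which is the thing ANN indexes such as HNSW approximate at scale. The function names and sample sentences are invented for illustration.

```python
import math
from collections import Counter


def chunk(text, size=8, overlap=2):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy 'embedding': bag-of-words counts (a real model returns dense floats)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query, chunks, k=1):
    """Exact nearest-neighbour search by scoring every chunk against the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


docs = [
    "kafka moves live events into the platform",
    "iceberg organises files in object storage",
    "airflow schedules every pipeline task",
]
print(search("who schedules the pipeline", docs))
```

Scoring every chunk is fine for three sentences, but it grows linearly with the data, which is why dedicated vector indexes exist.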

Phase 4: Data Privacy, Masking & Governance for AI

When you “hover data above an LLM,” your security perimeter changes completely. Traditional Role-Based Access Control (RBAC) doesn’t gracefully translate when an LLM can access an entire data lakehouse and accidentally leak sensitive executive info or PII (Personally Identifiable Information) through a single user prompt.

This phase is about moving toward modern Data Security Posture Management (DSPM) and context-aware, adaptive policies. I need to understand and learn programmatic Dynamic Data Masking (redacting data at query time based on user role) and Row-Level Access Policies within the pipeline before the data ever reaches the LLM’s context window.
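As a rough illustration of masking and row-level policies applied before anything reaches a context window, here is a minimal sketch in Python. The roles, rows and policy structure are all hypothetical and not tied to any real platform or DSPM product. It simply filters rows by role first, then redacts email addresses for roles that are not allowed to see PII.

```python
import re

# Simple email pattern for the demo; real PII detection needs far more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical rows and policy, invented for illustration.
ROWS = [
    {"team": "sales", "note": "Renewal owner is jane.doe@example.com"},
    {"team": "exec", "note": "Board pack drafted by cfo@example.com"},
]
POLICY = {
    "analyst": {"teams": {"sales"}, "see_pii": False},
    "security": {"teams": {"sales", "exec"}, "see_pii": True},
}


def context_for(role):
    """Apply the row-level filter, then mask emails, before text reaches a prompt."""
    rule = POLICY[role]
    visible = [r for r in ROWS if r["team"] in rule["teams"]]
    if rule["see_pii"]:
        return [r["note"] for r in visible]
    return [EMAIL.sub("[REDACTED]", r["note"]) for r in visible]


print(context_for("analyst"))
```

The ordering matters: the model never receives the rows or values the user's role should not see, so there is nothing for a clever prompt to extract.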

Phase 5: DataOps & Data Observability

Think of this as Prometheus and Grafana, but for data quality. If an infrastructure pipeline fails, a pod crashes. If a data pipeline fails silently, a null value enters a database, poisoning downstream AI models and corporate decision-making without anyone noticing until it’s too late.
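To make the "silent null" problem concrete, here is a minimal sketch of a data quality gate in Python. The column names, the sample batch and the 10% threshold are assumptions picked for the example; dedicated data observability tools do much more. The idea is simply to fail loudly before bad rows move downstream.

```python
def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)


def check_batch(rows, column, max_null_rate=0.1):
    """Raise instead of silently passing a bad batch downstream."""
    rate = null_rate(rows, column)
    if rate > max_null_rate:
        raise ValueError(f"{column}: null rate {rate:.0%} exceeds {max_null_rate:.0%}")
    return rows


# Made-up batch: one explicit None and one row missing the column entirely.
batch = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 5},
    {"id": 4},
]
print(null_rate(batch, "amount"))
```

Calling `check_batch(batch, "amount")` on this batch raises, which is the data equivalent of a pod crashing rather than limping on.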

Phase 6: Platform Distinctions & Architectures

Once the fundamentals are locked in, it’s time to get into the weeds of the platform layer. At the moment, I’m framing this exploration around three core pillars:

  1. How data enters
  2. How it is secured
  3. How it is consumed (including by AI)

At this stage, I don’t know what I don’t know, so this list may evolve, but the dominant enterprise platforms and ecosystems I keep hearing about are Databricks, Snowflake, and Microsoft Fabric.

image

TLDR;

  • Phase 1: Storage Formats (Iceberg, Delta Lake) — Where it lives.
  • Phase 2: Pipelines & Orchestration (dbt, Airflow) — How it moves and transforms.
  • Phase 3: Data Preparation for LLMs (Vectors & Embeddings) — How the AI reads it.
  • Phase 4: Governance & Privacy (DSPM & Masking) — How we keep it secure.
  • Phase 5: DataOps & Observability (Data Quality) — How we ensure it isn’t poison.
  • Phase 6: Enterprise Platforms (Databricks, Snowflake, Fabric) — The big suites that tie it all together.

My other “Big Rocks”

As well as navigating the brave new world of data engineering, I also need to continue diving deeper into a few of my existing technology areas:

  • DevOps Tooling & Source Code Repositories: Deepening expertise across GitHub, GitLab, and Azure DevOps.
  • DevOps, Product Management & Development Services: Focusing on the Atlassian stack (Jira & Confluence).
  • Platform Focus: Gaining an even stronger foothold in Red Hat OpenShift and KubeVirt.
  • Databases: Expanding operational knowledge of modern database architectures.
image 2
Deploying the Veeam Software Appliance on Kubevirt https://vzilla.co.uk/vzilla-blog/deploying-the-veeam-software-appliance-on-kubevirt https://vzilla.co.uk/vzilla-blog/deploying-the-veeam-software-appliance-on-kubevirt#respond Tue, 23 Sep 2025 07:28:14 +0000 https://vzilla.co.uk/?p=3593 Virtual Machines on Kubernetes is a thing so I thought it would be a good idea to run through how to get the Veeam Software Appliance up and running on KubeVirt which will also translate to enterprise variants that use KubeVirt to enable virtualization on top of Kubernetes such as Red Hat OpenShift Virtualization and SUSE Harvester.

I wrote about the Veeam Software Appliance and ran through the steps to get the system up and running as the brains of your backup environment along with some of the benefits it brings.

For those familiar with the process I took in vSphere in the above link, be warned: defining virtual machines in Kubernetes is all about the YAML.

What you will need

A Kubernetes cluster. I will be using Talos Linux as my Kubernetes distribution in the home lab, but any bare metal Kubernetes cluster should work, provided it meets the system requirements.

The nodes in your cluster need enough resources to meet the requirements of the Veeam Software Appliance: 8 vCPU, 16+ GB RAM, and two disks (≈240 GB+ each).

You should have a StorageClass with enough capacity to store the above disk requirements.

KubeVirt + Containerized Data Importer (CDI) installed and working. CDI is used to create DataVolumes and import/upload images. (I am running version 1.6.0 of KubeVirt.)

Client side you will need kubectl and virtctl. kubectl is how we will interact with the cluster; virtctl helps us to upload images, start virtual machines and open a VNC/console session to the machine.

High Level Steps

  • Upload the Veeam Software Appliance ISO into the cluster (DataVolume)
  • Create blank DataVolumes (PVCs) that will become the appliance target disks (this needs to be 2 x 250 GB because we need 248 GB)
  • Create a Virtual Machine that attaches the ISO and the blank PVCs as disks (ensuring the correct boot order)
  • Create a Service so that the system can be accessed from outside the cluster
  • Start the VM and work through the initial configuration wizard of the appliance

Upload the ISO into the cluster

We now need to get the ISO into our cluster as a DataVolume to be used with the virtual machine creation in the next step. First we need to create a namespace where we will create the virtual machine and everything associated with it.


kubectl create ns veeam
image 20

You could use port-forward, but for a large ISO like this it might take a considerable amount of time. I opted to use a local web server on my system to share the ISO. I used a simple Python web server with the following command.


python -m http.server 8080

We will then create a DataVolume for our ISO and we will specify the URL to get our ISO. Be sure to change your path and storageclass.


# veeam-iso-dv.yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: veeam-iso-dv
  namespace: veeam
spec:
  pvc:
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 13Gi
    storageClassName: ceph-block
  source:
    http:
      url: http://192.168.169.5:8080/VeeamSoftwareAppliance_13.0.0.4967_20250822.iso

We will then apply this with


kubectl apply -f veeam-iso-dv.yaml

The import process will then start and you can use the following command to see the import progress.


kubectl -n veeam get dv veeam-iso-dv -w
image 21

You can keep an eye on this; don’t panic if it gets to 99.99% and sits there for the same time again with no updates. You can come out of this watch command and check the pod logs. You will have an importer pod in your veeam namespace that you can run this command against.


kubectl logs importer-prime-a1ec931f-5335-4f59-aa97-6f165f1c38eb -n veeam
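If you would rather script the "is it done yet" check than watch the terminal, here is a small sketch in Python. It assumes you pipe in the JSON from `kubectl -n veeam get dv veeam-iso-dv -o json` and reads the `status.phase` and `status.progress` fields of the DataVolume. The `import_state` helper and the sample payload are made up for illustration, so treat this as a starting point rather than a tested tool.

```python
import json


def import_state(dv_json):
    """Summarise a DataVolume object as returned by `kubectl get dv NAME -o json`."""
    status = json.loads(dv_json).get("status", {})
    phase = status.get("phase", "Unknown")
    progress = status.get("progress", "N/A")
    return phase, progress, phase == "Succeeded"


# Sample payload trimmed to the fields read above; the values are illustrative.
sample = json.dumps({"status": {"phase": "ImportInProgress", "progress": "99.99%"}})
print(import_state(sample))
```

The last element of the tuple only turns true once the phase reports Succeeded, which is the state to wait for before creating the VM.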

When complete you can run the following command and hopefully you will see something like the screenshot below


kubectl -n veeam get dv veeam-iso-dv
image 22

Note: when I first tried to use port-forward with this ISO it was suggesting 7+ hours, so this way is much more efficient.

Create blank target disks (DataVolumes)

You might at this stage be realising the difference between KubeVirt and a hypervisor like vSphere…
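This post does not list the blank disk manifests inline, so here is a rough sketch of what they could look like, generated with Python so the two disks stay identical. It assumes CDI's blank source (`source: blank: {}`), the same namespace and storage class as the ISO DataVolume, and 250Gi per disk. The disk names `veeam-disk-1` and `veeam-disk-2` are placeholders of mine and must match whatever your veeam-vm.yaml references.

```python
import json


def blank_datavolume(name, size="250Gi", namespace="veeam", storage_class="ceph-block"):
    """Build a CDI DataVolume manifest that provisions an empty disk."""
    return {
        "apiVersion": "cdi.kubevirt.io/v1beta1",
        "kind": "DataVolume",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "pvc": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": size}},
                "storageClassName": storage_class,
            },
            # An empty `blank` source asks CDI for an empty disk image.
            "source": {"blank": {}},
        },
    }


# Disk names are assumptions; align them with the names used in the VM definition.
manifests = [blank_datavolume(f"veeam-disk-{i}") for i in (1, 2)]
document = json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)
print(document)
```

kubectl accepts JSON as well as YAML, so the output can be piped straight in, for example `python blank_dvs.py | kubectl apply -f -` if you save the script under that (hypothetical) name.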

We have the ISO uploaded to the cluster and we are now ready to create our Virtual Machine.

There is an option in the following YAML to use CloudInitNoCloud to inject an SSH user for access after installation. I have not added this because, although you can SSH into the Veeam Software Appliance, from a security standpoint it is not enabled and requires elevated security officer approval. Maybe more on this in another post.

We are now going to define our Virtual Machine in code. Again if you are copying and pasting then please amend resources, storageclass and namespace if different to what I have used above.

The appliance requires UEFI rather than BIOS, so we have made that change to the VM configuration below.

You can find the VM code listed here in this GitHub repository
veeam-vm.yaml

https://github.com/MichaelCade/vsa_kubevirt

Now we can create our VM using the command below


kubectl apply -f veeam-vm.yaml
image 23

At this stage our virtual machine is powered off but to confirm the status we can check using


kubectl get virtualmachine -n veeam
image 24

Start the VM

Four YAML files later, we are now ready to start the machine.

When ready we can start things using the following virtctl command


virtctl start veeam-appliance -n veeam
image 25

Let's then use that command again to check the status of our virtual machine


kubectl get virtualmachine -n veeam
image 26

Now we need a console to complete our initial configuration wizard within the TUI. I have installed TigerVNC as my client.

You can then run a virtctl command to connect


virtctl vnc veeam-appliance -n veeam
image 27

With this command you will also see the following pop-up:

image 28

Ok, I think that is enough for this post. I have been storing the code snippets in a repository on GitHub that can be found here. – https://github.com/MichaelCade/vsa_kubevirt

Once you get to this page of the Initial configuration, you can then follow this post to complete the steps. – https://vzilla.co.uk/vzilla-blog/the-veeam-software-appliance-the-linux-experience

The Veeam Software Appliance – The Linux Experience https://vzilla.co.uk/vzilla-blog/the-veeam-software-appliance-the-linux-experience https://vzilla.co.uk/vzilla-blog/the-veeam-software-appliance-the-linux-experience#respond Mon, 08 Sep 2025 15:18:24 +0000 https://vzilla.co.uk/?p=3570 The first week in September 2025 saw a massive initial release of the Veeam Software Appliance. Since the inception of Veeam and the ability to protect Virtual Machines on VMware vSphere Veeam has been a Windows Server based product, until now.

I have also skipped over how “Virtualisation was just the start” and it was, now the Veeam Data Platform protects workloads and data across many different platforms, VMware vSphere, Microsoft Hyper-V, Proxmox, Oracle Linux Virtualisation and lots more hypervisors, as well as protecting public cloud workloads on AWS, Microsoft Azure and Google Cloud. The protection of Kubernetes came almost five years ago with the acquisition of Kasten now known as Veeam Kasten or Veeam Kasten for Kubernetes. We have M365, Salesforce backup, EntraID, Agents for Windows, Linux, Solaris, AIX and then we also protect unstructured data by way of NAS devices and Object Storage locations.

As of this week the management layer of Veeam is no longer just an option on Windows; Linux has entered the room.

Home Lab Setup Plan

My morning of the 3rd September was spent sat waiting for the downloadable, public release of the Veeam Software Appliance (VSA). We as Veeam made a big splash about this at our event earlier in the year and showed off a lot of the features and functionality; I am not going to get into too much of that here.

When the downloads became available it was time to start the engines, but not without a solid plan. I have documented my home lab setup in previous posts. As much as this is a home lab I still want to treat things with some real life accountability. However, what I am about to show you is that my repository will be living on the same virtual environment I am likely going to be protecting… for now and this will be resolved down the line with the Veeam Data Cloud Vault option where I can store those backups offsite in the Veeam first party solution.

OK, back to the plan.

image

As you can see from the above image we had some virtual machines to create, 10 if we include that DNS server down there. We also have some futures that are documented from Veeam as coming in the not too distant future.

The VSA is available in an OVA and ISO format. There is then a smaller ISO for the Veeam Components such as the Hardened Repository and Proxies. You can see we have 5 proxies planned and a Hardened Repository (again bad practice to store a backup repository on the infrastructure you are going to protect, don’t be like me)

For the VBR (Veeam Backup & Replication) server we will use the OVA and import this into our vSphere environment and create a new VM.

For the EM (Enterprise Manager) server we will use the ISO and create a VM and run through the configuration steps to get this setup.

The proxies and hardened repository will use the smaller ISO

Veeam ONE is a powerhouse with this release, still Windows but massively important. We will provision a Windows Server 2025 machine for this.

I will mention for this plan that I also provisioned a Windows Server 2025 Core machine for the DNS server and created a new forest called “vzilla.local”.

Hopefully that makes everything in the picture a little clearer. The web client can be any OS and any mainstream browser (I have tried with Chrome, Edge and Firefox). The thick client is used to access some of Veeam ONE and the VBR server. I have installed the VBR thick client on the Veeam ONE Windows Server.

VBR Deployment Process

As mentioned, I am going to be using vSphere here to deploy my virtual machines for this plan. Clearly anything that supports the OVA can be used, and the ISO is there for physical systems or platforms where OVA is not supported.

image 1

Select the OVA

image 2

I am not going to go through this step by step. Basically: select the OVA, give the machine a relevant name for your environment, choose the DC, Cluster or Host you wish to deploy to, select some shared storage and finish.

When that import is complete, you will have a powered off virtual machine ready to be powered on. (I created another import to capture the process here and it is named 2025-veeam-vbr1)

Powering on the VM

image 3

Next we will see “Veeam Backup & Replication” booting, followed by the initial configuration wizard, starting with the license agreements.

image 4

Accept this then we can give the box a name.

image 5

Next we can configure our network. I have used the static option to set a standard static IPv4 address, but IPv6 is also available to set here.

image 6

Then we need to change the time, I am based in the UK so I selected change and searched for London to update the timezone but left the available NTP servers as is.

image 7

Then we are onto setting a host administrator, this process is the same for any managed Veeam machine, proxies, repositories and enterprise manager.

image 8

The next screen is configuring the multi factor authentication (MFA), I am using the Microsoft Authenticator for all of my MFA needs so I hit show QR code, scan that within the app and then type the number provided from the app to proceed to the next step.

image 9

Following that step you are then asked if you would like to create a Security Officer:

At the Security Officer step of the Initial Configuration wizard, configure the default security officer account to perform specific operations in the Host Management console — veeamso. This account type provides an additional security layer to protect your infrastructure against malicious system administration.

The above was taken from the documentation pages – https://helpcenter.veeam.com/docs/vbr/em/deployment_linux_iso_install_security_officer.html?ver=13

image 10

If you choose to skip this step, you are prompted with a very red warning: it will not be possible to enable this role later on. For the purposes of this demo walk-through I have said OK, but in my lab environment I have created that veeamso account on all hosts. When you set a password in this wizard, the security officer will be required to change it when they first log into the management console to approve tasks.

image 11

The final page is a summary of what you have configured

image 12

Select finish and then wait for the state configuration to be saved and then for the services to come up. When they are up you will be able to access.

image 13

Host Management Console

The Host Management Console allows administrators to perform configuration and maintenance tasks, including managing network settings, server time, host users and roles, backup infrastructure, OS and Enterprise Manager updates, maintenance operations, and security settings.

This can be accessed via https://192.168.169.122:10443/

Obviously change this to your own IP address; you will then be able to log in here with the veeamadmin account.

image 14

Web UI Preview

The Veeam Backup & Replication web UI is a browser-based interface that enables you to manage backup and recovery operations, monitor your backup infrastructure, and configure system settings from any supported device. The web UI provides a modern, streamlined experience designed to simplify daily administration and deliver at-a-glance visibility into your data protection environment.

Please note that this is a preview and not all capabilities within Veeam Data Platform are available here today in this release, you will be able to manage some of your environment from here but you will need the thick client access for all tasks.

This can be accessed via https://192.168.169.122/

Again, change the IP address above to your own.
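Before opening a browser, it can be handy to confirm that both endpoints are actually listening. Here is a minimal sketch in Python that only tests TCP reachability. The ports come from the two URLs in this post (10443 for the Host Management Console, 443 for the web UI); the `port_open` and `check` helpers are made up for illustration and say nothing about whether the services behind the ports are healthy.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the URLs in this post.
ENDPOINTS = {"host management console": 10443, "web UI": 443}


def check(host):
    """Map each endpoint name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ENDPOINTS.items()}
```

For example, `check("192.168.169.122")` would return a dictionary with a true or false value per endpoint, swapping in your own appliance address.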

image 15

Thick Client Access

For those of you familiar with the thick client approach, it has been enhanced and looks much better. It's faster and is designed to let you quickly find the commands you need and perform data protection and disaster recovery tasks.

As stated in my diagram above, you will need a Windows based machine to install and use the thick client. Point the client to your address above.

Next Steps

To complete the home lab setup, I created 6 further virtual machines using the smaller JeOS ISO. This was used to create the hardened Linux repository (bad practice to run this on a VM, and the security score in threat center will also warn you against this, so don’t be like me) and then the five proxies. The process is very much the same as the initial configuration wizard we went through above, and then you add them into your VBR thick client.

With the proxies, I created one proxy per vSphere host. When I added them in, you get the choice of what type of proxy they are going to be.

image 16

I then repeated the steps for VMware and VMware CDP as I want these to be the data movers for all tasks.

image 17


I had also deployed a Windows Server, downloaded the new Veeam ONE ISO and got things up and running there, maybe another post on those steps to come. Adding Veeam ONE is important as this gives you some great insight into the security posture of your backup environment. The threat center element is pulling data from Veeam ONE to display within the thick client and the web UI.

image 18

Next, I wanted to get my Enterprise Manager VM up and running. To do this I used the Veeam Software Appliance ISO, which runs through a similar configuration wizard to the one shown, but on first boot you get a choice of Enterprise Manager or Veeam Backup & Replication to install. The process was to create a VM with the required CPU, memory and disks and then run through that wizard.

image 19

The final steps were to add some infrastructure to protect and then create some backup jobs. If you are familiar with Veeam this should be the same process as before.

To caveat once again here, this is a lab environment where I can show demos of Veeam Software, I am using virtualization for components that in production should be on physical hardware and not virtual machines, but for home lab environments what I have built will cover elements of what I need to cover in demonstrations. I will also be adding cloud based protection workloads and Veeam Kasten instances later on down the line that I have running to extend the lab into different platforms.

Fixing a bounced vCenter server https://vzilla.co.uk/vzilla-blog/fixing-a-bounced-vcenter-server https://vzilla.co.uk/vzilla-blog/fixing-a-bounced-vcenter-server#respond Wed, 23 Jul 2025 09:02:16 +0000 https://vzilla.co.uk/?p=3557 A few times a year, my home lab gets bounced for whatever reason and generally because I have the vSphere vCenter living on top of the 5 ESXi nodes this causes an issue where when the hosts come back up we are lacking a fully functional vCenter, probably need to consider a better approach but here we are.

I am able to get into the vCenter Server Management console at https://192.168.169.181:5480 a port etched into my brain for some reason!

image

But when we head to services we have many that are not running and they should be.

image 1

I check the access settings and ensure that SSH is enabled so we can dive deeper and try and get these services up and running.

image 2

I first try ssh root@192.168.169.181 but I am met with

image 3

So my alternate way is using

ssh -o IdentitiesOnly=yes -o PreferredAuthentications=password -o PubkeyAuthentication=no root@192.168.169.181

Which gets me in to

image 4

I then use the service-control --status command to see the same as what we have in the management UI

image 5

Followed by service-control --start --all, which seems to sit here for a very long time, but I am noting this down at this stage to make sure future me doesn’t do anything silly.

image 6

While the above was going on I checked the time tab in the management UI and we were back in time, around 2023. I wondered if this could be why things were not quite right. We were also using host based NTP, so I changed this setting and got things up to date.

image 7

In changing the time we were now able to get into

image 8

We do have to wait a while for things to finish initializing here. But maybe it was all down to time! I actually got impatient and hit the reboot button from the management UI.

Things do not look all that pretty and I may update if relevant here with what is going on!

image 9
The Hypervisor Hunger Games – Service Provider Edition https://vzilla.co.uk/vzilla-blog/the-hypervisor-hunger-games-service-provider-edition Sun, 29 Jun 2025 09:52:01 +0000 https://vzilla.co.uk/?p=3551 Many MSPs (Managed service providers) have hedged their platform offering in and around the vSphere ecosystem and now what?

I have said before about the cost conundrum here and these are some decisions that people in all worlds will have to consider. But in a service provider world it’s maybe not a simple rip and replace with Nutanix AHV or another.

Service providers bring value through this stack: they not only bring a relationship with their customers, they can also automate, provide additional wrap-around services and join up the vast ecosystem we have when it comes to VMware.

It’s also very much a price per fight here for MSPs. Value add + capabilities so spending the winnings on software licensing probably doesn’t add up. Maybe platform replacements like Nutanix AHV or even Red Hat OpenShift are not that much different licensing cost wise compared to the Broadcom tax. (Maybe it’s a valid tax being the best hypervisor but also the strongest ecosystem)

What I do think we could see is a lot of service providers looking into KVM based options. Although the ecosystem is maybe not as polished or as well supported, it might just be enough to ramp up.

I am talking about options like Proxmox, XCP-ng and maybe even the new hypervisor option from HPE, but this will come at a premium as well. These options will also not be free; they will be free like a puppy, and the cost will come from elsewhere.

The other option could be KubeVirt. KubeVirt is what underpins Red Hat OpenShift Virtualisation, but it is an open source project that can be used across many Kubernetes distributions, and managed with a bit more effort than OpenShift. Could this be a real option for service providers to accelerate their own offerings into the cloud native ecosystem? An ecosystem that has been built over the last 10+ years.

I am going to share a fantastic resource for the vSphere admin here from my good friend Dean Lewis

I want to be clear that even though KubeVirt is established and has been around a while, it’s still missing the polish that we have within mainstream hypervisors and platforms like vSphere, Hyper-V and Nutanix AHV. But I remember when vSphere was like this, and we all flocked in that direction.

All I do know is that wherever service providers land, the requirement for data protection and management will be there regardless.

I wrote about protecting these VMs on Kubernetes here

Finally, one thing is for sure. Virtual Machines are not going anywhere! We might be in a world surrounded by AI but the trusty virtualisation era isn’t over and will continue to be a staple be it in the data centre. Or…. In the public cloud…. Could the public cloud IaaS be an option instead of on premises for providers?

]]>
Taking a look at KubeBuddy for Kubernetes https://vzilla.co.uk/vzilla-blog/taking-a-look-at-kubebuddy-for-kubernetes https://vzilla.co.uk/vzilla-blog/taking-a-look-at-kubebuddy-for-kubernetes#respond Thu, 15 May 2025 10:17:42 +0000 https://vzilla.co.uk/?p=3536 I have been meaning to get to this little project for a while, and here we are. You can find a link to the site below, I like this initial in your face message though, this tells me that this tool is going to tell me something about my Kubernetes cluster that I didn’t know, for the record I am going to download and run this on my home lab cluster and see what we get. This is not a production cluster!

So what is it…

KubeBuddy powered by KubeDeck helps you monitor, analyze, and report on your Kubernetes environments with ease. Whether you’re tracking cluster health, reviewing security configurations, or troubleshooting workloads, KubeBuddy provides structured insights.

image 21

Let’s get started

Suspiciously this Kubernetes tool is built using PowerShell, I don’t think I can name another tool with this characteristic?

Luckily, PowerShell is now available cross platform, I am using a Mac so as part of this getting started we will also be getting PowerShell installed via brew.


brew install powershell

Other installation steps can be found in the usage section of the page link above. Ok, Good stuff we have our PowerShell installed and we can use

pwsh
from our Ghostty terminal to get into the shell. We can then run a command to get the KubeBuddy module installed.


Install-Module -Name KubeBuddy -Scope CurrentUser
image 22

Also from the above we can see the way in which we can start playing with KubeBuddy is by starting with the

Invoke-KubeBuddy
command.

We can then also use

Get-Help Invoke-KubeBuddy -Detailed
as a way to understand some additional flags we have access to here.

image 23
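If you would rather not drop into the interactive shell, the same two steps can be run straight from your normal terminal. This is only a minimal sketch: run_pwsh and kubebuddy_setup are helper names I made up for illustration, and Install-Module may still prompt you to trust the gallery the first time.

```shell
# Run a single PowerShell command without entering the interactive pwsh shell.
# run_pwsh is a hypothetical helper name, not part of KubeBuddy.
run_pwsh() {
  pwsh -NoProfile -Command "$1"
}

# Install the module for the current user, then print the detailed help.
# Install-Module may still ask for confirmation on first use.
kubebuddy_setup() {
  run_pwsh 'Install-Module -Name KubeBuddy -Scope CurrentUser'
  run_pwsh 'Get-Help Invoke-KubeBuddy -Detailed'
}
```

Calling kubebuddy_setup from a zsh or bash session should do the same as typing the two commands inside pwsh.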

Are we ready to find out something we didn’t know about our cluster?

As you can see it was pretty easy to run against my Kubernetes cluster. I am running a Talos cluster, which is designed to be very minimal and extremely secure, so there might be some things reported that are related to this.

The Output

As you can see from the end of the video above, we have an output. For this output we chose HTML, but you can also get JSON, and I have seen a save-to-PDF feature in the report as well.

Here is the HTML output. I am not going to get into the issues it has found, maybe that is a follow-up, but I think it’s great that we get a lot of detail without a lot of effort; the tool has taken away the need to search for and find this.

Navigation along the top allows you to dive into each of those areas and display warnings and errors found in those specifics.

image 24

When we scroll down we see some more detail about the cluster. Even for a home lab, 20 Critical seems like something we should investigate further.

image 25

Finally, on this initial page we see some information about resources and cluster events, not much going on in the lab right now or something not being picked up is my suspicion here.

image 26

As you then go across the tabs at the top you can get more granular detail on each area, all tabs have this similar layout, the initial Total of resources and those with issues then some recommendations and some findings. Again useful as to find this using

kubectl
would be a needle in a haystack.

image 27

My Thoughts?

This was a very quick overview of this little tool. I am intrigued by the PowerShell, and by how the project can progress, where it can go and what else it can highlight.

]]>
https://vzilla.co.uk/vzilla-blog/taking-a-look-at-kubebuddy-for-kubernetes/feed 0
My Thoughts on Retrieval-Augmented Generation (RAG) and the Power of Vector Databases https://vzilla.co.uk/vzilla-blog/my-thoughts-on-retrieval-augmented-generation-rag-and-the-power-of-vector-databases https://vzilla.co.uk/vzilla-blog/my-thoughts-on-retrieval-augmented-generation-rag-and-the-power-of-vector-databases#respond Tue, 13 May 2025 08:48:35 +0000 https://vzilla.co.uk/?p=3515 Some of you may have heard of RAG, retrieval augmented generation?

If you want to use an LLM to answer questions about data it wasn’t trained on, you can use the RAG pattern to supplement it with extra data.

image 17
Image Source = https://learnopencv.com/rag-with-llms/

But before we get into RAG, I wanted to touch on Vector Databases a little as they have become popular with the world of AI.

TLDR; A Vector Database is fantastic at cataloging how different pieces of data are related to each other.

What is a Vector?

Vectors are arrays of numbers, and when those arrays represent something we call them embeddings. The term vector really just refers to the mathematical concept, whereas an embedding is kind of like an applied vector if you will. So what do these embeddings represent? Well, technically anything you want, but it’s very common to use vector databases for natural language processing and semantic search, so they usually represent text.

Want to learn more about Vector Databases, take on this book! I have not braved it but in the content I have been reading and watching this is mentioned a lot.

image 18
Deep Learning: A Visual Approach by Andrew Glassner

Vector databases are just collections of embeddings, and these are organised into indexes. An index is kind of like a table: a collection of rows of embeddings, and we call those rows records.

RAG

Ok this then brings us back to one of the initial things we said:

If you want to use an LLM to answer questions about data it wasn’t trained on, you can use the RAG pattern to supplement it with extra data.

Let’s say you have a bunch of support docs.

These would get turned into embeddings and stored in a vector database. Then when the user types in a prompt, that prompt gets turned into an embedding, which is used to search the vector database for similar information.
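To make the "turned into an embedding" step a bit more concrete, here is a rough sketch of asking a local Ollama server for an embedding over its HTTP API. It assumes Ollama is running on its default port with the mxbai-embed-large model pulled; embed_body and embed are helper names I made up, and the JSON building is deliberately naive.

```shell
# Build the JSON body for Ollama's embeddings endpoint.
# Naive on purpose: assumes TEXT contains no double quotes or backslashes.
embed_body() {
  printf '{"model":"%s","prompt":"%s"}' "$1" "$2"
}

# Ask a local Ollama server (default port 11434) to embed TEXT.
# The response is JSON containing an "embedding" array of numbers.
embed() {
  curl -s http://localhost:11434/api/embeddings -d "$(embed_body "$1" "$2")"
}
```

Something like embed mxbai-embed-large "What is Kubernetes?" would return the vector for that prompt, which is the thing you then search the database with.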

What you’re doing here is a similarity search. Basically, you’re just looking for the nearest neighbours to the embedding that you give the database.
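A nearest neighbour search sounds fancier than it is. As a toy sketch (three-number vectors I invented, nothing like the size of a real embedding), this scores every stored vector against the query using cosine similarity and keeps the best one. A real vector database does the same idea with indexes so it does not have to compare against every row.

```shell
# nearest QUERY: reads "label v1,v2,..." lines on stdin and prints the label
# (and score) of the vector with the highest cosine similarity to QUERY.
nearest() {
  awk -v q="$1" '
    # Cosine similarity of two comma-separated vectors; -2 marks "not comparable".
    function cosine(a, b,    n, m, i, x, y, dot, na, nb) {
      n = split(a, x, ","); m = split(b, y, ",")
      if (n != m) return -2
      dot = 0; na = 0; nb = 0
      for (i = 1; i <= n; i++) { dot += x[i]*y[i]; na += x[i]*x[i]; nb += y[i]*y[i] }
      if (na == 0 || nb == 0) return -2
      return dot / (sqrt(na) * sqrt(nb))
    }
    { s = cosine(q, $2); if (NR == 1 || s > best) { best = s; label = $1 } }
    END { if (NR > 0) printf "%s %.4f\n", label, best }
  '
}

# Toy data: the query points mostly along the first axis, like "kubernetes".
printf 'kubernetes 0.9,0.1,0.0\ncooking 0.0,0.2,0.9\n' | nearest "1.0,0.0,0.1"
```

The closer the score is to 1, the more the two vectors point in the same direction, which is what "similar" means here.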

An example

Obviously, I wanted to get hands-on and start playing with some of this stuff in a world of AI but also as a Data Technologist I wanted to see what was possible with some of this data and see how it would handle being hovered above a powerful LLM.

Which then led me down a rabbit hole: how important do these Vector Databases become after your own data is embedded, and how much CPU and GPU time and effort does it cost to re-embed if something was to go wrong? Anyway, that might be another post shortly.

Above we mentioned

Let’s say you have a bunch of support docs.

Now instead of docs let’s pretend that we have an amazing community repository called 90DaysOfDevOps full of data and learning information. Kind of similar to support docs! We could probably ask an LLM about 90DaysOfDevOps and get some info back… but it’s going to be vast and wide and the LLM probably was not trained on this repository.

I am using Ollama with Mistral here… the other model will become clear later.

image 6

and if we then ask mistral a question about 90DaysOfDevOps what do we get?

image 7

For some this might be the way we have been interacting with LLMs so far, but what if we were able to take that personal data, or data that we want to specifically embed, and use it against or alongside (not sure of the terms) an LLM? We can surely get a richer response overall?

I have my dataset in the 90DaysOfDevOps repository, git cloned locally to my machine. I then have that mxbai-embed-large model you saw above and a trusty friend of mine, a Postgres database instance running on a VM (but it could be anywhere) with the pgvector extension enabled for knowledge storage. (Maybe another post, let’s see how this one goes first)

image 8
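For a feel of what the search looks like on the Postgres side, here is a hedged sketch that builds a pgvector query. The chunks table and its content and embedding columns are names I made up for illustration (the real schema lives in the app), and <=> is pgvector’s cosine distance operator.

```shell
# similarity_sql VECTOR [LIMIT]: print a pgvector nearest-neighbour query.
# Table "chunks" and columns "content"/"embedding" are hypothetical names.
# Sketch only: VECTOR is pasted into the SQL as-is, so only pass trusted input.
similarity_sql() {
  vector="$1"
  limit="${2:-3}"
  printf "SELECT content FROM chunks ORDER BY embedding <=> '%s' LIMIT %d;\n" "$vector" "$limit"
}

# Print the query for a tiny made-up vector.
similarity_sql '[0.1,0.2,0.3]' 3
```

You could then hand the result to psql, for example psql "$DATABASE_URL" -c "$(similarity_sql '[0.1,0.2,0.3]' 3)", where DATABASE_URL is whatever connection string you use.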

I wrote a little app to deal with that embed process which is then in turn the same app which will allow me to interact with that RAG + LLM via a chat / API interface.

https://github.com/MichaelCade/vector-demo

Again maybe we need to go into more detail about this app another time, but for now. We have our Knowledge from our 90DaysOfDevOps repository. Each of these markdown files contains basically a blog about a topic related to the world of DevOps.

image 9

We have our Golang code to embed our data.

image 10

When the worlds align, and we run our binary against our data with access to our (likely hard coded) Postgres database instance… we should start the embedding process into our vector database.

image 11

NOTE: if you made it this far and want to see how to spike your GPU… change the code to use mistral for the embedding process, a model that has not been trained for embedding in the way the embed model has. Another rabbit hole: I found that there are all sorts of models trained for different scenarios.

Here is what things look like within our super secure vector database, the one we leaked connection info and all sorts for via GitHub.

image 12

Using the same Golang binary we ran we can now interact with that API and chat with the vector plus mistral model.

image 13

I wanted to be sure that we were indeed getting something from the vector when we did this so added some additional code to tell me the chunks it was using to respond.

image 14

Now our whole app looks like the above embed part, but we also added a backend API to the same code base. In the GitHub repository linked above you will see vector-demo-ui; this is the React frontend… no shame in saying I used vibe coding for this… who likes frontend stuff anyway.

image 15

and to top things off if you don’t want to interact with your AI chat assistant via curl then the frontend almost looks pretty…

image 16

Before we wrap things up, we should ask it something specific to the vector embeddings we have provided. First if we ask mistral directly about Day 49 of 90DaysOfDevOps we get:

image 20

Then with our RAG + LLM we get:

image 19

If you made it this far, I am impressed! We have seen a demise in blog views I think over the last few years so when I jot something down it is mostly for future me, looking for something I have done before, but hopefully this helps spur on someone else to unlock some of their data, and if useful, let me know… Also if you would like to see some content about protecting vector databases, or a deeper dive into the terrible coding I am doing with Golang let me know.

]]>
https://vzilla.co.uk/vzilla-blog/my-thoughts-on-retrieval-augmented-generation-rag-and-the-power-of-vector-databases/feed 0
My initial thoughts on using AI to manage Kubernetes Clusters – kubectl-ai https://vzilla.co.uk/vzilla-blog/my-initial-thoughts-on-using-ai-to-manage-kubernetes-clusters-kubectl-ai https://vzilla.co.uk/vzilla-blog/my-initial-thoughts-on-using-ai-to-manage-kubernetes-clusters-kubectl-ai#respond Mon, 12 May 2025 08:52:15 +0000 https://vzilla.co.uk/?p=3504 As with most Mondays, we start with a job and task in mind but quickly as we begin catching up on news from the weekend, we find some interesting rabbit holes to investigate. This Monday morning was no different but I also do not usually have the urge to share such information.

As you all know AI is everywhere, I mean if you do not have a chatbot can you even spell AI!?

My morning started with reading up on a tool called ‘kubectl-ai’ from Google – https://github.com/GoogleCloudPlatform/kubectl-ai

I had seen others doing similar things so was intrigued when Google came out with a project. To name one that I had on my list, there is k8sgpt – https://k8sgpt.ai/

K8sGPT is for understanding and debugging what’s going wrong inside a Kubernetes cluster.

kubectl-ai is for interacting with the cluster more easily, translating your intent into commands.

The premise of these tools is the ability to use AI to manage your Kubernetes cluster and resources using natural language. For me this does a few things. The barrier to entry in learning Kubernetes is the overwhelming number of CLI options and variations; albeit a superpower in itself, it’s a challenge for many people who do not have that background. Kubernetes does have a complexity to it. It’s used across so many diverse fields precisely because it can do many things, and that brings complexity. My dad used to say to me “children should be seen but not heard”, never really understood that saying but Kubernetes is the same… should be used but not seen… by most people… Maybe that works, who knows…

By adding the ability to query your cluster and instruct tasks via this and other tools, we now don’t need to memorise everything about kubectl. We can instruct it to run this and do that, or to provide feedback on this.

I started off trying to use the Google AI Studio API key but initially it said the model was overloaded and then the key seemed to be wrong.

image

So I then went and tried the ability to use a local model with Ollama, but I only had my MacBook and you need to download a Gemma model, which is around 8GB. With no GPU, I need to wait to do this with my desktop PC… maybe a video on this setup.

You can bring many models and services, so I used my trusty OpenAI key and got to work… exporting the key and asking some initial questions.

image 1

As I am focused on and interested in the world of data management within Kubernetes, I wanted to see how we could go about creating a backup policy and what I needed to provide to make this work.

image 2

Meanwhile up to this point, we were barely touching the spending on our OpenAI key…

image 3

As the $$ are low, we can ask a few more things about our backup policies

image 5

I then thought, what about getting some insight into our cluster: what’s the health of things… maybe things I have not been able to see yet. I can just ask, right, and get a simple output of things I need to troubleshoot.

image 4

Very quick post to start with, but I am now intrigued into this simplification. Maybe I could release some of that RAM in my brain where I am storing all those kubectl commands and store something else.

As a beginner to Kubernetes you have the best chance to accelerate on here and get to grips with a lot more much faster… Just ask it to deploy your nginx deployment and expose it via a service… no longer do you have to worry about the YAML and kubectl commands.
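For comparison, this is roughly what that one natural language request turns into if you type it by hand. It is a minimal sketch using standard kubectl commands; the deployment name, replica count and service type are my own picks rather than anything kubectl-ai is guaranteed to choose, and deploy_nginx is just a wrapper name for illustration.

```shell
# What "deploy nginx and expose it via a service" looks like by hand.
# Name, replica count and service type are assumptions for this sketch.
deploy_nginx() {
  # Create a Deployment running the public nginx image.
  kubectl create deployment nginx --image=nginx --replicas=2
  # Put a ClusterIP Service in front of it on port 80.
  kubectl expose deployment nginx --port=80 --type=ClusterIP
}
```

Two commands is not a lot, but knowing that these two exist, and which flags they want, is exactly the part a beginner gets to skip.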

My final thought, this is great for home labs and dev environments… Still be mindful running this on anything important… I also want to give K8sGPT a try as I can see this might do the same and some more things here.

I am sure there are many other tools popping up in this area, but as a quick comparison of the two I created this table.

Feature | K8sGPT | kubectl-ai
Purpose | Diagnoses and explains Kubernetes cluster issues | Helps write and understand kubectl commands using AI
Focus | Cluster health, error analysis, and troubleshooting | Command-line assistant for kubectl
AI Role | Uses AI to explain root causes and suggest fixes | Uses AI to translate natural language to kubectl commands
Installation | CLI tool + CRDs (optional for full diagnostics) | kubectl plugin via Krew or direct install
Integration | Can run inside clusters; supports multi-language output | Works locally in the terminal as a plugin
Common Use Case | Debugging failed pods, misconfigurations, alerts | Helping users construct or correct kubectl commands
]]>
https://vzilla.co.uk/vzilla-blog/my-initial-thoughts-on-using-ai-to-manage-kubernetes-clusters-kubectl-ai/feed 0
Visualising Veeam: Kubernetes Monitoring with Grafana and ArgoCD https://vzilla.co.uk/vzilla-blog/visualising-veeam-kubernetes-monitoring-with-grafana-and-argocd https://vzilla.co.uk/vzilla-blog/visualising-veeam-kubernetes-monitoring-with-grafana-and-argocd#respond Wed, 09 Apr 2025 11:45:55 +0000 https://vzilla.co.uk/?p=3491 I have been concentrating a lot this year on my home lab, in previous posts I have covered the set up but basically I have a 5 node Talos Kubernetes cluster with rook-ceph as my storage layer and I needed some monitoring for my home lab.

In a VM I am running Veeam Backup & Replication and I wanted to get some hands-on with Grafana, I have more plans but this was project #1

My good friend Jorge has spent years building Grafana dashboards for Veeam. You can find one of the dashboards here.

The Plan:

We are going to use our Kubernetes cluster to host our Grafana instance. Jorge has shared a script that we are going to repurpose into a cronjob, this job will run on a schedule. I think every 5 minutes. This will grab us some details via the Veeam Backup & Replication API and we will have some data visualisation inside of our grafana dashboard.

image

Deployment: Grafana & InfluxDB

We obviously need Grafana to show our Grafana Dashboard, we will also need InfluxDB which is where the cronjob will store our API data collected from Veeam Backup & Replication. There are many ways to deploy Grafana into your Kubernetes cluster, you could use helm (Kubernetes package manager) but I am going to be using ArgoCD.

I am storing my ArgoCD application here in this GitHub Repository.

image 1

This will get you up and running with Grafana. Next you need the IP to access your Grafana instance and the secret to go with the default user ‘admin’

image 2

Head over to a browser and get logged in. The first page is where you can go and find out some more about Grafana.

image 3

Select Dashboards. You will notice that I currently have two configured; the one we are focused on is the “Grafana Dashboard for Veeam Backup & Replication”. If you have not added this in your configuration you can manually add it using the New button in the top right.

image 4

and if you have been able to run the cronjob you will have something resembling your Veeam environment

image 5

Step Back

Ok all the above is great but I have not really helped you get there yet.

We have used ArgoCD to hopefully deploy Grafana, and you will also see an application in there for InfluxDB, so let’s hope that we have those two up and running. But we need to put some more things in place.

First we will need an influx token and we can get this with the following command.


kubectl get secret -n monitoring influxdb-influxdb2-auth -o jsonpath="{.data.admin-password}" | base64 --decode; echo
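If the base64 part looks like magic: Kubernetes stores secret values base64 encoded, so anything you read from .data has to be decoded before you can use it. A tiny round trip with a made-up value (no cluster needed) shows the idea; the trailing echo in the command above is only there to add a newline after the decoded value.

```shell
# Encode a made-up value the way it would appear under .data in a Secret...
encoded=$(printf '%s' 'example-token' | base64)

# ...then decode it again, which is what the pipe to base64 --decode does.
decoded=$(printf '%s' "$encoded" | base64 --decode)

echo "$encoded -> $decoded"
```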

Second we need a secret to enable our cronjob to hit our Veeam Backup & Replication server. Obviously add your details there.


kubectl create secret generic veeam-influxdb-sync-secret \
  --namespace monitoring \
  --from-literal=veeamUsername=administrator \
  --from-literal=veeamPassword= \
  --from-literal=veeamInfluxDBToken=

Then in the same GitHub Repository you will find a file called ‘veeam-influxdb-sync.yaml’. This is our cronjob configuration file, so we need to apply this into our cluster as well. Before we get to that, we need to change some of the environment variables within this file, as your environment might be different to mine.


          - name: veeamInfluxDBURL
            value: "http://influxdb-influxdb2.monitoring.svc.cluster.local"
          - name: veeamInfluxDBPort
            value: "80"
          - name: veeamInfluxDBBucket
            value: "veeam"
          - name: veeamInfluxDBOrg
            value: "influxdata"
          - name: veeamBackupServer
            value: "192.168.169.185"
          - name: veeamBackupPort
            value: "9419"
          - name: veeamAPIVersion
            value: "1.2-rev0"

Then deploy that into the cluster


kubectl apply -f veeam-influxdb-sync.yaml

This cronjob will run every 5 minutes but if you wanted to trigger it straight away we can use this command


kubectl create job --from=cronjob/veeam-influxdb-sync veeam-influxdb-sync-manual -n monitoring

You can then check the progress of this process using the following command


POD_NAME=$(kubectl get pods -n monitoring | grep '^veeam-influxdb-sync-manual-' | awk '{print $1}')
kubectl logs -f $POD_NAME -n monitoring
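One small gotcha with the grep approach: if the job retried, there can be more than one matching pod and POD_NAME ends up holding several names. Here is a minimal sketch that only ever takes the first match. first_pod is a helper name I made up, and the sample output in the test data is invented.

```shell
# first_pod PREFIX: reads `kubectl get pods` output on stdin and prints the
# first pod name that starts with PREFIX, then stops.
first_pod() {
  awk -v p="$1" 'index($1, p) == 1 { print $1; exit }'
}

# Usage against a real cluster:
#   POD_NAME=$(kubectl get pods -n monitoring --no-headers | first_pod veeam-influxdb-sync-manual-)
#   kubectl logs -f "$POD_NAME" -n monitoring
#
# kubectl can also resolve the pod from the job itself:
#   kubectl logs -f job/veeam-influxdb-sync-manual -n monitoring
```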

A big thank you to Jorge on this one, if it wasn’t for his hard work in this area then we would not have these dashboards! He has also created some amazing content around this and it is also not just Veeam dashboards, lots of great stuff.

Notes

In the final section of the cronjob script I have filtered to only show the VMware platform. If you want to change this back, you will need to remove the following from the line below:


?platformNameFilter=VMware

veeamVBRURL="https://$veeamBackupServer:$veeamBackupPort/api/v1/backupObjects?platformNameFilter=VMware"
image 7

I am working on an update to see if this can be resolved and catch all objects without filtering.

Iteration

If you made it this far… you must be interested! I was not happy with the above situation where I could only display my VMware or one platform when I have several within my environment. I have iterated and now you will find an updated script that loops through the different platforms providing the data to influx and then in turn to Grafana.

Here is that script

And from there you can see that I have my macOS backups, Hyper-V backups and Kasten backups all now showing

image 8
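The looping idea from the iteration above can be sketched in a few lines. This is not the updated script itself, just an illustration of building one backupObjects URL per platform. VMware is the filter value used earlier in this post; HyperV is a placeholder I picked, so check the API for the exact platform names, and backup_object_urls is a made-up function name.

```shell
# Same variable names as the cronjob environment earlier in the post.
veeamBackupServer="192.168.169.185"
veeamBackupPort="9419"

# backup_object_urls PLATFORM...: print one backupObjects URL per platform.
# Only builds the URLs; the real script would call each one and send the
# results to InfluxDB.
backup_object_urls() {
  for platform in "$@"; do
    printf 'https://%s:%s/api/v1/backupObjects?platformNameFilter=%s\n' \
      "$veeamBackupServer" "$veeamBackupPort" "$platform"
  done
}

# "HyperV" is an assumed value for illustration.
backup_object_urls VMware HyperV
```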
]]>
https://vzilla.co.uk/vzilla-blog/visualising-veeam-kubernetes-monitoring-with-grafana-and-argocd/feed 0