You may or may not be shocked to hear that I have a server in my garage. I reduced my home lab footprint a few months back and was left with just one HP ML110 Gen7 with some capacity disks. I decided to put Windows Server 2019 on the box and use it as my home Veeam server. On here we have Veeam Backup & Replication looking after my agent backups and NAS share backups, and Veeam Backup for Office 365 looking after my vZilla Office 365 account.
I also enabled Hyper-V on the box! Madness, I hear you cry! Well, it’s a good place to run a few test machines alongside, and I can use Veeam Backup & Replication to protect any of these test beds. I have a v11 beta build in there, and I also have a three-node Kubernetes cluster running there for fun and education.
The setup is straightforward: one server and a few disks with the above software and configuration. The disks are formatted with ReFS, added to Veeam Backup & Replication as backup repositories, and used as extents of a Scale Out Backup Repository.
Finding the issue
As you can see from the above, Backup Repository 1 is in maintenance mode. This is where our issues started to happen in the home lab. The reason for this post is to highlight some of the advancements that happened behind the scenes in v10 to ensure that, if you have issues with disks in your backup repository configuration, you will always have an easy button to click. First of all, I started getting this error on one of my backups.
After some research I found this knowledge base article explaining the issue I was seeing. I followed the solution there and deleted the problem backups, but the errors kept coming back, so I looked deeper into the storage layer. This is where I found bad sectors on the drive, which were clearly causing issues for the backup process, and in particular for the transform operation. In short, a transform merges the full backup file taken on day one with the first incremental backup, and it is part of the forever forward incremental backup method used here. More on that can be found here, in what was actually the writing exercise I was set to join the Product Strategy group here at Veeam.
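To make the transform a little more concrete, here is a minimal Python sketch of the idea behind forever forward incremental. It is a toy model and not how Veeam implements it: the `transform` function, the `retention` parameter and the block dictionaries are all hypothetical names made up for illustration. What it does show is that the merge has to read and rewrite the full backup, which is why bad sectors under that file hurt so much.

```python
from typing import Dict, List

# Toy stand-in for disk blocks: block number -> contents (hypothetical model).
Blocks = Dict[int, str]


def transform(chain: List[Blocks], retention: int) -> List[Blocks]:
    """Merge the oldest incrementals into the full until the chain fits retention.

    chain[0] is the full backup; the rest are incrementals, oldest first.
    """
    chain = [dict(point) for point in chain]  # copy so the caller's data is untouched
    while len(chain) > retention:
        full, oldest_incremental = chain[0], chain[1]
        # Changed blocks from the oldest incremental overwrite the full's copy.
        full.update(oldest_incremental)
        # The incremental is now redundant, so it is dropped from the chain.
        del chain[1]
    return chain


chain = [
    {0: "a0", 1: "b0", 2: "c0"},  # day 1 full
    {1: "b1"},                    # day 2 incremental
    {2: "c2"},                    # day 3 incremental
]
result = transform(chain, retention=2)
print(result)
```

With a retention of two restore points, the day 2 incremental is folded into the full, so the full now represents day 2 and the chain is back to a full plus one incremental.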
Once I had accepted that this 2TB drive was probably on its last legs, we had to do something about it to make sure all of our home backups were in good shape as fast as possible. The first thing was to put the Backup Repository into maintenance mode.
Depending on how serious the failure is, and whether you can still access the data, there is the option to evacuate data from the backup repository to a healthier extent. If the drive is not in a good place then this obviously may not be possible. Think of what I am seeing as an early warning of a larger drive or repository problem later down the line. My drive, although failing, is not completely unreadable, and with 1.3TB of backup data already on the drive I want and need those restore points. This evacuation is going to take some time! You will need to disable your jobs and let it run. More details on the steps taken can be found here.
Depending on how badly affected the drive is, you may see further errors during the evacuation, if any data is moveable at all. (Don’t worry, I also have another copy of all this data in object storage.) Once the evacuation is complete, you should be able to re-enable your jobs; the new backup locations will be picked up and normal backup chain operations can continue. I was not so lucky on some of my backup jobs: the evacuation process was unable to read some sectors of the disk.
The best course of action now was to preserve the restore points on the failing performance extent and start a new backup chain. This is where Seal Mode comes in. Seal mode enables you to remove data located on these extents by applying a retention policy. It is also useful if you want to age out old storage from your Scale Out Backup Repository, or a failing subset of drives from a system, as in this situation.
Where we could not evacuate existing chains, I ran new full backups to start a new chain. Things were starting to look better in the home lab.
At this point nothing is relying on Backup Repository 1, but our last 7 days of retention still live there for some of our backup chains. We need to ensure no new data is written to that extent, which is why we left it in maintenance mode. We want to seal off this extent and leave it until the retention of those backups has passed, after which we can remove it from our Scale Out Backup Repository.
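The idea of sealing an extent and letting retention empty it can be sketched in a few lines of Python. This is only a simplified model under my own assumptions, not Veeam's actual logic: the `Extent` class, its methods and the dates are hypothetical, and retention is treated as a plain number of days.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class Extent:
    """Hypothetical model of a repository extent holding restore points by date."""
    name: str
    sealed: bool = False
    restore_points: List[date] = field(default_factory=list)

    def write(self, day: date) -> None:
        # A sealed extent accepts no new backups; existing ones stay restorable.
        if self.sealed:
            raise PermissionError(f"{self.name} is sealed; no new backups accepted")
        self.restore_points.append(day)

    def apply_retention(self, today: date, retention_days: int) -> None:
        # Keep only restore points newer than the retention cutoff.
        cutoff = today - timedelta(days=retention_days)
        self.restore_points = [d for d in self.restore_points if d > cutoff]

    def safe_to_remove(self) -> bool:
        # Once retention has aged everything out, the extent can be pulled.
        return self.sealed and not self.restore_points


extent = Extent("Backup Repository 1")
for offset in range(7):  # seven daily restore points, 1 to 7 November (made-up dates)
    extent.write(date(2020, 11, 1) + timedelta(days=offset))

extent.sealed = True
extent.apply_retention(today=date(2020, 11, 8), retention_days=7)
remaining_after_one_day = len(extent.restore_points)
extent.apply_retention(today=date(2020, 11, 14), retention_days=7)
print(remaining_after_one_day, extent.safe_to_remove())
```

In this toy run the sealed extent still holds restore points the day after sealing, and a week after the last backup it is empty and can be removed without losing anything that retention had not already expired.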
If you have got this far then you are probably having a similar issue... OK, I clearly didn’t read the manual, and went ahead and kicked off an active full backup for each of the jobs that I could not evacuate. When you select seal extent you get the following dialog.
When you click Yes above, the process doesn’t take long.
Now anything that was relying on that extent for its backup chain will start a new chain with an active full backup, either because you manually kicked one off with the extent in maintenance mode or because you hit Yes above, in which case the next backup job run performs it. All backups from the failing extent will be moved to Imported. The final job is to take the failing extent out of maintenance mode, which means you will be able to recover data from that extent. You will also now see a lock on the backup repository to indicate it is sealed.
Those backup jobs will now land on different extents and everything will be back to smooth sailing. Once the retention period is reached on the failing drive, you can remove it from the Scale Out Backup Repository and perform additional testing or, in my case, replace it with a new drive. If you wish to restore from any of the backups stored on that extent, you will find them here on the home screen.
This is probably not as dramatic as it would be if this were more than a home lab server protecting some home desktops and laptops, but it is an everyday occurrence in many IT environments. The flexibility and simplicity of being able to move data freely between storage is a massive deal.