Veeam Replication – Failover & Failback
We are into the home stretch on this series now. This post covers the failover and failback of our replicated virtual machines.
In the previous posts, we set up our replication jobs and got our data moving from one site to another. This one shows how easy it is to fail over, for whatever reason you care to imagine. For the story, let’s say there is a power outage, but we only need to fail over our SQL VM for a few hours whilst the host power issue is resolved.
For the purposes of this post, let’s assume that all the replication jobs have been successful; of course they have if you have been following along. Oh, and we have some restore points in our secondary location.
The screen above shows the ready replicas, powered off, in our secondary sites. Notice that I have two replica locations, and in some instances our Domain Controllers and SQL boxes are replicated to both. This could give us a multi-site failover plan if required. It could also be that one site is a development site on the same campus as live, and these replicas can be used for a SureReplica job (VMware only). This will be covered in a forthcoming post.
When you right-click on the VMs you have some options.
The ones we are interested in are the top three.
Failover Now – This is the panic stations button. It is a disaster scenario and we need everything over and up and running as soon as possible. We will cover this shortly.
Planned Failover – A little less panic stations. Maybe there is a power situation on the host and we only need a subset of machines to fail over to a secondary site. The difference here is that we have a planned time for the maintenance. We will get to this later as well.
Add to Failover Plan – A failover plan allows you to group a number of virtual machines and set some basic configuration for how they are started up in the second location. In this scenario we are only going to fail over one VM, but let’s walk through the process, as it works just the same with one VM.
Failover Plan
When you step through the wizard above you get the simple steps to create a new failover plan. On the first screen, you can give the plan a name and description, and you can add pre- and post-failover scripts. The allowed script file formats are BAT, CMD, EXE and PS1. An example here could be a tertiary application that is linked to this particular VM but is not part of the failover, so we may just want to trigger a PowerShell script to stop a certain service on the application server.
The following screen is where you choose the VMs you want to be part of this failover plan. You can set a delay for each VM, which delays its boot time when the failover is triggered. On the same screen you can move the VMs up and down according to the order in which you need them to boot.
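To make the boot order and delay settings a little more concrete, here is a minimal Python sketch of how an ordered list with delays turns into a start-up timeline. It does not talk to Veeam at all. The `PlanEntry` class, the VM names and the delay values are all made up for illustration, and it assumes that each VM's delay holds back the start of the next VM in the list.

```python
from dataclasses import dataclass


@dataclass
class PlanEntry:
    """One VM in a hypothetical failover plan (illustrative, not a Veeam type)."""
    vm_name: str
    delay_seconds: int  # assumed: how long to wait before starting the next VM


def boot_schedule(entries):
    """Return (vm_name, start_offset_seconds) pairs in the order listed."""
    schedule = []
    offset = 0
    for entry in entries:
        schedule.append((entry.vm_name, offset))
        # Assumption: this VM's delay pushes back every VM below it in the list.
        offset += entry.delay_seconds
    return schedule


# Hypothetical plan: domain controller first, then SQL, then an app server.
plan = [
    PlanEntry("DC01", 120),
    PlanEntry("SQL01", 60),
    PlanEntry("APP01", 0),
]

for name, offset in boot_schedule(plan):
    print(f"{name} starts {offset} seconds after the plan is triggered")
```

Moving a VM up or down in the list changes both its position and the cumulative wait in front of it, which is why the order and the delay are set on the same screen.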
That’s it. Once you complete that step it’s time to see the summary of the configuration for the failover plan, which is simple stuff here as we only have one VM. I do quite like the command it gives you on this last summary screen, meaning you could in theory run the plan from another location that can talk to the VBR server.
In the console, you will then see the newly created failover plan appear.
I am not sure how many people are actually aware of this, but if you navigate to the failover plan you have two options to choose from. Start kicks off the failover plan, powering on the VMs at the secondary site from their latest restore point. Start to lets you pick a date and time, and each replica is powered on from the closest restore point before it. One thing to note here is that running the plan will not shut down the primary VM. If you select the Undo option while the replicas are running, you can undo the failover without reverting any changes on the primary VM.
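The difference between the two start options comes down to which restore point gets powered on. The Python sketch below models that choice. It assumes Start uses the latest restore point and Start to uses the closest restore point at or before the date and time you pick. The function name and the timestamps are hypothetical and nothing here calls Veeam.

```python
from datetime import datetime


def pick_restore_point(restore_points, target=None):
    """Choose which restore point a replica would be powered on from.

    target=None models "Start" (latest restore point).
    A datetime models "Start to" (closest point at or before that time).
    Returns None if no restore point is old enough.
    """
    candidates = sorted(restore_points)
    if target is not None:
        candidates = [point for point in candidates if point <= target]
    return candidates[-1] if candidates else None


# Hypothetical restore points for one replica.
points = [
    datetime(2017, 5, 1, 8, 0),
    datetime(2017, 5, 1, 12, 0),
    datetime(2017, 5, 1, 16, 0),
]

print("Start    ->", pick_restore_point(points))
print("Start to ->", pick_restore_point(points, datetime(2017, 5, 1, 13, 0)))
```

In a plan with several VMs the same selection would be made per VM, so different replicas could end up on slightly different restore points around the chosen time.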
Planned Failover
Planned Failover is the way to deal with scheduled disruptive maintenance. Going back to the replica screen and right-clicking on the VM gives you this option. You can add further VMs here if required.
We can then add our audit reason for performing this.
Finally, the summary.
The next stage is a further incremental replication from production to the secondary site. The great thing about this option is that the planned failover job shuts down the production system as well. It then performs one more update to the secondary site, so the replica captures any changes made right up to the shutdown.
At this point we are running our SQL workload on the secondary site. When I first open up SQL Server Management Studio you can see I only have one very important DB.
The power outage has taken a little longer than expected, so much so that we had to create another database for a new application. We may have rushed the naming convention.
Failback
We have been told it won’t take long now for the power company to switch things back on at our main data centre. This is the view from the Veeam Backup & Replication console.
The options that you get when you right-click on that VM are shown below, and they give us a few things to consider. If we thought performance was much better now that the workload is running in our secondary location, we could choose Permanent Failover. If we just wanted to bring things back up in the primary site and forget everything that was done on the secondary, we could use Undo Failover.
We got the phone call to say that everything was back online from a power perspective, so we tested connectivity and it was good to fail back with all of our changes. We selected Failback to production and a new wizard appeared.
We have a few options as part of the failback wizard. Be sure to check out the options here. Quick Rollback is a great feature that is not mentioned enough. I will try and write up something on that another time.
The last page is a summary, but it’s important to note that this is where you can choose to power on the VM.
Steps taken on failback.
And here is the failed-back instance with the newly created database still in place.
Last of all, we have to tell Veeam that we are ready to commit this failback.
It’s worth noting before closing out that the Failover Now option assumes the production VM is already powered off, so we go straight to powering on the secondary VM and no incremental replication runs first. The recovery options are the same as the ones we went through above.
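As a recap of the two routes, the short Python sketch below lists the steps in the order this post describes them. It is only a model of the sequence. The `Step` names are mine, not Veeam terminology, and no Veeam API is involved.

```python
from enum import Enum


class Step(Enum):
    """Illustrative step names, not Veeam terminology."""
    INCREMENTAL_SYNC = "replicate the latest changes to the secondary site"
    SHUT_DOWN_PRODUCTION = "shut down the production VM"
    POWER_ON_REPLICA = "power on the replica at the secondary site"


def failover_steps(planned):
    """Return the ordered steps for Planned Failover (True) or Failover Now (False)."""
    if planned:
        return [
            Step.INCREMENTAL_SYNC,      # catch up while production is still running
            Step.SHUT_DOWN_PRODUCTION,  # the planned failover job does this for us
            Step.INCREMENTAL_SYNC,      # pick up changes made since the first sync
            Step.POWER_ON_REPLICA,
        ]
    # Failover Now: production is assumed to be down already, so no sync happens.
    return [Step.POWER_ON_REPLICA]


for label, planned in (("Planned Failover", True), ("Failover Now", False)):
    print(label)
    for number, step in enumerate(failover_steps(planned), start=1):
        print(f"  {number}. {step.value}")
```

The extra sync steps are the reason Planned Failover suits maintenance windows, while Failover Now gets you running from whatever restore points already exist.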