Fault Tolerance (FT)
VMware Fault Tolerance (FT) is a new feature. At the heart of FT is the record/play feature, which started out as a programmer's debugging tool: with record/play you can capture all the virtual interrupts that take place inside a VM. The recording can then be redirected to another VM on a different ESXi server in real time, so that both ESXi servers replay the same events and stay in a synchronous state. This is known as lockstep technology and is an attribute of modern CPUs. VMware is working in conjunction with Intel and AMD to offer support for this feature, which is known as vLockstep.
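To make the record/play idea more concrete, here is a toy sketch in Python. It is only an illustration of the principle, not how vLockstep is actually implemented: the `ToyVM` class, its `apply` method and the arithmetic it uses are all made up for this example. The only assumption is that a VM's state depends solely on the events it receives, so replaying the primary's recording on a secondary keeps the two in step.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyVM:
    """Hypothetical stand-in for a VM whose state depends only on its events."""
    state: int = 0
    log: List[int] = field(default_factory=list)

    def apply(self, event: int) -> None:
        # Deterministic transition: the same events in the same order
        # always produce the same state.
        self.state = (self.state * 31 + event) % 1_000_003


def run_in_lockstep(events: List[int]):
    """Feed events to the primary, record them, and replay each on the secondary."""
    primary, secondary = ToyVM(), ToyVM()
    for event in events:
        primary.apply(event)               # primary handles the event
        primary.log.append(event)          # ...and records it
        secondary.apply(primary.log[-1])   # secondary replays the recorded event
    return primary, secondary


primary, secondary = run_in_lockstep([3, 14, 15, 92, 65])
print(primary.state == secondary.state)
```

If the secondary ever missed an event or replayed events in a different order, the two states would drift apart, which is why the recording has to be delivered reliably and in order.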
Fault Tolerance has some advantages and disadvantages:
Advantages |
|
Disadvantages |
|
Bear in mind that this is new technology, and I presume that many of the disadvantages will be addressed as it matures. You can work around some of them by using affinity rules to prevent the nodes of specific multi-node systems from residing on the same ESXi server.
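The intent behind such a rule can be sketched in a few lines of Python. This is not how vCenter evaluates its rules; the function `find_colocated`, the VM names and the host names are all hypothetical. It simply reports any nodes of the same multi-node system that have ended up on the same ESXi server.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_colocated(placement: Dict[str, str],
                   groups: Dict[str, List[str]]) -> Dict[Tuple[str, str], List[str]]:
    """Return {(group, host): [vms]} wherever two or more group members share a host."""
    violations = {}
    for group_name, members in groups.items():
        by_host = defaultdict(list)
        for vm in members:
            by_host[placement[vm]].append(vm)
        for host, vms in by_host.items():
            if len(vms) > 1:
                violations[(group_name, host)] = sorted(vms)
    return violations


# Hypothetical placement: VM name -> ESXi server it currently runs on.
placement = {"web01": "esxi01", "web02": "esxi01", "db01": "esxi02", "db02": "esxi03"}
# Hypothetical multi-node systems whose nodes should be kept apart.
groups = {"web": ["web01", "web02"], "db": ["db01", "db02"]}

print(find_colocated(placement, groups))
# {('web', 'esxi01'): ['web01', 'web02']}
```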
There are a number of requirements that you need to meet to enable FT
CPU compatibility is the most challenging aspect of getting FT working. Support is currently limited, but new CPUs will support the lockstep feature as they come onto the market. Check with VMware to see if your CPU is supported; I generally just try to enable FT and see if I get any error messages. Follow the steps below to enable and configure FT
Enabling FT | First you have to confirm that certificate management has been set up. This enhances security by making sure the ESXi server is not spoofed; if ESXi servers are added to vCenter with just a username and password, without this certificate check, VMware FT will not start correctly. From the home page -> Administration -> "vCenter server settings" you get the screen below; make sure "vCenter requires verified host SSL certificates" is ticked.
Make sure both VMotion and HA are working, then enable an FT logging VMkernel port group. Every ESXi server will require an additional IP address for this port group. When creating the port group, make sure you select "Use this port group for Fault Tolerance logging".
Hopefully you end up with something like the screen below.
Check that the VM's disks are thick provisioned: select the VM -> select "Edit settings" -> then select each disk and check the Disk Provisioning type. In the screenshot below you can see that this virtual disk is of type thick. You can convert thin disks to thick to make them compatible with FT.
Finally we can enable FT on a VM: right-click on the VM -> select Fault Tolerance -> select "Turn on Fault Tolerance". You will see the warning message below regarding disk provisioning and other information.
Here I get two warnings. One is that this VM has two vCPUs (remember, an FT VM can only have one vCPU); the other is that my hardware (HP DC7800) is not compatible. However, we will continue.
I can double-check whether the hardware is compatible by selecting the ESXi server. In the General panel you should see "Host Configured for FT" with a small speech bubble at the end; click the speech bubble and you get the "Fault Tolerance requirement error messages". As you can see, my HP DC7800s are not compatible.
After removing one vCPU from the VM I tried again; you can watch the progress in the "Recent Tasks" window.
If you select the cluster and then the "Virtual Machines" tab, you will notice that there are two linux01 VMs, the primary and the secondary.
Looking at each ESXi server, the "Fault Tolerance" panel shows which one is the primary and which is the secondary.
Lastly, you can migrate the VM, disable Fault Tolerance, or turn off Fault Tolerance for this VM.
If I had got this working, the VM would have been started on both the primary and the secondary; you would not be able to interact with the secondary VM. |
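The checks walked through above can be summarised as a simple pre-flight checklist. The Python sketch below is only a way of writing that checklist down; it does not talk to vCenter, and the `Host` and `VM` classes, the `ft_problems` function and the host names `esxi01`/`esxi02` are all hypothetical. The sample data mirrors my own attempt: a two-vCPU VM on hosts whose CPUs are not supported.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    cpu_supports_ft: bool        # is the CPU compatible with FT?
    has_ft_logging_port: bool    # FT logging VMkernel port group present?


@dataclass
class VM:
    name: str
    vcpus: int
    disk_types: List[str]        # provisioning type of each disk, e.g. "thick" or "thin"


def ft_problems(vm: VM, hosts: List[Host],
                verified_ssl: bool, vmotion_and_ha_ok: bool) -> List[str]:
    """Return a list of reasons why FT would not start; empty means all checks pass."""
    problems = []
    if not verified_ssl:
        problems.append("vCenter does not require verified host SSL certificates")
    if not vmotion_and_ha_ok:
        problems.append("VMotion and HA must both be working")
    for host in hosts:
        if not host.has_ft_logging_port:
            problems.append(f"{host.name}: no FT logging VMkernel port group")
        if not host.cpu_supports_ft:
            problems.append(f"{host.name}: CPU not compatible with FT")
    if vm.vcpus != 1:
        problems.append(f"{vm.name}: has {vm.vcpus} vCPUs, FT allows only one")
    for index, disk_type in enumerate(vm.disk_types, start=1):
        if disk_type != "thick":
            problems.append(f"{vm.name}: disk {index} is {disk_type}, not thick")
    return problems


# Hypothetical data modelled on the attempt described above.
hosts = [Host("esxi01", cpu_supports_ft=False, has_ft_logging_port=True),
         Host("esxi02", cpu_supports_ft=False, has_ft_logging_port=True)]
linux01 = VM("linux01", vcpus=2, disk_types=["thick"])

problems = ft_problems(linux01, hosts, verified_ssl=True, vmotion_and_ha_ok=True)
for problem in problems:
    print(problem)
```

Removing one vCPU clears the VM warning, but the two host CPU warnings remain, which matches what I saw.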
If I get the chance to set up FT on compatible hardware, I will revisit this section.