We have finally released our long-awaited VASA 3 implementation for vVols 2.0, which includes support for SRDF Asynchronous (SRDF/A) replication. Lack of replication support has been a blocker for many customers even looking at vVols, since a DR solution is such an essential component of a business (as it should be). Having achieved this release, I hate to add another impediment, but I am afraid the initial release of vVols 2.0 on PowerMax comes with a hurdle. The prerequisite is that we can only support VASA 3 on a newly ordered PowerMax array, and if you want replication that of course means ordering two of them with the new PowerMaxOS Q3 2020 SR code. The reason is that with VASA 3 we have an embedded VASA Provider, or EVASA. EVASA runs as a container on the array, just like Unisphere or eNAS. It therefore requires memory and CPU and must be sized properly at the outset of building an array. We are working on being able to hot-add the EVASA container/GuestOS early next year. We still have the external VASA Provider (vApp), but it only supports vVols 1 and VASA 2, so no replication or SRM. Of course some of you will have new arrays on-site or coming with EVASA, so let’s discuss what’s new.
Embedded VASA Providers (EVASA)
Because the array comes with embedded VASA already installed and configured (when ordered that way, or post-install with HickorySR), there is nothing to configure or deploy for VASA 3. You get to skip the whole vApp thing and move directly to registering your EVASAs in the vCenter with the IP addresses you supplied to Dell EMC. Note that because it is a container on the array, there are two of them, just like other embedded features such as eNAS. Therefore, register both providers in the vCenter for high availability. Between the vCenter and the array, a determination is made as to which EVASA is Active and which is Standby. Below is my environment for this blog.
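If you prefer scripting to the vSphere Client, you can also register the providers with PowerCLI. Here is a minimal sketch; the provider names, IP addresses, credentials, and the VASA Provider URL format are placeholders you should confirm against our documentation for your environment.

```powershell
# Minimal PowerCLI sketch - register both embedded VASA Providers for high availability.
# Names, IPs, and the URL format below are placeholders; use the values supplied for your array.
Connect-VIServer -Server vcenter.example.com

$cred = Get-Credential   # VASA Provider credentials

# Register the two EVASA containers (Active/Standby is negotiated automatically)
New-VasaProvider -Name 'PowerMax-EVASA-1' -Url 'https://10.0.0.11:5989/vasa-providers.xml' -Credential $cred
New-VasaProvider -Name 'PowerMax-EVASA-2' -Url 'https://10.0.0.12:5989/vasa-providers.xml' -Credential $cred

# Verify both providers are registered and online
Get-VasaProvider | Format-Table -AutoSize
```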
SRDF/A
Dell EMC supports a single mode of 2-site replication with vVols: SRDF/Asynchronous. SRDF/Asynchronous (SRDF/A) mode provides a dependent-write consistent copy on the target (R2) device, which is only slightly behind the source (R1) device, though for vVols the RPO will be set at a maximum of 300 seconds. SRDF/A session data is transferred to the remote PowerMax array in predefined timed cycles, or delta sets, which minimizes the redundant transfer of tracks that change more than once within a cycle. SRDF/A provides a long-distance replication solution with minimal impact on performance that preserves data consistency.
Unisphere for PowerMax
There is a pre-step to replicating vVols with SRM, the same as if you were replicating VMFS. Remember that SRM is simply the orchestration of failover; you can fail over manually without it. Unlike VMFS, however, you cannot use Unisphere for PowerMax (U4P) or Solutions Enabler (SE) to fail over. We do not allow you to control vVols that way. If you absolutely don’t want to use SRM, you can use PowerCLI instead. We don’t recommend it, as it is easy to get your environment into a bad state (SRM has built-in guardrails). To set up replication, it is easiest to use U4P (SE can do it, too). So this is an additional task for the storage administrator, on top of creating PEs, storage containers, and resources. The steps are simple enough. A VASA Replication Group (VRG) is created, which is an SRDF/A group with a special designation as VASA_Async type. The group is created by associating a storage container on one array with a storage container on another array. Once the relationship exists, the RDFG will be available for selection in vCenter (this is demonstrated in a GIF below for VM creation). Here is a demo of the creation.
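As a quick sanity check before moving on, you can confirm from PowerCLI that vCenter now sees the fault domains (the arrays) and the new replication group. A minimal sketch:

```powershell
# Replication fault domains (source and target arrays) reported by the VASA Providers
Get-SpbmFaultDomain

# Replication groups visible to vCenter; the VRG/RDFG created in U4P should appear here
Get-SpbmReplicationGroup
```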
Now we can proceed with the SRM configuration and usage.
vCenter and SRM
I’ve taken the more common tasks related to SRM integration with vVols and created GIFs of them, as none of them are very long. I’ll take you through the required tasks. First, though, let’s discuss compliancy.
Compliancy
Before showing the SRM setup, you need to understand that when using vVols with SRM, the idea of VM compliancy plays a key role. Compliancy is used when storage policies are assigned to a VM. Under normal circumstances when using a storage policy, VMs are compliant immediately upon creation; however, when assigning a VM to a storage policy that includes replication, the VM will initially show as Noncompliant.
This is informing the user that the VM does not meet the conditions set forth in the storage policy, namely that the VM is replicated. A VM will remain Noncompliant until the devices (vVols) making up that VM are fully synced with the remote devices, i.e., the pair(s) are in a Consistent state (SRDF/A). Use the CHECK COMPLIANCE link at the bottom of that widget to have VMware re-evaluate compliance. When the VM is Compliant there will be a green check mark.
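For those who prefer scripting, the same compliance information is available from PowerCLI. A minimal sketch, using a hypothetical VM name:

```powershell
# 'vVol-VM-01' is a placeholder VM name for this sketch.
$vm = Get-VM -Name 'vVol-VM-01'

# The VM home object and each vVol disk report their own policy and compliance status.
Get-SpbmEntityConfiguration -VM $vm
Get-SpbmEntityConfiguration -HardDisk (Get-HardDisk -VM $vm)
```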
SRM Setup
When using vVols with SRM, the setup of SRM does not change, though you must use vSphere 7 as the vCenter; we don’t support 6.7 with replication. Install SRM 8.3+ (the only version that supports vVols) as you normally would. If you’ve used SRM with a VMAX or PowerMax before, the only difference here is that you do not require a Storage Replication Adapter. vVol integration with SRM uses the VASA Provider to conduct all storage activity. You may be familiar with the SRDF SRA, but unless you are also replicating VMFS and RDMs, you don’t have to install it (yes, you still need it in those cases, as VASA doesn’t work with VMFS/RDMs).
Storage Policies
The first thing we need is a storage policy. A storage policy allows us to assign the array capabilities that VASA 3 reports to the vCenter from the PowerMax. In this example I use the Diamond service level and assign the replication attribute. Note that the only change I could make is the fault domain, as I have multiple arrays in my lab.
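If you want to see exactly which capabilities the EVASA advertises, or build the policy from a script instead of the wizard, PowerCLI can do that as well. A hedged sketch follows; the capability names are placeholders, so pick the real ones from the Get-SpbmCapability output for your array.

```powershell
# Capabilities advertised by the registered VASA Providers (service levels, replication, etc.)
Get-SpbmCapability | Sort-Object Name

# Build a policy from those capabilities. The capability names below are placeholders
# ('com.example.*'); substitute the actual names returned by Get-SpbmCapability.
$slRule  = New-SpbmRule -Capability (Get-SpbmCapability -Name 'com.example.serviceLevel') -Value 'Diamond'
$repRule = New-SpbmRule -Capability (Get-SpbmCapability -Name 'com.example.replication') -Value $true
$ruleSet = New-SpbmRuleSet -AllOfRules $slRule, $repRule

# Create the policy; 'Diamond-Replicated' is an illustrative name.
New-SpbmStoragePolicy -Name 'Diamond-Replicated' -AnyOfRuleSets $ruleSet
```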
Create VM with a Storage Policy
Deploy a new VM using the storage policy. This will ensure that when the VM is created on the array, the vVols will be placed in a storage group with a Diamond SL and replication set up on the backend.
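If you are scripting deployments rather than using the wizard, the policy and replication group can also be applied to an existing VM with PowerCLI. A minimal sketch with placeholder VM, policy, and group names:

```powershell
# Placeholders for this sketch: VM name, policy name, and replication group name.
$vm     = Get-VM -Name 'vVol-VM-01'
$policy = Get-SpbmStoragePolicy -Name 'Diamond-Replicated'
$group  = Get-SpbmReplicationGroup -Name 'VRG-001'

# Apply the policy and replication group to the VM home and all of its disks.
$config  = @(Get-SpbmEntityConfiguration -VM $vm)
$config += Get-SpbmEntityConfiguration -HardDisk (Get-HardDisk -VM $vm)
Set-SpbmEntityConfiguration -Configuration $config -StoragePolicy $policy -ReplicationGroup $group
```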
At this point I should note that you have a fully replicated VM, as it is compliant (the pairs are Consistent). The R2 devices are available in the remote container (specified in the VRG), and you could use PowerCLI to run tests or fail over the VM without SRM. Remember, though, there is no manipulation of replication outside of PowerCLI or SRM; you can’t use U4P or SE to run commands on the devices.
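For reference, here is a hedged sketch of what such a PowerCLI test looks like with the SPBM replication cmdlets. The names are placeholders, and it assumes you are connected to both the protected and recovery vCenters (Connect-VIServer to each).

```powershell
# Identify the replication group protecting the VM ('vVol-VM-01' is a placeholder name).
$vm          = Get-VM -Name 'vVol-VM-01'
$sourceGroup = Get-SpbmReplicationGroup -VM $vm
$targetGroup = (Get-SpbmReplicationPair -Source $sourceGroup).Target

# Test failover runs against the target (recovery) replication group and returns the
# vVol paths of the test copies, which can then be registered as a new VM at the recovery site.
$testVvols = Start-SpbmReplicationTestFailover -ReplicationGroup $targetGroup

# Clean up the test copies when finished.
Stop-SpbmReplicationTestFailover -ReplicationGroup $targetGroup
```

Again, SRM is the recommended path, so on to SRM.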
Create Protection Group and Recovery Plan
Next we can create the protection group and recovery plan. The VM need not be in compliance before we create these SRM objects, though I recommend it.
SRM Testfailover
Let’s run a testfailover of the single VM. Before running the test we do want to make sure it is in compliance. A vVol testfailover is just like one for VMFS: we create a snapshot of the R2 device(s) and then create linked targets which are presented to the recovery site, so the production pair(s) are not impacted. The creation of the snapshot, however, is not done at the time of the test as it is for VMFS; rather, we keep 5 snapshots of the devices in a VRG at all times, taken 5 minutes apart. When a test is run, we use the most recent snapshot, which will be no more than 300 seconds old. In the GIF notice that the VM in the test state is Noncompliant. This is expected.
I didn’t include the Cleanup as there isn’t much to it. Let’s assume it was done off-screen prior to the next step.
SRM Planned Migration and Reprotect
As our test was successful, we can run a failover (planned migration) and then a reprotect. A vVol failover is unlike a traditional VMFS failover. Instead, when we fail over we use the same process as a testfailover: the most recent snapshot and linked targets. The RPO will always be 300 seconds or less (the RPO could be quite small if the snapshot was just taken). I have a few callouts in this one just for clarity, and it is a video, as the GIF was just too big and WordPress didn’t seem to like that.
I want to emphasize what the video showed concerning reprotect with vVols. For those of you familiar with SRDF failover/swap, vVols do not work this way. In a traditional VMFS environment with the SRDF SRA, when we run a reprotect we simply swap the device pair personalities so that the R2 becomes the R1 and the R1 the R2. This means we have far fewer tracks to synchronize before we can run another failover/reprotect to return the environment to its original state; generally, reprotect operations with VMFS can be quite quick. With vVols, however, we create new pairs and delete the old ones, because we are working off a linked target (as explained above), not the R2 device. Hence, a reprotect is a full synchronization, which can be time consuming depending on the device size. Yes, there are technical reasons we had to do it this way; no, I won’t (or probably can’t very well) explain them, other than to say it has to do with the VASA specifications.
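For those curious how these steps map outside of SRM, the failover and reprotect operations correspond to a handful of SPBM cmdlets. A minimal, hedged sketch of a planned migration and reverse; the groups are placeholders, obtained as in the earlier test failover sketch.

```powershell
# $sourceGroup and $targetGroup are placeholders from the earlier sketch.
# For a planned migration, power off the protected VM(s) first.

# 1. Prepare: quiesce the source replication group and sync the final changes.
Start-SpbmReplicationPrepareFailover -ReplicationGroup $sourceGroup

# 2. Fail over the target group; the returned vVol paths are used to register
#    the VM(s) at the recovery site.
$recoveredVvols = Start-SpbmReplicationFailover -ReplicationGroup $targetGroup

# 3. Reverse the replication direction, i.e. the reprotect step. On PowerMax this
#    creates new pairs and triggers a full synchronization, as described above.
Start-SpbmReplicationReverse -ReplicationGroup $targetGroup
```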
Complete video
I also have a more complete video, tying the steps together.
One final topic.
Soft Limits
For the first release of VASA 3, we are putting up some guardrails to ensure our customers receive the performance experience they expect. I’ve designated these as “soft limits” because the code will not prevent you from exceeding them. We strongly recommend remaining within these boundaries:
- 250 VMs supported with SRM
- 2000 vVols – an average of 8 vVols per VM; however, the 2000 vVols may be allocated as required across VMs
- 25 VASA Replication Groups
VMware has a good number of “hard limits” that you can find in their documentation. Be sure to review those before implementing vVols with SRM.
Wrap-up
Time for the wrap-up show. Let’s do it in the form of questions as I’ve had some common ones that are worth answering.
- Does Dell EMC replicate the VM snapshots that are part of a VM?
- No. Any snapshots that are part of a VM are lost at the failover site. This is how our SRDF technology works.
- When will you support other replication modes like SRDF/Metro?
- Obviously roadmaps are NDA; however, I will say that Metro is the most desired option by customers, but it requires collaboration with VMware so it will take some time. Note that this is the same for any vendor that offers active/active.
- When will the soft limits be lifted?
- That will depend both on customer requirements and future enhancements. VMware is limiting total VMs to 500, so I suspect we would start by getting there.
- How about support for NVMeoF?
- Again, this is another feature that requires extensive work by VMware and storage vendors.
- What is our current plugin support for vVols, particularly with replication?
- Virtual Storage Integrator (VSI) does not support vVols, with or without replication. vVols will not cause a problem with VSI functionality, however.
- Dell EMC Enterprise Storage Analytics (ESA) supports non-replicated vVols, and as of ESA 6 it uses the inherent VMware vVol support. There is no support for replicated vVols, however, and in fact if you have a VASA Replication Group, ESA will be unable to collect objects or metrics for the entire array. If you have ESA and will use vVols 2.0, please open an SR with support. This will start a process to get you a hot fix for ESA, as vVols 2.0 will not be officially supported until ESA 6.1, but we certainly don’t want the feature to prevent you from collecting normal objects.
- Documentation?
- Well, I’m glad you asked. I’ve removed the old vVol documentation and created two new documents. One covers the external VASA Provider, which supports vVols 1 and VASA 2, and one covers EVASA with vVols 2 and VASA 3. You can find both docs on the Documentation Library page. The content in this blog post just scratches the surface, so be sure to check them out.
If you have other questions, please leave them in the comments and I will add them to the list here for everyone’s benefit.
Awesome content Drew. Very instructive. Thanks
Thanks!
Hello, when will EMC PowerMAX or other storage support synchronous replication with vvol?
Hi Lei,
The replication mode available for vVols is driven by VMware, and currently they only allow asynchronous for any storage vendor. They are in the midst of developing the specification for active/active replication, as that mode is the most requested by customers after asynchronous. When the spec is complete, we will be able to support it along with other vendors that have active/active technology. I would guess that is more than a year away. There has been little interest in synchronous, so it is unlikely that will be supported in the near future; however, I would suggest pushing it through your VMware representative if that mode is important to your business and you intend to run vVols.