With the release of VMware vCenter Site Recovery Manager 6.1, VMware adds a new type of functionality to its disaster recovery software: support for stretched storage. SRM has been around for quite a while now, offering the ability to create a DR site for your VMware infrastructure. The data replication under the covers can be handled by VMware with vSphere Replication, or more typically by the storage array using, for example, SRDF on the VMAX. Depending on the software doing the replication, the two arrays need not even be the same type, e.g. with RecoverPoint. SRM handles the orchestration/automation of the solution and, when used in conjunction with storage replication software, communicates with the array through a Storage Replication Adapter (SRA). EMC storage has any number of SRAs for our various replication software solutions, though until now there was none for VPLEX. If you had VPLEX and wanted to use SRM, you would implement RecoverPoint on top of VPLEX and use its SRA. What is different about SRM 6.1, however, is that it supports stretched storage, a configuration inherent to VPLEX Metro. With this support EMC was then able to create an SRA for VPLEX.
So anyone reading this who is familiar with SRM and VPLEX Metro might be scratching their head a bit, trying to understand why SRM would support stretched storage, which by definition is already available on both sites. It’s a fair question in my mind. After all, VPLEX Metro supports the vSphere Metro Storage Cluster (vMSC) and is generally presented as an HA solution. I wrote a whitepaper about such a configuration and integrated SRM at a third site with RecoverPoint (SRM RP WP). In a VPLEX Metro vMSC, all the hosts are in the same vCenter and the VMs can be easily vMotioned from one host to another, one site to another. In the event of a failure, rules can be configured which tell the VMs where to go. So why SRM with stretched storage? Well, the vMSC setup can be a bit complicated. The rules setup in particular can be tricky to get right. VMware looked at all this and spoke with customers, and the desire was there to have some better orchestration and automation, and what does that better than SRM? What SRM 6.1 offers, in conjunction with the VPLEX SRA, is the same capability it offers today but with stretched storage. It will take care of bringing up all the VMs on the other side, performing any customizations (e.g. IP change) as required. In a DR event, rather than relying on rules or user intervention (re-inventory), SRM will take care of it all. And because it is shared storage, it can do this much faster than traditional SRM implementations, as the storage SRA does not need to perform the typical replication tasks and datastores do not need to be resignatured and mounted. I think the other compelling feature for stretched storage is planned migration. In a VPLEX Metro this would allow customers to do maintenance on one side of the cluster after migrating the VMs over to the other side using SRM. Now of course this can be done today, but it is a manual or scripted process, not fully automated.
Well, enough with the overview and explanations. I wanted to show you how the SRM 6.1 setup is different for stretched storage. I’m going to go through some of the important requirements and steps with screenshots and then include some demos. I am going to assume SRM knowledge on the part of the readers as I will skip some of the setup.
- VPLEX SRA 18.104.22.168 (VPLEX SRA binary, VPLEX SRA Release Notes)
- VPLEX Metro GeoSynchrony 5.4+. In my configuration I used a witness, which is preferable though not required. The VPLEX administrator’s guide explains how a witness works as well as all failover scenarios.
- VMware vCenter SRM 6.1 (VMware SRM 6.1)
- vCenter Server 6.x in Enhanced Linked Mode (explained below)
- Customized cluster site names are NOT supported. If your clusters are not currently named “cluster-1” and “cluster-2”, you must rename them back to the defaults.
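If you like to sanity-check a lab before you start, the list above can be written down as a small checklist function. This is only a sketch: the `Environment` fields and `check_prerequisites` are names I made up for illustration, and you would have to fill in the values by hand since nothing here queries VPLEX, SRM, or vCenter.

```python
from dataclasses import dataclass

# Default VPLEX Metro cluster names; customized names are not supported.
DEFAULT_CLUSTER_NAMES = {"cluster-1", "cluster-2"}


@dataclass
class Environment:
    """Hand-entered description of a lab (all field names are hypothetical)."""
    cluster_names: tuple        # VPLEX cluster names as currently configured
    geosynchrony: tuple         # (major, minor), e.g. (5, 4)
    srm: tuple                  # (major, minor), e.g. (6, 1)
    vcenter_major: int          # e.g. 6
    enhanced_linked_mode: bool  # are the two vCenters in Enhanced Linked Mode?


def check_prerequisites(env):
    """Return a list of problems; an empty list means the checklist passes."""
    problems = []
    if set(env.cluster_names) != DEFAULT_CLUSTER_NAMES:
        problems.append("VPLEX clusters must be named cluster-1 and cluster-2")
    if env.geosynchrony < (5, 4):
        problems.append("VPLEX Metro GeoSynchrony 5.4 or later is required")
    if env.srm < (6, 1):
        problems.append("stretched storage support starts with SRM 6.1")
    if env.vcenter_major != 6:
        problems.append("vCenter Server 6.x is required")
    if not env.enhanced_linked_mode:
        problems.append("the vCenters must be in Enhanced Linked Mode")
    return problems


if __name__ == "__main__":
    lab = Environment(("cluster-1", "cluster-2"), (5, 4), (6, 1), 6, True)
    for problem in check_prerequisites(lab):
        print("FAIL:", problem)
```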
The setup for VPLEX Metro with SRM adheres to the SRM requirements in that there are 2 vCenters, not 1 as with vMSC. The difference comes with the aforementioned Enhanced Linked Mode (ELM, my abbreviation). ELM allows cross-vCenter migrations of virtual machines, which is why it is required for the stretched storage SRM setup. The way ELM works is this: with vSphere 6 vCenter there are two components, a Platform Services Controller (PSC) and the vCenter itself. The PSC contains the SSO domain. The vCenters in an SRM setup that supports stretched storage must share a PSC. This PSC could be external to each vCenter, or included with one vCenter and then shared with the other vCenter (NB: In more recent vSphere versions embedded PSC is not supported). The latter is my setup. I installed the VCSA on the protected site with the PSC included. Then on the recovery site I just installed the vCenter component and pointed it to the protected site PSC. Remember, even if you use the VCSA you still need a Windows x64 server for SRM and the VPLEX SRA. I installed each of those components on their own server, pointing to the PSC and the appropriate vCenter with which to register. The VPLEX SRA does not require any additional software. Here is what the vCenters look like in ELM.
With the vCenters and SRM/VPLEX SRA set up, I then paired my sites. Here you can see the monitor tab where the SRAs are listed. Note the new column at the bottom that says “Stretched Storage”. If the SRA supports that, it will say, oddly enough, “Supported”.
The next step is to add the array managers. This proceeds the same as any SRM configuration. After choosing the VPLEX SRA, I supply the information in the screenshot. As this VPLEX serves many purposes in the lab, you can see I included the optional filter for the consistency group. In the following screen I added the other array and then enabled the storage.
Once complete, the enabled clusters appear along with the VPLEX distributed devices. Note how the local and remote devices have the same name.
I should mention that there are some rules about the device presentation that you’ll find in the VPLEX SRA release notes. Most important, I think, is that the Detach Rule of the distributed consistency group must be set up to have the protected site (cluster-1 in my setup) as the winner. Note that in the event of a failover, the SRA will change the detach rule to the other site, as it will now be considered the protected site.
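To make the detach rule requirement concrete, here is a toy model of it in Python. It is just a sketch of the rule as I described it, not anything the SRA actually runs: `ConsistencyGroup`, `is_valid_for_srm`, and `failover` are hypothetical names, and the group name is made up.

```python
from dataclasses import dataclass


@dataclass
class ConsistencyGroup:
    """Toy stand-in for a VPLEX distributed consistency group."""
    name: str
    protected_site: str  # cluster SRM currently treats as the protected site
    detach_winner: str   # cluster named as the winner in the detach rule


def other(cluster):
    """Return the partner cluster in a two-cluster Metro."""
    return "cluster-2" if cluster == "cluster-1" else "cluster-1"


def is_valid_for_srm(cg):
    """The detach rule winner must be the protected site."""
    return cg.detach_winner == cg.protected_site


def failover(cg):
    """After a failover the other cluster becomes the protected site,
    and the detach rule winner moves with it."""
    new_protected = other(cg.protected_site)
    return ConsistencyGroup(cg.name, new_protected, new_protected)


if __name__ == "__main__":
    cg = ConsistencyGroup("demo_cg", "cluster-1", "cluster-1")
    print(is_valid_for_srm(cg))
    print(failover(cg))
```

The point of the model is simply that the winner and the protected site always move together, so the rule still holds after the sites swap roles.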
So far, so good. The SRM setup up to this point should be familiar to anyone who has used this product. Our next logical step is to create the protection group and then the recovery plan. Here, however, is where the paradigm changes when using stretched storage. VMware uses tag-based storage policies to set up the protection groups. If you have used the VASA 1 integration with VMAX or another platform you might remember creating policies based on disk type (for a reminder, see the VASA WP), or perhaps you have used them in general to help VMware users decide what datastores to use for particular VMs (also critical with upcoming VVols). BTW VMware also uses them for Storage DRS (KB 2108196) in conjunction with SRM. For SRM these policies are tag-based, so the first thing we need to do is create a tag. In addition to the tag, we assign it a category, which allows grouping of multiple tags. Once created, the tag is assigned to the VPLEX datastores. First we’ll create the tag and the category, then the storage policy, and finally assign the storage policy to the VMs in the datastores. Here is a graphical step-by-step of tag/category creation. The image is not quite crystal clear but I think the steps are.
In Enhanced Linked Mode, only create a single tag on the protected site and use it for both storage policies. As with other objects like networks and folders in SRM, there must be a mapping. Here is the new mapping tab:
The second part is to create the storage policy. Once again I have presented a step-by-step screenshot.
Because I named my policies differently on each site, I actually had to do a manual pairing:
Finally we need to assign the VMs in the VPLEX datastores to the new storage policy.
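If it helps to see how the pieces relate, here is a toy model in Python of tag, datastore, storage policy, and VM. It is only my sketch of the relationships described above, not how SRM evaluates a protection group internally, and every name in it (the tag, the policy, the datastores, the VMs, and the functions) is made up for illustration.

```python
def datastores_matching(policy_tag, datastore_tags):
    """Datastores that carry the tag the storage policy is built on."""
    return {ds for ds, tags in datastore_tags.items() if policy_tag in tags}


def protected_vms(policy, policy_tag, datastore_tags, vm_datastore, vm_policy):
    """VMs assigned to the policy that also live on a tagged datastore."""
    tagged = datastores_matching(policy_tag, datastore_tags)
    return sorted(
        vm for vm, assigned in vm_policy.items()
        if assigned == policy and vm_datastore[vm] in tagged
    )


# Hypothetical lab inventory.
datastore_tags = {"VPLEX_DS_1": {"srm-stretched"}, "LOCAL_DS": set()}
vm_datastore = {"vm1": "VPLEX_DS_1", "vm2": "LOCAL_DS", "vm3": "VPLEX_DS_1"}
vm_policy = {"vm1": "NY-policy", "vm2": "NY-policy", "vm3": None}

if __name__ == "__main__":
    print(protected_vms("NY-policy", "srm-stretched",
                        datastore_tags, vm_datastore, vm_policy))
```

In this made-up inventory, vm2 has the policy but sits on an untagged datastore, and vm3 sits on the tagged datastore but was never assigned the policy, which is why both steps (tagging and assignment) matter.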
OK, datastores tagged, mapping and storage policy created, and VMs assigned; now it is time for the protection group. I have the four-step process in the image below. Note in particular the new radio button in the second screen for the storage policies.
I’m not going to include the creation of the recovery plan since that has not changed. You still select the protection group from which to build the recovery plan. Let’s move on to an actual run of the recovery plan. There is an important thing to understand before you begin “testing” your VPLEX SRM environment, and that is that VPLEX has no inherent ability to take a copy of a device, distributed or otherwise, for presentation to the remote site. This implementation is not like when RecoverPoint is integrated with VPLEX and SRM. If you try to run a test failover with SRM, it is going to fail. Therefore you have two options when you run a test: planned migration or failover. Because of the “finality” of failover, it is preferable to run a planned migration. The migration would then be followed by a reprotect, a second planned migration, and another reprotect to return the environment to its original state. If this were a typical SRM configuration I would agree with any statement that said “no test ability” sounds awful; but since this is stretched storage, we are never losing access to our environment. Of course that is not to negate the potential performance impact while the VMs are vMotioned to the other vCenter, or the need to run the migration during either a slow time or a maintenance window; but it is nothing like a typical SRM planned migration where an outage is expected. I won’t make you suffer through any more screenshots so here are the demos.
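Before the videos, here is the round trip written out as a tiny state machine in Python, for anyone who wants to convince themselves that the four steps really do land back where they started. It is a sketch of the sequence only: `State`, `planned_migration`, and `reprotect` are names I invented, and none of this calls SRM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    protected: str       # site SRM currently treats as protected
    vms_running_at: str  # site where the VMs are actually running


def other(site):
    """Return the partner site (NY and NJ are my two lab sites)."""
    return "NJ" if site == "NY" else "NY"


def planned_migration(state):
    """Move the VMs from the protected site to the recovery site."""
    if state.vms_running_at != state.protected:
        raise RuntimeError("reprotect first: VMs are not at the protected site")
    return State(state.protected, other(state.protected))


def reprotect(state):
    """Reverse protection so the site now running the VMs is protected."""
    if state.vms_running_at == state.protected:
        raise RuntimeError("nothing to reprotect: no migration has been run")
    return State(state.vms_running_at, state.vms_running_at)


if __name__ == "__main__":
    state = State("NY", "NY")
    for step in (planned_migration, reprotect, planned_migration, reprotect):
        state = step(state)
        print(step.__name__, "->", state)
```

The guards encode the ordering: a second planned migration only makes sense after a reprotect, which is why the return trip takes three steps rather than one.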
For the two demos, initially I have not included an audio track. The videos are quite short and I did not think any additional information was necessary beyond the callouts given the content of the blog. I will add audio in a few days though for those not directed from the blog post. The first demo is the planned migration from NY to NJ followed by a second demo walking through how to reverse the migration (reprotect, planned migration, and reprotect) and get us back to NY as the protected site.
Planned Migration (use HD 720p for best viewing)